We have spent a lot of time thinking, writing, and learning about how AI is inherently human. From our vantage point, humans are the ultimate users of cognitive solutions (for now at least), and the success stories are almost always the ones that approach the work of launching these technologies through a human-centric lens, not solely a technological one. An aspect we haven’t spoken of, however, is that the data that is collected, described, connected, and made accessible is an equally critical lens, and an equally human one.
To build anything “cognitive,” meaning any algorithm that helps humans make decisions, you need data in varying amounts. Data is AI food. Data also tells the story of something happening in the real world at a moment in time, and often that story is a human one. Cognitive solutions are built by humans who set rules and guidelines for the machine to obey, and where there are humans, there is bias.
According to the Better Humans Group, we have 175 unconscious biases. The reality is that as long as humans are involved, there will be bias. Therefore, you can only minimize bias. You cannot remove it. (We’re getting to our island story soon; bear with us.)
To minimize data bias, an organizational mindset shift has to take place: moving from siloed, out-of-context data and a fixation on numbers to human-focused storytelling, and thinking of data as a system.
To think of data as a system, consider these three primary components, and recognize the opportunity for bias in each of them:
- Human context
- Data artifacts
- Interpretation and storytelling
The human context refers to all the qualitative and quantitative data about the moment in time at which a snapshot of a system was taken. Humans are connected to each other, our spaces, our time, and our world, so no single measurement can capture all of our diverse relationships. Every data point is collected for a reason, and there is always a human to trace the data back to, even if the data is measuring machine interactions. Data is either a product of a human narrative or information for a human decision, and sometimes both.
One of our authors, Brian the Data Science Guy, lives on an island, literally. Recently the county he lives in, which includes the island and a large piece of mainland, eliminated the bus route he takes every day, based on data that showed fewer than one person (!) used the bus per weekday. In this case, the human context, or the data that needs to be collected, answers the question, “How many people are taking the bus every day?”
However, human bias derailed adequate collection of this data, because the humans involved assumed only one way of counting riders: counting the number of people at physical bus stops. That seems logical, but the island has no physical bus stops. Instead, a card scanning system records where a rider boards the bus. Yet the algorithm that recorded ridership mapped boarding locations to the closest bus stops.
The data collected was biased toward the typical experience of boarding at a physical bus stop and paying without cash, and it completely excluded alternate experiences. Because of the bias in the human context, the key rules for data collection yielded artifacts that misrepresented reality, which led to a decision to discontinue the bus route on the island. Yet anyone riding a bus on the island could see clearly that weekday ridership was far higher than “less than one” person.
So, how does one minimize the bias in this scenario?
The first step would be understanding the multi-dimensional experiences of the riders, which can be as simple as asking up front whether every part of the county uses physical bus stops. The team would then have known to process the collected data points differently with regard to the stops that riders engage with. Similarly, had anyone stopped to ask about all possible payment methods, the data would have been more complete. Human insights and experience work are critical to minimizing bias in data collection.
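To make the processing difference concrete, here is a minimal Python sketch of how snapping card scans to the closest physical stop can erase an entire route. It is an illustration under our own assumptions, not the county's actual algorithm: the stop list, coordinates, route labels, and both function names are hypothetical, and we assume each scan also records which route the rider boarded.

```python
from collections import Counter
from math import hypot

# Hypothetical physical stops: (stop_id, route, x, y).
# The island route has no physical stops, so none are listed for it.
STOPS = [
    ("M1", "mainland", 0.0, 0.0),
    ("M2", "mainland", 5.0, 0.0),
]

# Hypothetical card scans: (route actually boarded, x, y of the boarding location).
SCANS = [
    ("mainland", 0.2, 0.1),
    ("mainland", 4.8, 0.3),
    ("island", 20.0, 1.0),
    ("island", 21.0, 2.0),
    ("island", 22.0, 1.5),
]


def count_by_nearest_stop(scans, stops):
    """Credit each scan to the route of the closest physical stop.

    This encodes the assumption that every rider boards at a physical stop.
    """
    counts = Counter()
    for _route, x, y in scans:
        nearest = min(stops, key=lambda s: hypot(s[2] - x, s[3] - y))
        counts[nearest[1]] += 1
    return counts


def count_by_boarded_route(scans):
    """Credit each scan to the route the rider actually boarded."""
    counts = Counter()
    for route, _x, _y in scans:
        counts[route] += 1
    return counts


biased = count_by_nearest_stop(SCANS, STOPS)
direct = count_by_boarded_route(SCANS)

print("nearest-stop counts: ", dict(biased))
print("boarded-route counts:", dict(direct))
```

In this toy data, the nearest-stop count credits all three island boardings to a mainland stop, so the island route appears to have no riders at all, while counting by boarded route keeps them visible. The point is not the specific fix but that the counting rule carries an assumption about how people ride.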
Second, having a group (one or more people) unconnected with the original data collection and processing bring a fresh, diverse set of eyes to a data set will generally bring bias to light. Applying curiosity and asking questions such as the following can surface issues:
- What is the data to be used for?
- Who created the data?
- How does it connect?
- What does the data represent?
- Are all the stakeholders represented?
- What are we missing?
So, what happened to those island riders? After much friendly protest from riders on the island, human counters were added to the bus routes to collect the data that was left out of the original approach. The output, still in progress, will be a more complete picture of the human story that is the transit ridership experience across the whole county. The impact will be that decision makers have the full story from which to make choices that best serve the entire bus-riding community. This is good news for us: we need our Data Scientist.