Bad data = bad learning = bad AI?
What cultural tenets should we/should we not feed AI to reduce biases?
Over the past two weeks, I've participated in two events discussing artificial intelligence and the implications of "bad data". Bad data can be either unclean data that can't be used effectively, or biased data that leads algorithms to make morally wrong decisions. Think of biased data as training data that teaches AI systems to perpetuate things like gender and ethnic stereotypes. Both types of "bad" data raise serious considerations that, from a cultural perspective, we practitioners must consciously confront to create a brighter future.
Two weeks ago, I joined other Detroit-based data analytics practitioners at a lunch hosted by Infobuilders at which we discussed Smarter Data Quality. To be fair, the Infobuilders team wanted to talk about the issue of "bad data" as a backdrop for educating us about their products and services for master data management (which I neither use nor know much about). The beginning of the presentation, though, laid out several implications of "bad data" and started an interesting conversation about what it actually is. Every company has master data elements (e.g., customer information, employee information, and supplier information) that need to be curated using any number of tools. This curation is critical to creating a stable platform that gives context to future analysis without muddying the waters with mistyped addresses and duplicate contacts.
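To make that concrete, here's a rough sketch of the kind of matching a master data tool might do under the hood. The contact records and the similarity threshold are entirely made up, and real MDM products are far more sophisticated than a few lines of Python, but it shows why duplicate contacts and mistyped addresses matter:

```python
from difflib import SequenceMatcher

# Purely illustrative contact records -- names and addresses are invented
contacts = [
    {"name": "Jane Smith", "address": "123 Woodward Ave, Detroit, MI"},
    {"name": "Jane Smyth", "address": "123 Woodward Avenue, Detroit, MI"},
    {"name": "Raj Patel", "address": "45 Gratiot Ave, Detroit, MI"},
]

def similarity(a: str, b: str) -> float:
    """Rough string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Flag pairs that look like the same customer entered twice
for i in range(len(contacts)):
    for j in range(i + 1, len(contacts)):
        score = (similarity(contacts[i]["name"], contacts[j]["name"])
                 + similarity(contacts[i]["address"], contacts[j]["address"])) / 2
        if score > 0.85:  # arbitrary threshold for this sketch
            print(f"Possible duplicate: {contacts[i]['name']} / {contacts[j]['name']} ({score:.2f})")
```

Without this kind of curation, the same customer can show up as two different people in every downstream analysis.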
However, an organization can't master and curate all of the data that's ingested or created in a day. We are increasingly looking at unstructured, text-based data that "is what it is". We need to employ machine learning and other data science algorithms to see through the poor data quality and understand the patterns and relationships that are emerging. My question is: is there such a thing as bad data, or are we just exposed to different challenges that we will need to rely on AI to help us with in the future?
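As a toy illustration of what I mean, here's a quick sketch (with hypothetical customer comments and off-the-shelf scikit-learn pieces) of letting an algorithm group messy, unstructured text rather than trying to "clean" it into a master record first:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical free-text comments -- messy, unstructured, "is what it is"
comments = [
    "shipment arrived late, box damaged",
    "package was damaged and delivery delayed",
    "love the new mobile app interface",
    "the app update looks great on my phone",
]

# Turn raw text into numeric features, then group similar comments
vectors = TfidfVectorizer(stop_words="english").fit_transform(comments)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for comment, label in zip(comments, labels):
    print(label, comment)
```

The point isn't the specific algorithm; it's that the patterns emerge from the data as it is, typos and all.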
Last week, I joined a different group of data analytics executives and consultants from various industries at the Institute of Innovation in Large Organizations' full-day discussion on Enterprise Innovation Scaling. During the discussion, we talked a lot about scaling algorithms and the use of data to train and retrain them. At one point, another participant noted that one place to look for funding support for these kinds of projects today is turnover and staffing costs. It occurred to me, then, that an algorithm only needs to be developed (or "on-boarded") once; from there it continuously gets better, and multiple people can give it "feedback" rather than a single manager. In the human world, it can take a very long time to find good, reliable talent. When we find them, we start from scratch (or maybe just above scratch) to train them. Along the way, they won't do the job just right and they'll get feedback. They'll adjust and grow, but almost certainly one day they will leave the position and, again, the organization will have to start over. With AI, we can break that cycle and redirect the hundreds of thousands of dollars spent on that entire process (not to mention the person's salary) toward developing an algorithm to do certain knowledge work.
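Here's a loose sketch of that feedback loop, using a hypothetical request-routing task and scikit-learn's incremental-learning interface. The examples and labels are invented; the point is that each round of "feedback" builds on the last instead of starting over with a new hire:

```python
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical knowledge-work task: routing requests to "approve" or "review"
vectorizer = HashingVectorizer(n_features=2**10)
model = SGDClassifier()
classes = ["approve", "review"]

# Initial "on-boarding": a first batch of labeled examples
batch = ["standard expense within policy", "unusual vendor, missing receipt"]
labels = ["approve", "review"]
model.partial_fit(vectorizer.transform(batch), labels, classes=classes)

# Later, any manager's corrections become another round of feedback --
# the model keeps learning rather than leaving and taking its training with it
feedback = ["recurring subscription renewal"]
correct = ["approve"]
model.partial_fit(vectorizer.transform(feedback), correct)

print(model.predict(vectorizer.transform(["missing receipt on travel claim"])))
```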
But what about biases from "bad data"? When you think about it, how are they different from the biases any human we hire carries in their head? It's critical for us to make sure we have corporate cultures of equality, diversity, and meritocracy; that the organization walks the walk in this respect; and that it makes sure these cultural tenets are baked into the data used to train both employees and algorithms. There are countless corporate values, priorities, and traits that are taught to employees via training classes; the same considerations should be made by analytics organizations when data is selected and used to feed systems.
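As one small, admittedly incomplete illustration: just as we would never list gender or ethnicity as hiring criteria in a manager's training class, we can deliberately strip those attributes from a dataset before an algorithm ever learns from it. The column names below are made up, and dropping columns alone doesn't remove proxy bias, but it shows the kind of conscious selection I mean:

```python
import pandas as pd

# Hypothetical hiring-style dataset; values and columns are for illustration only
candidates = pd.DataFrame({
    "years_experience": [3, 7, 5, 10],
    "certifications":   [1, 2, 0, 3],
    "gender":           ["F", "M", "F", "M"],
    "ethnicity":        ["A", "B", "B", "A"],
    "hired":            [1, 1, 0, 1],
})

# Strip attributes we never want the algorithm to learn from before training
sensitive = ["gender", "ethnicity"]
features = candidates.drop(columns=sensitive + ["hired"])
target = candidates["hired"]

print(features.columns.tolist())  # only the merit-based signals remain
```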
If AI, like humans, will only be as good as the data it's given, then what elements do you think should be included in, or hidden from, the algorithms to make them better than humans at being fair, just, and unbiased in making decisions?
Matt Brooks is a seasoned thought leader and practitioner in data and analytics, culture, product development, and transformation.