Like children, artificial intelligence needs proper parenting to achieve its full potential, and proper parenting starts with a healthy diet — of good data.
Businesses increasingly acknowledge the potential of A.I. to accelerate decision making, but many have serious concerns about what is happening inside the black box. The quality of any A.I. can only be as good as the data it processes. Of course, “garbage in, garbage out” has long been an analytics refrain, but it’s even more important for A.I.
Why? Consider the difference between the two. An analytics solution typically provides a graph prioritizing the results. Ask an analytics program why sales are down in the Northeast region, and you’ll essentially get a list of possible factors: supply chain hiccups, demographic changes, social media trends, etc. A human then has to evaluate the results to determine which factors to base the ultimate decision on. A cognitive A.I. approach is less transparent. Ask an A.I. why sales are down in the Northeast region and you get a single, definitive answer. That’s it. Done deal.
The A.I. approach would be a business user’s dream come true. Ask a question, get a definitive answer, and confidently take an action. It would save time and result in faster, better business decisions.
But what if the A.I. is wrong? More important, how would a business user ever know an A.I. is wrong? Because of this, relying on A.I. requires a level of trust significantly higher than for an analytics solution. From the perspective of a chief data officer or a data scientist, parenting an A.I. is a humbling responsibility.
Those in charge of feeding A.I. must ensure a complete and healthy diet: clean, relevant and reliable data with traceable provenance. Instead of the five food groups, a healthy A.I. diet depends on curating the data that goes into it:
A.I. shouldn’t be allowed to drink wildly from a data lake where data has not been cleansed, packaged and structured for easy consumption. According to the Compliance, Governance and Oversight Counsel (CGOC), nearly 70% of the data that companies produce and collect has no business, legal or compliance value. You must therefore develop a way to understand and specify the scope and criteria of the data to be fed to A.I. Which data stores and what file types? What connections exist between the data? Who is responsible for making the determination and for final approval?
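One way to make those questions concrete is to write the scope down as data rather than leave it as tribal knowledge. The Python sketch below is only an illustration under stated assumptions: the `CurationScope` class, the store names, the file extensions and the approver address are all hypothetical, not part of any CGOC guidance or product.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class CurationScope:
    """Hypothetical scope record: which stores and file types may feed the A.I."""
    allowed_stores: frozenset
    allowed_extensions: frozenset
    approver: str  # the person accountable for final approval (assumed field)

    def admits(self, store: str, path: str) -> bool:
        # A file is in scope only if both its store and its file type are approved.
        extension = PurePosixPath(path).suffix.lower()
        return store in self.allowed_stores and extension in self.allowed_extensions


# Illustrative values only.
scope = CurationScope(
    allowed_stores=frozenset({"crm", "sales_warehouse"}),
    allowed_extensions=frozenset({".csv", ".parquet"}),
    approver="chief.data.officer@example.com",
)

candidates = [
    ("crm", "exports/accounts_2023.csv"),
    ("shared_drive", "misc/holiday_party.pptx"),
    ("sales_warehouse", "northeast/orders.PARQUET"),
    ("crm", "scratch/notes.tmp"),
]

# Keep only the files that fall inside the approved scope.
in_scope = [(store, path) for store, path in candidates if scope.admits(store, path)]
```

Here only the CRM export and the warehouse orders file pass; everything else stays out of the diet until someone with authority widens the scope.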
Reviewing and managing sources
Once you have specified your sources, you need to ensure the quality of the data. To increase confidence in and defend responses from A.I., you must be able to assess the authenticity (via audit trails), accuracy and value of the content contributed to your data collection. Heat maps and other visualizations can support that assessment.
As data is located and copied from multiple sources, data scientists and subject matter experts must have access to a simplified process for locating, reviewing and tracking higher-quality information that will be used to train the A.I. “brain.”
Tagging and classifying
You need to tag and classify the data to ensure that it can be properly digested. Depending on the A.I. task, some metadata is more valuable than the rest. If you are looking for marketing insights, you will likely value the EXIF metadata embedded in images on social media sites, including geolocation, timestamps, camera type and serial numbers. In medical settings, metadata elements including patient ID-date of birth, provenance-timestamp, and privacy-content are essential.
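As a rough sketch of task-dependent tagging, the Python below keeps only the metadata fields that matter for a given A.I. task and notes which expected fields are absent. The field names, the `FIELDS_BY_TASK` mapping and the `tag_record` helper are assumptions made for illustration; a real pipeline would read these values from the files themselves with an EXIF or records library.

```python
# Hypothetical mapping from A.I. task to the metadata fields worth keeping.
FIELDS_BY_TASK = {
    "marketing": {"geolocation", "timestamp", "camera_type", "serial_number"},
    "medical": {"patient_id", "date_of_birth", "provenance", "timestamp", "privacy"},
}


def tag_record(metadata: dict, task: str) -> dict:
    """Keep the fields relevant to the task and list the expected ones that are missing."""
    wanted = FIELDS_BY_TASK[task]
    kept = {key: value for key, value in metadata.items() if key in wanted}
    missing = sorted(wanted - set(kept))
    return {"task": task, "tags": kept, "missing": missing}


# Illustrative metadata for one social media image (already extracted).
photo = {
    "geolocation": (42.36, -71.06),
    "timestamp": "2017-05-01T14:03:00",
    "camera_type": "ExampleCam X1",
    "software": "editor 2.0",  # not relevant to the marketing task, so dropped
}

tagged = tag_record(photo, "marketing")
```

The `missing` list gives the curator a prompt: a record that lacks a field the task depends on may need review before it is fed to the A.I.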
Tracking responses and updating
Finally, you must have governance capabilities built into the system to track responses to the information used and adjust the diet accordingly.
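One simple form such tracking could take is a log of whether each A.I. response was later judged correct, grouped by the data source behind it. The Python sketch below is an assumption-laden illustration: the `ResponseLog` class, the 80% accuracy threshold, the minimum sample size and the source names are all hypothetical choices, not a prescribed governance method.

```python
from collections import defaultdict


class ResponseLog:
    """Hypothetical log of reviewed A.I. responses, keyed by contributing data source."""

    def __init__(self):
        self._outcomes = defaultdict(list)

    def record(self, source: str, was_correct: bool) -> None:
        self._outcomes[source].append(was_correct)

    def sources_to_review(self, min_accuracy: float = 0.8, min_samples: int = 3) -> list:
        """Return sources with enough reviewed responses and accuracy below the threshold."""
        flagged = []
        for source, outcomes in self._outcomes.items():
            if len(outcomes) < min_samples:
                continue  # too little evidence to judge this source yet
            accuracy = sum(outcomes) / len(outcomes)
            if accuracy < min_accuracy:
                flagged.append(source)
        return sorted(flagged)


log = ResponseLog()
reviews = [
    ("crm", True), ("crm", True), ("crm", True),
    ("social_feed", True), ("social_feed", False), ("social_feed", False),
    ("survey", False),
]
for source, ok in reviews:
    log.record(source, ok)

flagged = log.sources_to_review()
```

In this toy run only `social_feed` is flagged; `survey` has a single reviewed response, so the sketch withholds judgment rather than cut a source from the diet on thin evidence.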
The great irony of A.I. is that, although it seems to be the ultimate autonomous computing creation and technology exists to automate parts of the data curation process, parenting an A.I. is a distinctly human job that depends on extensive knowledge of and expertise in the A.I.’s subject matter area. Only by recognizing the importance of the human element in data curation can we fully appreciate how difficult it is to get A.I. right, and so avoid hyped expectations and overconfidence in the implementation effort. As parents, then, we must continually and patiently nurture A.I. as it matures through specific stages, mastering new and very specific capabilities that meet well-defined requirements.
Heidi Maher is the executive director of the Compliance, Governance and Oversight Counsel (CGOC).