Get Data in Good Shape for AI Success

Feature

No matter where or how you want to deploy AI in your company, there’s one element that is key to successful implementation: the quality of the data that you input into your AI models.

“The path from AI’s potential to its inevitability for a company is paved with data. That’s an underlying capability that you’ve got to develop,” said Saurabh Vijayvergia, retail AI strategy and engineering leader at Deloitte Consulting LLP. But companies must first overcome three common data barriers when implementing AI or generative AI.

The first is availability of data; an organization may lack some of the data it needs from internal or external sources to get the kind of outcomes it hopes to derive from its AI applications.

The second barrier is data accessibility. Even if the data is available within an organization, it may not be accessible in a consumable format, Vijayvergia said. Data is often siloed as the result of legacy systems and/or acquisitions of companies that have different data platforms or tech stacks. Sometimes there are organizational barriers that lead to data accessibility challenges.

Quality and reliability of data is the third common barrier. “You’ve heard about garbage in, garbage out? With AI, this could become garbage in, garbage squared, if not quadrupled, because AI basically accelerates the outcomes exponentially,” said Vijayvergia.

Poor-quality data undermines the effectiveness and the accuracy of AI-driven insights and predictions. “When your data is inconsistent, incomplete or otherwise inaccurate, it leads to flawed outputs,” said Tim Long, global head of manufacturing at Snowflake, a cloud-based AI data platform developer. “This not only reduces the accuracy of predictions but ultimately can lead to making the wrong business decisions based on these faulty insights. Ultimately, poor data quality diminishes the return on investment from AI initiatives and makes it challenging to scale solutions across different factory locations due to inconsistencies in data quality and structure.”

One example of the damaging impact that poor data can have comes from a global retail and consumer products company that built an in‑house AI application. Its hope was to forecast future product demand and align manufacturing production schedules, but it didn’t work out that way. “The demand planning was off by several hundred basis points, which resulted in production mismatches, excess inventory, increased costs and, from a sustainability perspective, a lot of waste,” said Vijayvergia. “It also caused disruptions in the company’s supply chain and affected supply relationships and production schedules.”

Called in for assistance, Deloitte determined the underlying issue was data quality, especially inconsistencies in and non‑standardization of data from sources like sales, marketing and external markets. In addition, the historical data used to train the AI model was biased towards certain regions and did not represent the global market accurately.

Companies that ensure they’re using good, clean data have greater success. “Snowflake has many customers that have successfully leveraged AI, particularly generative AI, to improve decision‑making across their organizations,” said Long. “After eliminating their data silos and unifying their data foundation on Snowflake, one of our CPG customers recently rolled out a chatbot experience for their supply chain team. This functionality enables them to analyze data trends and spot variations in real‑time through simple, natural language queries, without the need to dig through multiple dashboards. This streamlined approach helps them quickly address issues, optimize supply chain operations and ultimately ensure that their products are always available to consumers at the right time and price, all thanks to their unified data foundation.”

Companies that don’t assess the readiness of their data for AI applications may not proceed far in their adoption. While they may be able to create a proof of concept or a pilot, they may not be able to scale those applications without data in a consumable state, said Vijayvergia. “If the data is not there, your AI applications are not going to be able to tell you much about your organization.”

Developing Data Governance

Before introducing data into AI models, companies should evaluate their data in three key areas: consistency, completeness and accuracy, Long said. Data should be standardized and complete with no missing information that could distort AI predictions or models. In supply chain management, for example, incomplete inventory or sales data can lead to inaccurate forecasts.
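Those three checks can be automated before any data reaches a model. The sketch below (not from the article; field names, region-code rule and thresholds are all hypothetical) shows one way to flag records that fail on completeness, consistency or accuracy in a sales dataset like the forecasting example above:

```python
# Hypothetical pre-flight data-quality checks covering the three areas named
# above: completeness, consistency and accuracy. All field names are invented.

REQUIRED_FIELDS = {"sku", "date", "units_sold", "region"}

def check_record(record: dict) -> list[str]:
    """Return a list of quality problems found in one sales record."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    present = {k for k, v in record.items() if v not in (None, "")}
    missing = REQUIRED_FIELDS - present
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    # Consistency: region codes should follow one standard (here, uppercase).
    region = record.get("region", "")
    if region and region != region.upper():
        problems.append(f"non-standard region code: {region!r}")
    # Accuracy: negative sales counts are impossible and distort forecasts.
    units = record.get("units_sold")
    if isinstance(units, (int, float)) and units < 0:
        problems.append(f"impossible units_sold: {units}")
    return problems

def quality_report(records: list[dict]) -> dict:
    """Summarize how many records are clean enough to train on."""
    flagged = {i: p for i, r in enumerate(records) if (p := check_record(r))}
    return {"total": len(records), "flagged": len(flagged), "issues": flagged}
```

A report like this gives a team a concrete, repeatable gate: records that fail are routed for cleansing rather than silently distorting forecasts.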

“When working with AI applications like vision‑based models, it’s critical to ensure that your data is properly labeled and classified to provide ‘ground truth’ for training models,” Long added. “AI models need to understand the context of your organization’s language to be effective, particularly in manufacturing and supply chain scenarios where precise terminology matters. Consistently reviewing and refining your data as it grows is also key to maintaining its quality over time.” Data from all sources should be integrated and aligned before it’s used in an AI model.

Implementing data governance policies is core to cleansing data and getting it to the point where it can be used to provide meaningful results with AI. According to Google Cloud, “Data governance is everything you do to ensure data is secure, private, accurate, available and usable. It includes the actions people must take, the processes they must follow and the technology that supports them throughout the data lifecycle.”

Most companies regard data governance as a time sink and a money pit, because it doesn’t directly improve the bottom line, said Randy Bradley, associate professor at the University of Tennessee and academic research fellow, MIT Center for Information Systems Research. The need for data governance becomes clear, however, when companies start some initiative—like implementing AI—and begin to realize that they are having challenges due to their poor data quality.

One problem is that different users and/or generators of data often have diverse ways of looking at data. “The same data elements or data assets can have a multitude of meanings and sometimes even have different sets of understandings about what they are and different requirements for how they should be represented,” said Bradley. It is common for a company and its data users to have different names for the same data, which can create confusion that leads to misunderstandings or misinterpretations of information generated by their IT systems.

Target, for example, failed in its attempts to expand into Canada in part because problems with data flow, data integrity and data accuracy in its inventory system led the company to believe it had inventory in stores that was not actually available. The shelves were often bare, and shoppers were unhappy.

Establishing an effective data governance process is critical in realizing value from AI initiatives. Bradley said the first step is identifying and engaging the varied stakeholders, including the people who are creating data assets, those who are using data assets and those who own data assets.

“You also must have data stewards who are responsible for providing oversight or overwatch with respect to the data. They’re the ones who protect the integrity of what it means and ensure that people who should have access get access, and those who shouldn’t have access don’t get access, and that no one changes the meaning or context of a data element or asset without going to the data owner,” he said.

Companies need a data governance committee that is responsible for establishing a data governance process and policies. Its members should be people who are high enough in the organization to make decisions, who can say, “Here’s what this specific data element means or here’s what acceptable use of this data asset looks like,” Bradley said.

When there is conflicting understanding and use of data assets and elements, the data governance committee decides what is allowable under the company’s data policy. It might establish and rely on a data dictionary, which can contain a data crosswalk so that data users can recognize varied terms for a certain data asset.
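A crosswalk of this kind is, in essence, a mapping from each team's local name to one canonical data-dictionary name. The sketch below illustrates the idea; every element and alias name here is invented for illustration:

```python
# A hypothetical data-dictionary crosswalk: aliases used by different teams
# or legacy systems are mapped to one canonical data element, so reports and
# AI inputs agree on meaning. All names below are invented.

CROSSWALK = {
    # canonical name: aliases seen across departments and legacy systems
    "customer_id": {"cust_no", "client_id", "customer_number"},
    "net_sales":   {"sales_net", "revenue_net", "net_rev"},
}

# Invert the crosswalk into a lookup table: alias -> canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in CROSSWALK.items()
    for alias in aliases | {canonical}
}

def standardize(record: dict) -> dict:
    """Rename a record's fields to their canonical data-dictionary names."""
    return {ALIAS_TO_CANONICAL.get(k, k): v for k, v in record.items()}
```

In practice the crosswalk itself would live in a governed catalog, with the data stewards described above approving any change to a canonical name or its aliases.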
