Get the dirt on your data

Stampin' Up Inc. is a direct sales company in Riverton, Utah, that manufactures and distributes rubber stamp supplies. In the summer of 2004, at the end of its highest sales season, Stampin' Up's manufacturing resource planning system stopped sending the company orders to make more stamps because it was showing that the items were already in stock. But it was inventory that didn't exist. The culprit: dirty data.

"It took a few days to figure it out," says Steve Gockley, manager of Web infrastructure, Web sites, and business intelligence and analytics. "Once it was found, we had to cut work orders and do some emergency manufacturing."

Lucky for Stampin' Up, the backlogged products weren't available from competitors. Otherwise, customers might have gone elsewhere. But employee morale suffered. "It takes a while to build confidence back up," says Gockley.

Dirty data is data that is incorrect, missing or misplaced. And it's everywhere. In a 2006 poll of 1,160 knowledge workers by Harris Interactive Inc., 75 percent of the respondents reported having made critical business decisions based on faulty data. "In any company of any size, dirty data is a factor," says Gockley.

That's because data is dynamic by nature. Manually entering data, integrating systems or repurposing data, or something as simple as a customer moving, dying or marrying, can mess things up. The trick is to find errors and fix them.

"A lot of determining whether data is dirty is about looking at trends," says Susan Nonken, financial reporting systems manager at ABB Inc., a provider of power and automation products in Norwalk, Conn. If you're looking at data that seems too good, too bad or just too strange to be true, it probably is.

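One way to put Nonken's advice into practice is to flag figures that sit far outside the rest of a series. The sketch below is illustrative only and not a method described by any company in this story: the function name `flag_outliers`, the sample inventory counts and the 3.5 cutoff are all assumptions. It uses the median and the median absolute deviation, which are less easily skewed by the bad values themselves than a plain average.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return the positions of values that look too strange to be true.

    Hypothetical helper: scores each value by its distance from the median,
    scaled by the median absolute deviation (MAD). The 0.6745 factor and the
    3.5 cutoff are common rules of thumb, not figures from the article.
    """
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # More than half the values are identical; flag anything that differs.
        return [i for i, v in enumerate(values) if v != center]
    return [
        i for i, d in enumerate(deviations)
        if 0.6745 * d / mad > threshold
    ]

# Made-up weekly inventory counts; the last one is suspiciously high.
weekly_stock = [100, 102, 98, 101, 99, 1000]
suspect_positions = flag_outliers(weekly_stock)
```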
Also look for errors in data drawn from multiple sources and formats. Move Inc. in Westlake Village, Calif., aggregates listings from real estate agents and brokers. "We have to get the data whatever way we can," says Bill Weir, director of business systems. The result: a high possibility of errors, he says.

Once you know you've got faulty data, you've got to fix it, and the hurdles may be more than technical. People may be reluctant to relinquish control of their data, even to render it more useful. And customers can get impatient. "They want to know why you can't fix it right away," says Bita Mathews, data warehousing manager at Move.

Getting executive buy-in can help. So can educating users about the processes and complexities of data warehousing. "We have learned that the frequency of a message is as important as the message itself," Mathews explains.

Another challenge, says Weir, is keeping data clean once you scrub it. That requires some careful decisions about data-quality governance, enforcement and maintenance. The commitment isn't just to the cleanup, says Weir, but to "how data quality will be enforced, and what the integration and architectural guidelines should be for data quality standards."

Be prepared for an ongoing process, says Mathews. For example, because Move often finds multiple listings of the same property, she has had to make de-duping (deleting duplicate records) part of her routine.
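
A routine de-duping pass might look something like the sketch below. It is a minimal illustration under stated assumptions, not Move's actual process: the `Listing` fields, the `normalize_address` helper and the rule that two listings match when their cleaned-up address and postal code agree are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Listing:
    # Hypothetical record layout; real listing feeds will differ.
    listing_id: str
    address: str
    postal_code: str
    source: str

def normalize_address(address: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " "
        for ch in address.lower()
    )
    return " ".join(cleaned.split())

def dedupe(listings):
    """Keep the first listing seen for each (address, postal code) pair."""
    first_seen = {}
    for listing in listings:
        key = (normalize_address(listing.address), listing.postal_code.strip())
        # setdefault stores the listing only if the key is new.
        first_seen.setdefault(key, listing)
    return list(first_seen.values())

# Made-up sample data: the first two rows describe the same property.
sample = [
    Listing("A1", "12 Oak St.", "91361", "agent feed"),
    Listing("B7", "12  oak st", "91361 ", "broker feed"),
    Listing("C3", "40 Elm Ave", "91361", "agent feed"),
]
unique_listings = dedupe(sample)
```

Matching on a cleaned-up address catches differences in case, spacing and punctuation, but not abbreviations such as "St." versus "Street". That is one reason the job has to be repeated and refined rather than run once.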

"It's not a one-step process," says Robert Lerner, an analyst at market research firm Heavy Reading in New York.
