Piecing together the data picture

Poor data quality can confuse your customers, undermine your applications or even put you out of business -- and there's plenty you can do about it. A data quality initiative goes beyond simple data cleansing, such as correcting a misspelled name or changing "Avenue" to "Street," to address more complex and subtle problems.

For example, one New York bank that had a 3 percent to 5 percent bad-debt ratio on its credit card operation acquired another bank, says Aaron Zornes, an analyst at Meta Group. "It turns out that the acquired bank had a 15 percent bad-debt ratio. The New York bank took over, and the bad debt nearly put them out of business," he says.

If the acquiring bank had had a data quality initiative to run large database-comparison jobs off-line, the problem could have been averted, says Zornes. Bank managers could have predicted the loan default rate by comparing the outstanding debt, incomes and even partial ZIP codes of the acquired bank's credit card customers against a historical database of similar customer profiles.
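Zornes doesn't say how such a comparison would be built. As a minimal sketch, assume a hypothetical historical table of default rates keyed by coarse customer profile: income band, debt-to-income band and the first three digits of the ZIP code. Each acquired customer is bucketed into a profile, and the matched historical rates are averaged to estimate the portfolio's default rate. The band sizes, field names and fallback rate are illustrative assumptions, not anything the bank or Meta Group described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    income: float            # annual income in dollars
    outstanding_debt: float  # current card balance in dollars
    zip_code: str


def profile_key(c: Customer) -> tuple:
    """Bucket a customer into a coarse profile (hypothetical banding scheme)."""
    income_band = int(c.income // 25_000)  # $25,000 income bands
    if c.income <= 0:
        dti_band = 10  # no reported income: treat as the riskiest debt band
    else:
        # debt-to-income ratio in 10 percent steps, capped at 100 percent
        dti_band = min(int(c.outstanding_debt / c.income * 10), 10)
    return income_band, dti_band, c.zip_code[:3]  # partial ZIP code


def predict_default_rate(customers, historical_rates, fallback_rate):
    """Average the historical default rate of each customer's profile.

    Profiles with no history fall back to fallback_rate.
    """
    if not customers:
        raise ValueError("no customers to score")
    rates = [historical_rates.get(profile_key(c), fallback_rate) for c in customers]
    return sum(rates) / len(rates)
```

A production model would more likely use a statistical scoring approach; simple bucketing just keeps the idea of "similar customer profiles" easy to see.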

"They would have been able to tell that this company wasn't a good buy," Zornes says. "Enterprises cannot afford to wait on data quality efforts."

Data quality initiatives are critical to enterprise applications such as CRM and ERP systems, Zornes notes. And according to The Data Warehousing Institute in Seattle, data quality problems cost U.S. businesses more than US$600 billion per year.

"The basis of any CRM system is the integrity of the data," says Steve Deeb, vice president for CRM at Monster Worldwide. "Any and all processes are driven by that data."

In addition to business needs, there are now regulatory pressures to maintain better data, Zornes says. "If someone has bought a large amount of ammonia-based fertilizer, then rents a car," the U.S. Department of Homeland Security wants to know about it, he says. "And this isn't information you can wait months or even a week to find out."

The tools to improve data quality exist, says Zornes, but although "businesses give lip service to the need for data quality, too often they don't do anything about it."

James Eardley, a managing director of CRM at FleetBoston Financial Corp., agrees. "Data quality gets short shrift too often. It's not important until you need it," he says.

Although they operate in dissimilar industries, FleetBoston and Monster both use CRM software from Siebel Systems and faced similar data quality problems. Duplicate records in customer and contact databases meant one department didn't know what another was doing.

"What we were missing was a total picture of the customer relationship. We have multiple business sales forces following a single customer. It's hard enough to get one business unit's data clean. We now have 24," Eardley says.

"There's no consistency with how users enter customer and contact records," he continues. "Some people use upper- and lowercase; others use all uppercase." Today FleetBoston's system standardizes the data elements and does ZIP code lookups.
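The kind of standardization Eardley describes can be sketched in a few lines. This is an illustration, not FleetBoston's or FirstLogic's logic: the field names and the tiny ZIP reference table are hypothetical stand-ins for a real postal database. Names are forced into one consistent case and spacing, the ZIP code is reduced to five digits, and city and state are filled in from the lookup.

```python
import re

# Hypothetical ZIP reference table; a real system would query a postal database.
ZIP_TO_CITY_STATE = {
    "02110": ("Boston", "MA"),
    "10001": ("New York", "NY"),
}

NAME_FIELDS = ("first_name", "last_name", "company", "city")


def standardize_contact(record: dict) -> dict:
    """Return a copy of a contact record with consistent case, spacing and ZIP format."""
    out = dict(record)
    for field in NAME_FIELDS:
        if out.get(field):
            # collapse runs of whitespace, then apply title case ("ACME  corp" -> "Acme Corp")
            out[field] = re.sub(r"\s+", " ", out[field]).strip().title()
    # keep the five-digit ZIP code, dropping punctuation and any ZIP+4 suffix
    zip5 = re.sub(r"\D", "", out.get("zip", ""))[:5]
    out["zip"] = zip5
    # overwrite city and state from the reference table when the ZIP is known
    if zip5 in ZIP_TO_CITY_STATE:
        out["city"], out["state"] = ZIP_TO_CITY_STATE[zip5]
    return out
```

Plain title casing mishandles names such as "McDonald," which is one reason commercial tools rely on name-parsing rules rather than simple case conversion.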

The company opted for data quality software from FirstLogic. Those tools, coupled with the Siebel software, "seemed to do exactly what we needed," Eardley says.

To prevent duplicate entries, the FirstLogic system generates a token for each record a user enters and compares it with the tokens of existing records. If it finds similar tokens, it displays the matching records so the user can decide whether the new entry is a duplicate.
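FirstLogic's actual token format isn't described here. As a hedged sketch of the general match-key technique, assume a hypothetical token built from a last-name prefix, first initial, street number and five-digit ZIP. Records sharing a token are surfaced as candidates for the user to review rather than being merged automatically.

```python
import re
from collections import defaultdict


def match_token(rec: dict) -> str:
    """Hypothetical match key: last-name prefix | first initial | street number | ZIP5."""
    last = re.sub(r"[^A-Z]", "", rec["last_name"].upper())[:4]
    first = rec["first_name"].strip().upper()[:1]
    street_no = (re.findall(r"\d+", rec.get("address", "")) or [""])[0]
    zip5 = re.sub(r"\D", "", rec.get("zip", ""))[:5]
    return f"{last}|{first}|{street_no}|{zip5}"


class DuplicateChecker:
    def __init__(self):
        self._index = defaultdict(list)  # token -> ids of records already stored
        self._records = {}               # id -> record

    def candidates(self, rec: dict) -> list:
        """Existing records whose token matches the new one, for the user to review."""
        return [self._records[i] for i in self._index.get(match_token(rec), [])]

    def add(self, rec_id, rec: dict) -> None:
        self._records[rec_id] = rec
        self._index[match_token(rec)].append(rec_id)
```

A key this simple misses variants such as "Bob" versus "Robert," so some tuning of how tokens are built would likely be needed in practice.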

"We had to work a little bit to get the tokens to our liking, and then it worked fine," Eardley says. "We also run batch jobs monthly to identify and fix any duplicates." Any records that the system can't resolve go to the business side for review.

Monster Problem

Similar data inconsistencies undermined confidence in Monster's system, says Deeb. Duplicates and unidentified accounts in the Siebel system made it difficult to know which database to use for ordering or invoicing, he says. And the sales staff wasn't getting the support it needed.

Initially, Deeb says, "we didn't see a product that mapped directly into what we were doing." But after building its own address-matching application, the company found that it needed a more strategic tool and more sophisticated analysis than its in-house application could offer.

About a year and a half ago, Monster took another look at the field and chose the Trillium Siebel connector from Trillium Software.

"When we were looking at the ROI, the ease with which the Trillium product could be integrated into our systems was attractive," Deeb says. "We leveraged the strength of the Trillium core product -- such as the way name and address databases from around the world can be plugged in -- and integrated it into our processes in a way that made sense to the way we do business."

Now, when a record is entered, the system evaluates in real time whether it's new or a modification of an existing record. The company also runs batch data quality checks to ensure that duplicates aren't introduced when it incorporates a new mailing list into its existing database, and it repeats those checks at regular intervals to minimize data degradation. Besides the IT resources dedicated to maintaining data quality, business staffers are assigned to monitor the system and resolve anomalies.
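Both checks can share one matching routine. The sketch below is illustrative only and is not Trillium's method: it uses Python's `difflib.SequenceMatcher` for fuzzy name similarity among records in the same ZIP code, with a hypothetical 0.85 threshold. The same function serves the real-time "new or modification?" decision and the batch mailing-list merge, where probable duplicates are held back for staff review.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.85  # hypothetical cut-off; would need tuning against real data


def normalize(s: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    return " ".join(s.lower().replace(".", "").replace(",", "").split())


def find_match(rec: dict, existing: dict):
    """Id of the most similar existing record in the same ZIP, if similar enough."""
    best_id, best_score = None, 0.0
    for rec_id, other in existing.items():
        if other["zip"][:5] != rec["zip"][:5]:
            continue
        score = SequenceMatcher(None, normalize(rec["name"]), normalize(other["name"])).ratio()
        if score > best_score:
            best_id, best_score = rec_id, score
    return best_id if best_score >= THRESHOLD else None


def classify(rec: dict, existing: dict):
    """Real-time check on entry: a new record or a modification of an existing one?"""
    match = find_match(rec, existing)
    return ("modification", match) if match is not None else ("new", None)


def merge_mailing_list(existing: dict, incoming: list, next_id: int):
    """Batch check: add only list entries that don't match a record already on file."""
    added, held = [], []
    for rec in incoming:
        if find_match(rec, existing) is None:
            existing[next_id] = rec
            added.append(next_id)
            next_id += 1
        else:
            held.append(rec)  # probable duplicate: route to business staff for review
    return added, held
```

Scanning every record is fine for a sketch; at scale, matching tools typically compare only records that share a blocking key such as the ZIP code.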

It's the essence of analytical CRM, Deeb says. "Real-time analysis to determine the right offer to the right customer at the right time in a predictable manner is driven by the quality of customer data supporting that analysis," he says.

But most companies believe that their data is cleaner and more accurate than it is, says Wayne Eckerson, The Data Warehousing Institute's education and research director. He cites as one example an insurance company that receives 2 million claims each month, each with 377 data elements. At an error rate of 0.1 percent across all claims data, that's 754,000 errors a month, or more than 9 million a year. If 10 percent of the data elements are critical to its business decisions, the company must still correct roughly 900,000 errors each year that could damage its ability to conduct business. At an estimated risk cost of $10 per error, poor data quality costs the company roughly $9 million annually in erroneous payouts.
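Eckerson's figures can be checked step by step. The inputs below are the ones from his example (the $10 risk cost per error is his estimate); integer arithmetic avoids floating-point rounding.

```python
claims_per_month = 2_000_000
elements_per_claim = 377

elements_per_month = claims_per_month * elements_per_claim  # 754,000,000 data elements
errors_per_month = elements_per_month // 1000              # 0.1 percent error rate
errors_per_year = errors_per_month * 12
critical_errors_per_year = errors_per_year // 10           # 10 percent are business-critical
annual_risk_cost = critical_errors_per_year * 10           # $10 risk cost per critical error

print(f"{errors_per_month:,} errors/month, {errors_per_year:,} errors/year")
print(f"{critical_errors_per_year:,} critical errors/year, ${annual_risk_cost:,} risk cost")
```

The critical subset works out to 904,800 errors a year, just under 1 million, and the risk cost to $9,048,000.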

"It's bewildering," says Eckerson, "but almost half of all companies have no plan for managing data quality." Responsibility for data quality often rests with IT staffers, who make their decisions based on the tools available.

Data Quality Means Business

"First and foremost, data quality is a business issue," says Ted Friedman, an analyst at Gartner. "But the solution is the proverbial three-legged stool: people, process and technology."

The first step in a data quality initiative is to analyze what the data is and how it's used, Friedman says.

GMAC Mortgage followed this measured course in its data quality initiative. When interest rates went into free-fall a year and a half ago, the first thing the company's CEO wanted employees to do "was cope with a 300 percent to 400 percent increase in daily business of people refinancing mortgages," says David Adams, GMAC's enterprise data access manager.

Tuning the Oracle database that supported application processing improved performance, he says, "but it also opened our eyes to the need to go further and address the quality of the data itself." And with GMAC beginning a major overhaul of its data warehouse -- "actually, it was more a large tank of data than a data warehouse," says Adams -- the timing was right to launch a data quality initiative.

"To compete on the other side of the refinancing boom, we were going to have to have better, cleaner data to get the accurate analyses that the CEO wanted and that we needed to make the most of our operation," he says.

Adams brought in a data quality consultant to explain to the executive council what the project would entail. Adams and his team researched the data quality tools, ran two pilots and then selected software from Ascential Software. The Ascential product was more expensive and took more work to get going than some less sophisticated tools, he says. But Adams was sold on the software's heuristic logic, which let it adapt to GMAC's operation.

"The ETL (extract, transform and load) technology is pretty mature, and it works well," says Adams. "But it's the data quality and metadata stuff that's going to give you the great advances."

Physically merging databases would have required that every division agree on a single definition for each data element, which was "probably impossible," Adams says.

Instead, metadata resides in Ascential DataStage and links divisional databases at the logical level, with "pointers" indicating the source of the data. Each division's database remains inviolate.

Each division can decide what data can be shared and with whom, which is important for adhering to government regulations. Other tools couldn't deliver that granularity of control, says Adams.
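The arrangement Adams describes, a logical layer of metadata pointing at physical divisional databases with each division controlling what it shares, can be modeled as a small catalog. This is a hypothetical data structure for illustration, not Ascential DataStage's metadata model; division, database and column names are invented. A reader resolves a logical element only to its own division's sources plus those another division has explicitly shared with it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourcePointer:
    division: str  # owning division
    database: str  # physical database, left untouched under the division's control
    column: str    # physical column holding the value


@dataclass
class LogicalElement:
    name: str  # shared logical name
    sources: list = field(default_factory=list)  # SourcePointer entries


@dataclass
class MetadataCatalog:
    elements: dict = field(default_factory=dict)
    # sharing[owner_division][element_name] -> set of divisions allowed to read it
    sharing: dict = field(default_factory=dict)

    def register(self, element: str, ptr: SourcePointer) -> None:
        """Link a physical column to a logical element without moving any data."""
        self.elements.setdefault(element, LogicalElement(element)).sources.append(ptr)

    def share(self, owner: str, element: str, reader: str) -> None:
        """The owning division grants another division read access to one element."""
        self.sharing.setdefault(owner, {}).setdefault(element, set()).add(reader)

    def resolve(self, element: str, reader: str) -> list:
        """Pointers the reader may follow: its own sources plus explicitly shared ones."""
        return [
            p for p in self.elements[element].sources
            if p.division == reader
            or reader in self.sharing.get(p.division, {}).get(element, set())
        ]
```

Keeping sharing rules per element and per reader is what gives the fine-grained control the article attributes to the chosen tool, while the physical databases stay as they are.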

The team installed the software in January and, working with the data warehousing team, went live in May with a relatively small application for new credit policy reporting. The first large data mart, to support all reporting for GMAC's wholesale operations, will go live Aug. 15.

"Information is a critical asset," says Meta's Zornes. "We need to change the way we think about it. It may sound like science fiction now, but in the future, companies will certify information the way we certify works of art and financial instruments, i.e., by assigning that information asset's value and origination."
