Open source data integration vendor Talend has unveiled a tool aimed at scrubbing dirty data from corporate information repositories.
Talend Data Quality, which will be available free under a GPL license, ferrets out such errors as duplicate names and addresses, and improperly formatted data including phone numbers. At its most basic, the software can ensure a person's phone number is correct and has the required number of digits; check that zip codes match the cities contained in an address entry; and consolidate entries that have names, nicknames or abbreviations that apply to the same person.
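To make those three basic checks concrete, here is a minimal Python sketch of what such rules might look like. It is not Talend's code: the function names, the tiny nickname table and the zip-to-city lookup are all hypothetical stand-ins for the much larger reference data a real tool would use.

```python
import re

# Hypothetical reference data, for illustration only.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}
ZIP_TO_CITY = {"10001": "new york", "94105": "san francisco"}


def phone_has_required_digits(phone, required=10):
    """Strip punctuation and check that the number has the expected digit count."""
    digits = re.sub(r"\D", "", phone)
    return len(digits) == required


def zip_matches_city(zip_code, city):
    """Check that the zip code maps to the city given in the address entry."""
    expected = ZIP_TO_CITY.get(zip_code)
    return expected is not None and expected == city.strip().lower()


def canonical_name(full_name):
    """Lower-case the name and replace a known nickname with its full form."""
    parts = full_name.strip().lower().split()
    if not parts:
        return ""
    parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


def consolidate(records):
    """Group records whose names resolve to the same canonical person."""
    merged = {}
    for record in records:
        merged.setdefault(canonical_name(record["name"]), []).append(record)
    return merged
```

Under these assumptions, entries for "Bob Smith" and "Robert Smith" would land in the same group, while a seven-digit phone number or a zip code paired with the wrong city would be flagged for review.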
"Data quality goes way beyond name and address issues," says Yves De Montcheuil, vice president of worldwide marketing for Talend. "Those are the most prevalent. But if you have a product catalog, you need to ensure product descriptions are correct and that the price makes sense."
He says clean data is key when integrating information across systems because misinformation can spread quickly, not only internally but also to partners.
Talend is coupling its Data Quality tool with the Talend Open Profiler software it released in June. The Profiler can look inside a database and pinpoint problems. The tools can be used together or separately. In addition, both tools work in harmony with Talend's Integration Suite.
The company also has been forging relationships with major vendors, including a partnership with Microsoft earlier this year.
Data Quality is a graphical tool that lets users drag and drop components onto a process map. The components describe such tasks as reformatting an address, checking an address against the U.S. Postal Service database, pulling data from specific repositories, adding longitude and latitude to customer records to provide navigation help to delivery drivers, or pulling data from a credit bureau. An SDK lets users create their own components.
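The idea of chaining components on a process map can be pictured as a sequence of small functions, each taking a record and handing a cleaned or enriched copy to the next. The Python sketch below is purely illustrative and does not reflect Talend's component SDK: `reformat_address`, `add_coordinates`, `run_process` and the `GEOCODES` table are hypothetical names invented for this example.

```python
# Hypothetical geocoding table; a real component would query an external service.
GEOCODES = {"12 Main St": (40.0, -75.0)}


def reformat_address(record):
    """Collapse stray whitespace and normalize capitalization of the address."""
    cleaned = dict(record)
    cleaned["address"] = " ".join(record["address"].split()).title()
    return cleaned


def add_coordinates(record):
    """Attach latitude and longitude when the address is known, else None."""
    enriched = dict(record)
    enriched["lat"], enriched["lon"] = GEOCODES.get(record["address"], (None, None))
    return enriched


def run_process(record, components):
    """Pass the record through each component in the order given."""
    for component in components:
        record = component(record)
    return record
```

A process would then be an ordered list such as `[reformat_address, add_coordinates]`, and a custom component would simply be another function with the same record-in, record-out shape.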
Once the process map is complete, Talend Data Quality generates executable code in Java or Perl that can be installed in multiple places on the network, close to the data sources.
Talend plans to deliver Data Quality at the end of September and will offer tech support and other services via a subscription that starts at US$15,000 per year.