Computerworld
Talend releases open-source data-profiling application
The Open Profiler takes on a crowded field of proprietary applications and helps analyze and clean up critical data
Todd R. Weiss  24 June, 2008 08:34

French open-source data integration vendor Talend Monday unveiled its data-profiling application, which will allow companies to assess their data quality as a key part of data integration projects.

In an announcement Monday, the company claimed that its Open Profiler application is the first open-source data profiler to be released to the marketplace.

A data profiler allows users to clean up data by getting rid of multiple entries that might be slightly different, as well as resolving conflicting data such as missing zip codes, incomplete addresses or wrong phone numbers that can lead to multiple mailings to the same customer, the company said.

Yves de Montcheuil, vice president of marketing for Talend, said the company built an open-source profiler to fill a void in the marketplace. For data-intensive businesses, an open-source profiler allows a company to more easily customize and modify the code to meet its own needs, compared to proprietary products. And because it is free to download and use, "You can start looking at this without having a budget and see how it works," he said.

Talend will release related data cleansing products later this US summer, he said. "Our customers doing this integration need that data quality" provided by a data-profiling application, he said. "If you do that integration without knowing what you have, it's like driving blind in the snow."

David Loshin, principal analyst at Knowledge Integrity, said Talend's new application is aimed at an increasingly popular niche in data integration work. Many larger companies, including Informatica, IBM, Business Objects and Oracle, have acquired data-profiling vendors or built their own profilers in the past few years, he said.

"It's about time that we're getting some activity in the open-source community with respect to the kinds of tools they're putting out," Loshin said. "It is a boon to the data community to have access to an open-source data profiler."

Data profiling is an empirical analysis of a data set that relies on frequency distribution analysis to find anomalies and validate data, and that looks for patterns, he said. "If you have a data set and don't know what's in there, you can profile it and learn more about what you have," he said, by highlighting anomalies, or errors, in the data. This helps data quality management because it lets the analyst focus on what might be a deviation and then sort it out.
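
As a rough illustration of the frequency distribution analysis Loshin describes, the following Python sketch profiles a single column of values. It is not Talend Open Profiler's code or API. The function names (`value_pattern`, `profile_column`, `rare_patterns`), the digit-to-`9` and letter-to-`A` pattern convention, the 25 percent rarity threshold and the sample zip codes are all assumptions made for this example.

```python
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: digits become '9', letters become 'A'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def profile_column(values):
    """Build simple frequency distributions for one column (hypothetical helper)."""
    # Treat None and blank strings as missing values.
    present = [v.strip() for v in values if v is not None and v.strip()]
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "value_counts": Counter(present),
        "pattern_counts": Counter(value_pattern(v) for v in present),
    }


def rare_patterns(profile, max_share=0.25):
    """Return patterns whose share of non-missing values is at most max_share."""
    counts = profile["pattern_counts"]
    total = sum(counts.values())
    if total == 0:
        return []
    return [p for p, n in counts.items() if n / total <= max_share]


# Made-up sample data: one malformed zip code and two missing entries.
zip_codes = ["02139", "02139", "10001", "1000A", "", None, "94105", "60601"]
profile = profile_column(zip_codes)
suspects = rare_patterns(profile)
```

In this sample, the dominant pattern is five digits, so the one value with a different shape stands out as a candidate anomaly for an analyst to review. A rare pattern is only a hint that something may be wrong, not proof of an error.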

Talend's Open Profiler "provides the initial piece of critical technology that anybody doing data integration needs," Loshin said. "Their long-term development plan looks to bring it up to snuff with best-in-class proprietary data profiling." Talend Open Profiler is available for free download under a GPL license at the company's Web site. Support is available under fee-based contracts.

Talend's core open-source data-integration application is Talend Open Studio. The vendor also offers Talend Integration Suite, a subscription-based service using Talend Open Studio, and Talend On Demand, a software-as-a-service (SaaS) open data integration product.

Copyright 2009 IDG Communications. ABN 14 001 592 650. All rights reserved.
Reproduction in whole or in part in any form or medium without express written permission of IDG Communications is prohibited.