Eight years ago, there were plenty of tools to search and analyze structured data, and even a few to go after unstructured information such as free-form text. But the two kinds of tools were not integrated, according to Jeffrey Kreulen, senior manager of service-oriented technologies at IBM's Almaden Research Center in San Jose. And the most sophisticated analytic tools used esoteric mathematical techniques that pretty much kept them out of the hands of nontechnical users.
The IBM lab is now into the third generation of tools to address those limitations. In 1998, it developed a prototype called eClassifier, basically a collection of algorithms for mining call center data. It was used internally at IBM to answer questions such as, "What are my top 10 problems?"
The idea, says Kreulen by way of example, was that if 10 percent of all calls dealt with password-reset issues, 10 percent of calls might be eliminated by automating the password-reset task.
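The "top 10 problems" question comes down to counting how often each call category occurs and what share of total volume it represents. The following is a minimal sketch of that arithmetic, not eClassifier's actual algorithm; the `calls` data and the `top_problems` helper are hypothetical and assume each call has already been assigned a category.

```python
from collections import Counter

# Hypothetical call log: one already-assigned category per call.
calls = [
    "password reset", "billing", "password reset", "connectivity",
    "billing", "password reset", "install", "connectivity",
    "password reset", "billing",
]

def top_problems(categories, n=10):
    """Return (category, count, share of all calls) for the n most frequent categories."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [(cat, k, k / total) for cat, k in counts.most_common(n)]

# Print the three most common problems with their share of call volume.
for cat, k, share in top_problems(calls, n=3):
    print(f"{cat}: {k} calls ({share:.0%})")
```

The share column is the number Kreulen's example turns on: it estimates the fraction of calls that automating a given task might eliminate.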
But eClassifier was limited in both usefulness and usability, and it was replaced by a second-generation tool called BIKM, for business intelligence/knowledge management. The tool was designed to go after both structured and unstructured information simultaneously.
The BI part focused on transactional kinds of data, such as financial records, while the KM part dealt with the kinds of unstructured text that can run into the petabytes at many companies, Kreulen says.
But BIKM was primarily about search and retrieval. "The next phase is what BIW is about," Kreulen says of the Horizon-award-winning Business Insights Workbench. "It's about how to do analytics on top of that; how to create actionable insights for your business that search just can't do."
These insights are enabled by human expertise built into BIW in the form of taxonomies -- natural classifications of data that emerge from clustering algorithms. A user might start with his own taxonomies, such as what he believes are the top 10 reasons for customer calls, but then BIW can refine and improve those by a process called machine learning. More powerful machine learning techniques are a focus of research today at Almaden, Kreulen says.
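To illustrate how a user-supplied taxonomy might be refined automatically, here is a small sketch. It is not BIW's method, which the article does not describe in detail; it substitutes a simple nearest-centroid refinement as a stand-in. The seed keywords, the sample comments and the `refine` function are all hypothetical.

```python
from collections import Counter
import math

def tokens(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign(docs, centroids):
    """Label each document with its most similar category, or None if nothing matches."""
    labels = []
    for d in docs:
        vec = Counter(tokens(d))
        best = max(centroids, key=lambda c: cosine(vec, centroids[c]))
        labels.append(best if cosine(vec, centroids[best]) > 0 else None)
    return labels

def refine(docs, seed_taxonomy, rounds=3):
    """Start from the user's seed keywords, then rebuild each category
    from the documents assigned to it and reassign, a few times over."""
    centroids = {c: Counter(words) for c, words in seed_taxonomy.items()}
    labels = assign(docs, centroids)
    for _ in range(rounds):
        rebuilt = {c: Counter() for c in centroids}
        for d, label in zip(docs, labels):
            if label is not None:
                rebuilt[label].update(tokens(d))
        # Keep the previous centroid for any category that attracted no documents.
        centroids = {c: rebuilt[c] or centroids[c] for c in centroids}
        labels = assign(docs, centroids)
    return labels

# Hypothetical seed taxonomy and free-form call comments.
seed_taxonomy = {"password": ["password"], "billing": ["invoice"]}
docs = [
    "forgot password cannot login",
    "invoice shows wrong amount",
    "cannot login after lockout",
    "wrong amount charged twice",
]
print(refine(docs, seed_taxonomy))
```

In this toy data, the last two comments contain neither seed keyword, so the first pass leaves them unlabeled. After the categories are rebuilt from the labeled comments, they can be matched through shared vocabulary such as "login" and "amount".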
The other major improvement in this third-generation tool is that it is more accessible to lay users, Kreulen says. "We are striving for a broader audience; we don't want our users to have to be Ph.D.s," he says.
"A great deal of information is lost because it is buried in unstructured data that is difficult to mine," says Joe Drouin, CIO at TRW Automotive Holdings, and one of this year's Horizon judges. "Finding those critical nuggets of information and presenting them in a way to enable better decision-making is a daunting task.
"IBM's concept for pulling this all together and offering a set of integrated tools, on top of a framework for aggregating and consolidating structured and unstructured data from a wide variety of sources, holds a great deal of promise," Drouin adds.
BIW isn't a product but is part of the tool kit that IBM carries on consulting engagements. It is being used in call centers, where service representatives enter structured information as coded values, as well as unstructured, free-form text such as comments and problem descriptions. "Often, the unstructured information is a lot more valuable than the structured information," Kreulen observes.
A more futuristic application of BIW would be to mine and analyze e-mail messages, those of either employees or customers, in support of risk assessment and compliance functions, Kreulen says.