Enterprise Toolbox

Chances are you're already analyzing the information housed in your company's relational databases. Doing so can provide you with vital statistics regarding sales, inventory, customer buying trends, and much more.

But what of the data housed outside of your relational databases? We face a serious glut of information today, and unstructured data forms a large part of your corporate information assets.

Documents on your intranet, e-mail that contains internal research, and groupware applications that house customer comments are all examples of unstructured data sources you may already have. Moreover, your operations may also need to tap external, nonrelational data.

Enter text-mining technology -- part of the knowledge management category. Text mining is similar to data mining; both uncover information relationships.

However, text mining focuses on nonrelational data -- whether inside or outside of your company.

The idea behind text mining is not new; many companies have used it for some time as the technology that drives custom data-analysis solutions. The Gartner Group estimates that companies have already spent nearly $1.5 billion on consulting services to tap into a wider array of corporate data, and the figure is expected to jump to $5 billion by 2001. The market for packaged text-mining tools is also expected to expand greatly in the next several months.

Text-mining tools are ideal for information "discovery." The cost and types of text-mining tools vary greatly today because the general-use market is still emerging, but they usually include analysis and search facilities. As one who is always discovering business and technology data, I believe text mining will be a positive investment for those who employ it.

What can you expect to gain from implementing text mining? Obviously, you'll see an increase in the value of your existing data assets. Your staff will be able to quickly draw upon a much wider assortment of information, which should help increase your productivity. Moreover, you can potentially step ahead of your competition by having more complete information to proactively make better-informed decisions.

Because the text-mining tool market is still evolving, those evaluating the technology will need to consider a few key elements before making a purchase.

Chief among these should be choosing a tool that does not require a huge first-time categorization, tagging, or integration effort.

The tool you select should be capable of automatically identifying and indexing unstructured data concepts. You should also expect some form of graphical interface that supports a high-level view of the data as well as the capability to drill down to a very detailed level. At this time, the types of interfaces differ by product.
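To make the idea of automatic indexing concrete, here is a minimal sketch in Python. It is not how any of the commercial tools work; it simply assumes that a "concept" is a single word, and the stop-word list, function names, and sample documents are all made up for illustration. It builds an index from each term to the documents that mention it, which is the kind of structure a drill-down interface could sit on top of.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real tool would use a far larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "about"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(documents):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


# Made-up sample documents standing in for e-mail and intranet pages.
documents = {
    "memo.txt": "Customer complaints about shipping delays in the northeast",
    "notes.txt": "Research on shipping costs and inventory levels",
    "email.txt": "Customer feedback on the new inventory system",
}

index = build_index(documents)

# High-level view: which documents touch on a given term?
print(sorted(index["shipping"]))
print(sorted(index["customer"]))
```

Looking up a term gives the high-level view; opening the documents it lists is the drill-down. Real products go well beyond single words, but the lookup structure is a reasonable mental model.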

If you want to get a feel for one type of high-level interface and its drill-down capabilities, check out the text-mining demonstration of WEBSOM (a Web Self-Organizing Map), which brings together more than a million documents from 80 Internet newsgroups (see websom.hut.fi/websom/milliondemo/html/root.html).

Text-mining tools are not supposed to be static in nature, given the changing data involved. The tool you select should let users explore data paths that form new relationships. The facilities should also support ongoing mining activities.

You should also expect a text-mining solution to integrate with your collaboration tools. After all, many of your unstructured data gems are actually stored inside the heads of your employees. Moreover, you should be able to tap internal and external data -- only some of the tools now available support the latter.

The types of unstructured data formats supported by text-mining tools also vary greatly. You'll want to carefully inspect which formats are supported (such as text, spreadsheets, graphics, presentations, and compressed files). If you're careful, you can locate text-mining tools that are also capable of mining relational data formats.

Some of the text-mining tools available now include Semio's SemioMap (www.semio.com), IBM's Intelligent Mining for Text (www.ibm.com/software/data/iminer/fortext), Autonomy's Agentware (www.autonomy.com), and Megaputer's TextAnalyst (www.megaputer.com).

Given the immaturity of the text-mining tool market, each of these tools takes a slightly different tack. For example, SemioMap offers server-side components that extract and index text and cluster concepts, as well as visual tools to navigate the unstructured data. By contrast, IBM's offering is in the form of a software development kit with sophisticated text analysis and search tools. I expect you'll see many other tools arriving or being enhanced during the coming year.

Is text mining for everyone? You certainly would not use the technology to locate a single piece of information. Text mining is ideal for finding related information in huge volumes of unstructured data. It is also a good way to learn about a subject, inspect changes in the market, or identify ideas to pursue.
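As a rough illustration of what "finding related information" can mean in practice, the sketch below ranks documents by how similar their wording is to a starting document. It assumes a simple word-count comparison (cosine similarity); that is one common textbook technique, not necessarily what any product named here uses, and the function names and sample text are invented for the example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Return the cosine of the angle between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_related(query_name, documents):
    """Rank every other document by similarity to the named one, best first."""
    vectors = {name: term_counts(text) for name, text in documents.items()}
    query = vectors[query_name]
    scores = [
        (name, cosine_similarity(query, vec))
        for name, vec in vectors.items()
        if name != query_name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up sample documents.
documents = {
    "complaint": "shipping delays hurt customer satisfaction",
    "survey": "customer satisfaction fell after shipping delays",
    "budget": "quarterly budget review for the marketing team",
}

ranked = rank_related("complaint", documents)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

The point of the exercise is the ranking rather than any single hit: the survey shares most of its vocabulary with the complaint and rises to the top, while the budget note shares none and scores zero.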

Will you tap into the hidden assets buried in your unstructured data? Write to me at maggie_biggs@infoworld.com.

Maggie Biggs is InfoWorld Test Center's technical director.
