For thousands of years, the primary way humans record and exchange information has remained the same: we've used language. Pictures and numbers are important too, but the history of humankind is mostly written in words.
Businesses and other enterprises are no exception to this rule. Customer communication is done through words. Internal communication is done through words. Information is gathered largely through words. Meetings, phone calls, e-mail, instant messaging, reading on the Web -- taken together, those play a much larger role in most knowledge workers' lives than numbers, graphics or routine transactions.
So do enterprises have comprehensive strategies for managing all these words? No way. You can hardly be in business these days unless you have sophisticated ways to manage, understand, analyze, exploit and add to structured data. But technology and business strategies for dealing with textual data are vastly more primitive. And the situation is worse yet for voice information (which is basically like text, except that it's dirty and highly uncompressed).
To be sure, there have been some beginnings. Search, text indexing and/or what generally passes for knowledge management have become must-haves at many companies, as has content management. Storage strategies are increasingly taking account of the non-updatable nature of most documents. Text mining is a small but fast-growing market with a number of hot subsectors.
Procter & Gamble even has one computational linguist among its 120,000-plus employees and may hire a second next year. Yet vastly more work is needed, and the time has come for a comprehensive roadmap towards acquiring a management strategy for linguistic content.
But before drawing such a map, we should get a clearer view of our destination. What should a business hope to accomplish through the electronic management of text? A good core list might be as follows:
Search and document-finding: For many people, text management begins and ends with search. But even if you believe that, there are many kinds of search, each with its own difficulties. You want to present information to customers, partners and employees. Some of it you really want to press on them for marketing purposes, whether or not it's exactly what they most want to see. In other cases, you want to help somebody find a needle in a haystack -- without being sure that the needle even exists in the first place. Document formats range from Web pages to e-mail to Office output and keep going from there; modern enterprise search tools typically recognize well over 100 file types.
To slice it another way, search is needed in almost every department and function: sales, Web marketing, engineering, recruiting and employee-facing human resources, to name just a few. But the requirements and challenges of those applications can differ so much that each needs its own set of technologies and business processes to make it work.
Object-finding: Documents aren't the only targets for search engines. Pictures, video, audio, structured records -- all can be found through text search. Text search for structured records is particularly important. It's central to online catalogue sales, and it's also crucial to a number of compliance initiatives that are hastening the adoption of text technology.
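To make the idea of text search over structured records concrete, here is a minimal sketch of a keyword index built on a tiny product catalogue. The records, field names and function names are all invented for illustration, and a real catalogue search would add stemming, ranking and spelling tolerance on top of something like this.

```python
import re
from collections import defaultdict

# Hypothetical catalogue records; the field names are illustrative only.
RECORDS = [
    {"sku": "A100", "name": "Cordless drill", "description": "18V drill with two batteries"},
    {"sku": "B200", "name": "Drill bit set", "description": "Titanium bits for metal and wood"},
    {"sku": "C300", "name": "Garden hose", "description": "Flexible 15m hose"},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records, fields=("name", "description")):
    """Map each word to the set of SKUs whose text fields contain it."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            for word in tokenize(record[field]):
                index[word].add(record["sku"])
    return index

def search(index, query):
    """Return the SKUs of records that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

INDEX = build_index(RECORDS)
print(search(INDEX, "drill"))
print(search(INDEX, "drill batteries"))
```

The point of the sketch is that the "document" being searched is just the text stored in a record's fields, while the result is the record itself, identified here by its SKU.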
Knowledge (re)discovery: Search technology and its use may be primitive, but most enterprises are at least making efforts in those areas. Knowledge extraction and text mining are equally important, however, and many companies haven't even begun to consider them. Much like search, text mining can be applied to almost any area of a business -- customer communication, marketing analysis, engineering, HR and plenty more. And the same goes for more targeted, less statistical forms of information extraction as well.
Regulatory compliance: One group of text applications deserves particular attention: those that can help you obey the law. Regulatory mandates for text commonly come in two kinds. One is simply document collection and retention, to be done in as complete, secure, cost-effective and search-friendly a manner as possible. The other is risk monitoring. Whether they're checking for internal fraud (for the Sarbanes-Oxley Act) or for drug side effects, companies increasingly have a legal duty to act upon signs of trouble, no matter what form those signs first take. And without text analytics, even recognizing those signs in the first place is an expensive manual process.
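As a rough illustration of what automated risk monitoring might look like at its very simplest, the sketch below scans a message for phrases on a watch-list and reports which risk categories it touches. The watch-list, category names and function name are hypothetical, and plain phrase matching is a stand-in here for real text analytics, which would need to handle paraphrase, negation and context.

```python
import re

# Hypothetical watch-list: each risk category maps to phrases a reviewer
# might care about. Real deployments would need far richer linguistics.
WATCH_LIST = {
    "possible fraud": ["off the books", "backdate", "hide the loss"],
    "possible side effect": ["rash", "dizziness", "nausea"],
}

def flag_message(text, watch_list=WATCH_LIST):
    """Return sorted (category, phrase) pairs for watch-list phrases in the text."""
    lowered = text.lower()
    hits = []
    for category, phrases in watch_list.items():
        for phrase in phrases:
            # Word boundaries keep "rash" from matching inside "crash".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.append((category, phrase))
    return sorted(hits)

print(flag_message("Patient reported dizziness and a rash after the second dose."))
print(flag_message("The server crash was fixed overnight."))
```

Even a crude filter like this shows why the manual alternative is expensive: someone would otherwise have to read every message to find the few that matter.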