Computerworld

Find a successful search strategy

Text-search technology is finally coming of age. Stand-alone search "solutions" can still be ludicrously bad: Monsanto's top 10 hits for "fertilizer" lead to boilerplate legal disclaimers, The Boeing Co.'s top 10 hits for "jet fighter" feature discontinued aircraft, and Toyota's top 10 hits for "Camry" are in Chinese. But when text search is integrated into a broader application framework, the story is much happier. Search is a clear success in several specific niches, such as online retailing and regulation-driven document management. And as the means of integrating text with other kinds of data improve, a much broader range of text-aware applications is becoming practical as well.

Some of the more intriguing opportunities include:

Upgrading your Web presence: Surely you already have a search capability on your Web site. But it could probably be a lot better.

Upgrading your online documentation: If you provide a lot of technical information online, it's probably hard to navigate. Helping customers find what they need more easily can save both them and you a lot of money. Effective text search is crucial in this effort.

Navigating applications more easily: Large e-commerce sites are often best navigated via text queries such as "John Grisham" or "red turtleneck sweaters." The same may be true for back-office systems such as merchandising or purchasing.

Helping your people find one another: Suppose an employee searches for information on a subject and finds vaguely relevant information written by a colleague. There's a good chance that talking with this person will help your employee find out what he needs to know. Not only is such expertise-finding invaluable in global engineering and consulting organizations, but it can also help in figuring out how to approach particularly tricky or important sales challenges.

Digging into text mining: You probably have a wealth of text and even voice records pertaining to customer contacts -- service call reports, call center reports, sales call reports, customer letters and e-mails, even recorded phone calls or chat sessions. Analyzing these could turn up crucial information about customer segmentation or about product strengths, weaknesses and flaws. It's a gamble, because you don't know how much you'll really find -- but just like conventional data mining of structured data, it's a gamble worth taking.

All of these application scenarios depend on text being related to other kinds of data. In unaugmented text search, documents are searched for words and phrases, which are then used to assess the subject of a document. However, linguistic techniques alone aren't enough to produce satisfactory results. This is why pre-Google search engines all failed; until Google Inc. came up with an effective way to use extra, nonlinguistic information, the Web search relevancy problem simply couldn't be solved.

Google's solution -- looking at a page's "link popularity" -- isn't applicable to most corporate search environments. However, enterprises have access to plenty of other extratextual information. Documents can be tagged by date, author, subject and, above all, intended audience and purpose. Customer communications can be associated with tremendous amounts of customer and product data. Most of this information is best stored in and retrieved from SQL databases, Lightweight Directory Access Protocol (LDAP) directories or XML.
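To make the idea concrete, here is a minimal sketch, assuming each document carries two hypothetical tags, an intended audience and a publication year. It contrasts an unaugmented score (a plain count of query words) with the same score reweighted by those tags. The Doc class, the sample documents and the weights (2.0, 0.5 and a simple recency decay) are all made up for illustration; no real product scores documents this way as far as this sketch claims.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    body: str
    audience: str  # hypothetical tag, e.g. "customer" or "legal"
    year: int      # hypothetical publication-year tag

def term_score(doc: Doc, query: str) -> int:
    """Unaugmented score: how often the query words occur in the body."""
    words = doc.body.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def augmented_score(doc: Doc, query: str, audience: str, current_year: int) -> float:
    """Text score reweighted by extratextual tags; the weights are illustrative guesses."""
    base = term_score(doc, query)
    if base == 0:
        return 0.0  # metadata alone never makes a non-matching document relevant
    audience_weight = 2.0 if doc.audience == audience else 0.5
    recency_weight = 1.0 / (1 + max(0, current_year - doc.year))
    return base * audience_weight * recency_weight

# Made-up sample documents.
docs = [
    Doc("Legal notice", "this fertilizer disclaimer covers fertilizer liability "
        "for all fertilizer products", "legal", 2001),
    Doc("Lawn care guide", "how to apply fertilizer to a lawn", "customer", 2003),
]

by_text = sorted(docs, key=lambda d: term_score(d, "fertilizer"), reverse=True)
by_context = sorted(docs, key=lambda d: augmented_score(d, "fertilizer", "customer", 2003),
                    reverse=True)
print([d.title for d in by_text])     # text alone favors the disclaimer
print([d.title for d in by_context])  # audience and date tags favor the guide
```

The word count alone ranks the disclaimer first because it repeats the query term; once audience and date are taken into account, the customer-facing guide comes out on top.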

The key point of integration between text and these other kinds of data is a full-featured database management system. IBM Corp. and Oracle Corp. both support "WHERE CONTAINS" syntax, letting text searches and normal relational queries be joined in a single SQL statement. And since SQL systems these days can also talk XML and LDAP, those integrations are provided as well. Text-specific features are still missing from application development tools, but that's not crucial. Because a relational database management system stores an entire document in a single large object field, such as a binary large object (BLOB), the document is just another column value, and generic SQL query-building tools are usually all a programmer needs.
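The following is a rough sketch of what "joining text searches and normal relational queries in a single SQL statement" looks like, using Python's built-in sqlite3 module as a stand-in. SQLite has no built-in CONTAINS operator, so the sketch registers a hypothetical user-defined function named contains that does a naive whole-word match. Unlike the IBM and Oracle operators, it uses no text index and simply scans every row; the exact vendor syntax and return values also differ. The tables and rows are invented for illustration.

```python
import sqlite3

def contains(text: str, term: str) -> int:
    """Hypothetical stand-in for a vendor CONTAINS operator: 1 if term is a whole word in text."""
    return int(term.lower() in text.lower().split())

conn = sqlite3.connect(":memory:")
conn.create_function("contains", 2, contains)  # make the Python function callable from SQL
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, segment TEXT);
    CREATE TABLE call_reports (id INTEGER PRIMARY KEY, customer_id INTEGER, body TEXT);
    INSERT INTO customers VALUES (1, 'Acme', 'enterprise'), (2, 'Smallco', 'retail');
    INSERT INTO call_reports VALUES
        (1, 1, 'Customer reports the pump is leaking again'),
        (2, 2, 'Pump leaking after install'),
        (3, 1, 'Asked about invoice dates');
""")

# One statement mixes a text predicate with an ordinary join and a relational filter.
rows = conn.execute("""
    SELECT c.name, r.id
    FROM call_reports AS r
    JOIN customers AS c ON c.id = r.customer_id
    WHERE contains(r.body, 'leaking') = 1
      AND c.segment = 'enterprise'
    ORDER BY r.id
""").fetchall()
print(rows)  # [('Acme', 1)]
```

Report 2 also mentions a leak but belongs to a retail customer, so the relational filter removes it in the same query that performs the text match.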

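What's trickier is integrated administration of text and tabular data. Text search relies on specialized text indices, which are huge, sparse and generally a lot like bit maps. Integrating text (or bit map) indices into relational databases is far from trivial, because they must be kept synchronized with the underlying tables as rows are inserted and updated, and managed alongside the database's ordinary indices.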
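Why text indices resemble bit maps can be shown with a simplified sketch: an inverted index that maps each word to a bit map (here a Python integer) whose bit i is set when document i contains the word. A relational column can be turned into a bit map the same way, so a combined text-plus-relational query reduces to bitwise AND. The documents and the in_stock column are hypothetical, and real engines compress these mostly-empty bit maps or use posting lists instead; this is an illustration of the idea, not how any particular product stores its indices.

```python
from collections import defaultdict

docs = [
    "red turtleneck sweater",   # doc 0
    "blue turtleneck sweater",  # doc 1
    "red wool scarf",           # doc 2
    "red turtleneck dress",     # doc 3
]
in_stock = [True, True, True, False]  # hypothetical relational column, one value per doc

def build_text_index(texts):
    """Map each word to a bit map (an int) with bit i set if doc i contains that word."""
    index = defaultdict(int)
    for doc_id, text in enumerate(texts):
        for word in set(text.lower().split()):
            index[word] |= 1 << doc_id
    return index

def column_bitmap(values):
    """Turn a boolean column into a bit map with bit i set where row i is True."""
    bits = 0
    for row_id, flag in enumerate(values):
        if flag:
            bits |= 1 << row_id
    return bits

def matching_ids(bits):
    """List the positions of the set bits, i.e. the matching document ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

index = build_text_index(docs)
# "red" AND "turtleneck" AND in_stock, evaluated as bitwise ANDs.
hits = index["red"] & index["turtleneck"] & column_bitmap(in_stock)
print(matching_ids(hits))  # [0]
```

With millions of documents and a vocabulary of hundreds of thousands of words, nearly every bit in nearly every word's bit map is zero, which is why these indices are both huge and sparse.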

Fortunately, Oracle and IBM are already well along in text/relational integration, with Oracle somewhat ahead of IBM. Microsoft Corp., which has long lagged behind, pledges to narrow the gap soon.

So should you incorporate text search in the applications you build or buy? For most enterprises, the answer is yes. Text data is obviously pervasive and important. The cost of integrating text with other data types is manageable. Text-search boxes are a major form of user interface. If you don't have a text strategy, you're probably not getting the most out of the opportunities IT offers.