Refining enterprise search

Anyone who has been transfixed by a gymnast or a figure skater knows that the magic happens when they perform flawlessly and yet make it seem easy. That's how a search should work: Enter a query, and the right results appear in simple, elegant fashion -- even if it took countless hours of preparation to make the magic possible.

Yet most enterprise users still stumble as they try to extract data from multiple repositories, each with its own search engine. Enterprises seem awash in a rising tide of structured and unstructured data. And even though users are often forced to tag documents manually across various content management systems in hopes that those documents will be easier to retrieve, searches still yield a surfeit of irrelevant, time-wasting results.

ESPs (enterprise search platforms) are on a mission to change all that. These new, comprehensive bundles of search and integration technologies unlock information tucked away in data stores across the enterprise. The goal of ESPs is deceptively simple: to take fairly simple queries and return the most relevant results possible, all in one place. But under the hood, ESPs aggregate a host of emerging technologies such as autocategorization, entity extraction, and NLP (natural language processing). With an ESP as a foundation, businesses can build customized search applications while automating the process of preparing documents for archiving and indexing.

"The building blocks are converging so that you don't have to cobble together all the pieces yourself," observes Susan Feldman, vice president of content technology research at IDC. These advanced search platforms establish sophisticated gateways to silos of information -- even those with their own search engines. ESPs also provide a common set of data and search logic that can be tuned on an application-by-application basis to improve the relevance of search results.

IBM last month came out swinging with its DB2 Information Integrator, code-named Masala, which contains an advanced search engine designed to complement the company's other heavy hitters in the content management arena, DB2 Content Manager and WebFountain. With Masala, IBM joins the ranks of Autonomy, Convera, EasyAsk, Endeca Technologies, Fast Search & Transfer (FAST), iPhrase Technologies, and Verity, each of which offers search-application platforms with a different mix of features.

Breaking down the walls

ESPs are transforming the way the enterprise conducts a federated search, the process by which a single query is passed to multiple search engines and the user is presented with aggregated results. A federated search can augment searches of similar data stores but loses traction when it runs up against external databases that require specific syntax.

Basic federated search, which has been in existence for years, "doesn't protect the user from another kind of infoglut -- getting irrelevant results from multiple search engines instead of just one," observes Hadley Reynolds, vice president and director of research at Delphi Group. "Without some additional sense-making, it's a blunt instrument."

Compounding matters, enterprises typically have multiple search engines embedded in various applications -- for instance, one in a content management system, one in the Microsoft Office environment, and another in an e-mail program. The ESP transcends these search-engine silos and their corresponding data repositories, applying syntax translation and other linguistic manipulations, such as spell-check and phrase detection, to the query before it is run against the data stores.
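The federated pattern described above, one query fanned out to several engines with syntax translation applied per engine, can be sketched roughly as follows. This is an illustration only, not any vendor's implementation: the Hit record, the two translator functions, and the stub engines are all hypothetical, and the sketch assumes each engine's relevance score is already on a comparable 0-to-1 scale, which real engines do not guarantee.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    source: str   # which repository returned the hit
    title: str
    score: float  # engine-local relevance, assumed comparable across engines


# Hypothetical translators: each engine expects its own query syntax.
def to_boolean_syntax(terms: List[str]) -> str:
    return " AND ".join(terms)


def to_quoted_phrase(terms: List[str]) -> str:
    return '"' + " ".join(terms) + '"'


def federated_search(
    query: str,
    engines: Dict[str, Callable[[str], List[Hit]]],
    translators: Dict[str, Callable[[List[str]], str]],
) -> List[Hit]:
    """Send one user query to every engine and merge the hits into one list."""
    terms = query.lower().split()  # minimal query normalization
    merged: List[Hit] = []
    for name, search in engines.items():
        native_query = translators[name](terms)  # syntax translation
        merged.extend(search(native_query))
    # Aggregate into a single list, best score first.
    return sorted(merged, key=lambda hit: hit.score, reverse=True)


# Stub engines standing in for a content management system and a mail store.
def cms_engine(native_query: str) -> List[Hit]:
    return [Hit("cms", "Bond issuance policy", 0.72)]


def mail_engine(native_query: str) -> List[Hit]:
    return [Hit("mail", "Re: bond pricing", 0.91)]


results = federated_search(
    "Bond Pricing",
    engines={"cms": cms_engine, "mail": mail_engine},
    translators={"cms": to_boolean_syntax, "mail": to_quoted_phrase},
)
```

Merging by raw score is the "blunt instrument" part. The additional sense-making Reynolds refers to would sit between the merge and the final ranking.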

At the indexing layer, the ESP aids the user by returning lists of improved query choices based on the context of the original, sometimes vague, query. Take FAST's ESP, which powers the public-facing Scirus.com. If you type the word "nuclear" in an effort to retrieve published science-journal entries related to that topic, the keyword will reap more than 700,000 returns. A refined keyword search selected from the list of suggestions on the right-hand side of the page -- "nuclear facility" -- whittles that to approximately 1,000. Click once more, on "uranium enrichment," and you're down to about 10.

Endeca offers a technology that combines search with what it calls Guided Navigation. Here, a keyword search generates a search directory on the fly, which users can employ to drill down to progressively refined results.
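The idea of a directory generated on the fly can be illustrated with a small sketch. It is not Endeca's implementation; the sample records, the field names, and the three helper functions are invented for illustration. The sketch counts the values of each metadata field across the current hits and lets the user narrow the hits by choosing one of those values.

```python
from collections import Counter
from typing import Dict, List

Document = Dict[str, str]

# Made-up sample records; the field names are illustrative only.
DOCS: List[Document] = [
    {"title": "Nuclear facility inspection", "topic": "nuclear facility", "year": "2003"},
    {"title": "Uranium enrichment at nuclear facilities", "topic": "nuclear facility", "year": "2004"},
    {"title": "Nuclear medicine imaging", "topic": "medicine", "year": "2004"},
    {"title": "Solar cell efficiency", "topic": "energy", "year": "2004"},
]


def keyword_search(docs: List[Document], keyword: str) -> List[Document]:
    """Return documents whose title contains the keyword (case-insensitive)."""
    return [d for d in docs if keyword.lower() in d["title"].lower()]


def build_directory(hits: List[Document], fields: List[str]) -> Dict[str, Counter]:
    """Count each field's values across the hits: the directory built on the fly."""
    return {field: Counter(d[field] for d in hits) for field in fields}


def drill_down(hits: List[Document], field: str, value: str) -> List[Document]:
    """Keep only the hits whose field matches the value the user clicked."""
    return [d for d in hits if d[field] == value]


hits = keyword_search(DOCS, "nuclear")
directory = build_directory(hits, ["topic", "year"])
refined = drill_down(hits, "topic", "nuclear facility")
```

Because the directory is recomputed from whatever hits remain, each click yields a new, smaller set of choices, which is what allows progressive refinement.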

Customized tune-ups

According to Delphi Group's Reynolds, creating an effective search interface for the enterprise user involves "knowledge-driven search applications" tailored to the business domain of the staffer.

"In order to achieve real accuracy, the search software has to be tuned to understand the context in which I'm working," Reynolds says. "It's a business-process-centered development strategy, so that you're looking at a platform from the perspective of its ability to be tailored to specific users."

Reynolds adds that Autonomy and FAST already prepackage offerings in the compliance, call center, market intelligence, and financial arenas. Verity offers multiple application templates as well. With this kind of tailored search interface, when financial brokers type "bonds" into a query, they never have to set eyes on a document related to glue.

FAST's Marketrac layers an application on top of the ESP, creating a search-powered interface that can access e-mail content, news feeds, competitors' Web site content, and database content in a CRM system. Moreover, the platform's categorization facilities enable knowledge workers to explore content through patterns of meaning or subject matter.

Meanwhile, Google is taking a different approach with its enterprise offering, the Google Search Appliance. It puts behind the firewall much of the successful technology that powers its public product, taking plug and play to new heights. In other words, the appliance is basically a search engine, not a comprehensive platform.

Dave Girouard, general manager for enterprise search at Google, cautions that ESPs "are putting a bigger burden on the user. As long as the results show up in the first page, (users) don't care what's behind it. ... We have the right relevancy algorithms. So, in terms of (too much) content, we're saying, 'Bring it on.' "

The Google appliance may save the day for enterprises with broken search technology: Just open up the repositories and rev up the Google engine. But Delphi Group's Reynolds thinks that "IT should stop investing in generic search tools and start concentrating on their professional domains. At the same time, the business side should be more involved, to ensure that IT commits the resources to develop business-oriented applications of search."

Andrew McKay, vice president of direct sales at FAST, agrees but adds that vendors "aren't necessarily fighting over a percentage of the pie. It's about making the pie dramatically larger," as information stores expand exponentially.

It's all in the pipeline

For years, businesses have been fighting to get searches of unstructured data -- information that resides outside enterprise applications and databases -- to achieve the kind of accuracy and precision expected with structured data. According to Delphi Group's Reynolds, with ESPs, the search-indexing process for unstructured information is evolving into a pipeline of different search algorithms and advanced technologies. These allow for dynamic categorizations or targeted text analytics to take place within the processes that parse documents when they come into the search platform, and within the processes that evaluate queries and return relevant information.

A relatively new addition to the pipeline is entity extraction, in which a search engine uses grammatical analysis to extract terms from indexed content on the fly. The process includes identifying proper nouns, creating a list of people, places, and things from a document, and then inserting a new level of metadata into that document.
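As a rough illustration of that last step, the sketch below pulls candidate proper nouns out of a document and attaches them as a new metadata field. It deliberately swaps the grammatical analysis described above for a crude capitalization heuristic, so it will miss entities and produce false positives that a real extractor would not. The function names and the "body" and "entities" fields are assumptions made for the example.

```python
import re
from typing import Dict, List

# One or more consecutive capitalized words, e.g. "IBM" or "Susan Feldman".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][A-Za-z0-9&]*(?:\s+[A-Z][A-Za-z0-9&]*)*")


def extract_entities(text: str) -> List[str]:
    """Return candidate proper nouns in order of first appearance."""
    entities: List[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in CAPITALIZED_RUN.finditer(sentence):
            candidate = match.group()
            # A lone capitalized word opening a sentence is usually just
            # ordinary capitalization, so skip it.
            if match.start() == 0 and " " not in candidate:
                continue
            if candidate not in entities:
                entities.append(candidate)
    return entities


def annotate(document: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the document with an added 'entities' metadata field."""
    enriched = dict(document)
    enriched["entities"] = extract_entities(str(document["body"]))
    return enriched


sample = {
    "title": "Search news",
    "body": "Last month IBM released Masala. Susan Feldman of IDC welcomed the move.",
}
enriched = annotate(sample)
```

The original document is left untouched and the entity list travels with the copy, so later stages of the pipeline can index or filter on it like any other field.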

Another is the use of NLP, which helps turn poor queries into good ones. The state of the art in search platforms involves a wide range of algorithms, rules, data enhancements, and user and context profiling -- all of which work together to help zero in on what users need to answer their questions.

As for metadata, the old way of manually defining properties of a document is waning in favor of an intelligent search platform's capability of autotagging based on users' "custom logic," according to FAST's McKay.

ESPs can discover patterns in the content and enhance the value of that content within the search platform infrastructure by automatically creating metadata elements. Thanks to the exponential spread of XML across search environments, this metadata can then be used for a wide range of application processing, query enhancements, and presentation options.

Enhanced classification and taxonomy come into play by enabling users to browse information by subject area rather than relying solely on the blank search field and their capability of constructing an effective query. Dynamic classification capabilities can modify the presentation of subject areas based on the query's context.

These new technologies "allow you to cross the structured and unstructured worlds," says Pete Bell, co-founder of Endeca.

To make unstructured data more meaningful, Verity is taking several approaches. Its newly introduced Extractor automatically preprocesses documents, looking for concepts, patterns, and entities, and tags files accordingly. At the next level, its Collaborative Classifier enlists a broad range of subject-matter experts within the organization to manage topics. It's highly intuitive and encourages user participation, which, in turn, significantly boosts categorization accuracy, according to company officials.

End to end with security in mind

Although the line between consumer search and enterprise search continually blurs, a key difference lies in enterprise security architecture.

"Security is a huge issue because you don't want to show results that include documents to which the user has no right," IDC's Feldman says, asserting, however, that security at the platform layer is fairly straightforward. "If you've got document-level security and repository-level security, search engines can use them to index documents for access rights. They can also tie into an LDAP directory to look at the collection-level access rights."

John McPherson, a distinguished engineer at IBM, explains that the search engine within DB2 Information Integrator is adept at integrating assigned permissions and maintaining the security of the data from the underlying repository.

"There are associated security tokens at the document level, and an interface allows the application to do a search on behalf of the user with specified security credentials, which guarantee we're only returning content that the user is allowed to see," McPherson says. "It's integrated way down in the index so we're also getting peak performance."
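The general pattern McPherson describes, security tokens stored with each document and checked during the lookup itself, might look something like the sketch below. It is not IBM's design: the IndexedDoc and SecureIndex classes, the token names, and the single-term query model are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    allowed_tokens: FrozenSet[str]  # security tokens copied from the source repository


class SecureIndex:
    """Toy inverted index that checks access rights while resolving a query."""

    def __init__(self, docs: List[IndexedDoc]) -> None:
        self._docs: Dict[str, IndexedDoc] = {d.doc_id: d for d in docs}
        self._postings: Dict[str, Set[str]] = {}
        for doc in docs:
            for term in doc.text.lower().split():
                self._postings.setdefault(term, set()).add(doc.doc_id)

    def search(self, term: str, user_tokens: Set[str]) -> List[str]:
        """Search on behalf of a user; return only documents the user may see."""
        candidates = self._postings.get(term.lower(), set())
        return sorted(
            doc_id
            for doc_id in candidates
            # A document is visible if it shares at least one token with the user.
            if self._docs[doc_id].allowed_tokens & user_tokens
        )


index = SecureIndex([
    IndexedDoc("memo-1", "quarterly bonds forecast", frozenset({"finance"})),
    IndexedDoc("memo-2", "bonds training guide", frozenset({"finance", "all-staff"})),
    IndexedDoc("memo-3", "cafeteria menu", frozenset({"all-staff"})),
])
staff_view = index.search("bonds", {"all-staff"})
finance_view = index.search("bonds", {"finance"})
```

Filtering inside the lookup, rather than trimming a finished result list, means restricted documents never enter the result set. The tokens themselves come from the underlying repository, so the search layer imposes no security scheme of its own.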

Delphi Group's Reynolds echoes a prevailing sentiment: "The search environment has no business imposing a specific security scheme on the enterprise. You want it to be flexible and agnostic."

Simplicity wrapped in complexity

For that matter, users are typically agnostic in that few workers sit around questioning how results are returned.

To be of value, search vendors must provide "a single user experience that hides the fact that there are different engines, different indexes, and different capabilities happening in the background," notes Laura Ramos, vice president of research at Forrester Research.

But ESPs demand that users get acquainted with more intelligent search methods. According to IDC's Feldman, the blank query field and the three-word search are gradually going away as ESPs forge new interfaces. Search platforms "must be tied into the collaborative tools of the organization," she says.

Mike Heck contributed to this article.

Richard Gincel is an associate editor at InfoWorld.

Simple advice for complex search solutions

When they work as designed, search applications are wonderful, delivering up-to-date information that helps avoid faulty decisions. But getting your search infrastructure tuned to this point takes forethought and precise execution. Experts offer advice for putting your enterprise search on the right course.

Maximize options. Searches are likely to miss the mark if users rely solely on Web sites. Employees should also have access to valuable content in an array of databases, enterprise applications, document libraries, e-mail servers' public folders, file servers, and discussion groups.

Make the search field prominent. Make it easy to search from anywhere on your intranet and public Web site. Put a search box -- or at least a search link -- in a prominent spot on every page.

Keep it simple. Keep your search page clean and inviting. Also, make sure you include visual clues, such as multiline fields, to inform users that they can type in more than a few keywords. Similarly, the results pages should limit extraneous images and links, while offering clear clues to allow users to quickly switch results to different formats, such as navigation.

Streamline and automate. Information is of little value locked in employees' heads. So, make the process of publishing documents to portals as easy as saving to the desktop. Then, eliminate as many follow-up steps as possible.

Be protected. Don't be timid about crawling secure Web content, restricted databases, and premium services such as LexisNexis and Factiva. But the content provider should provide the proper authentication capabilities to limit protected content to authorized users.

Consider SSO (single sign-on). An SSO architecture lets users authenticate once and then investigate the content of all online information sources with a single query. The time savings can add up.

Build for speed. Not only should results of protected content be returned fast, but overall search response needs to be swift. This keeps users happy and encourages visitors to return to your site more often.

Encourage feedback. Finally, don't assume your search is functioning at its greatest potential based on limited testing. Solicit feedback from end-users. Find out what they like about your search implementation and what could be improved, and act on the comments when appropriate.
