FAQ: Why is enterprise search harder than Google Web search?

Where format complications meet inflated user expectations

More than a few eyebrows were raised in early January when Microsoft said it would spend US$1.2 billion in cash to buy enterprise search provider Fast Search & Transfer ASA. Jeffrey Raikes, then president of Microsoft's Business Division, went further and claimed that FAST is better than Google when it comes to searching "behind the firewall."

Computerworld decided to investigate that bold claim and to answer the other questions that have come up since Raikes made it.

What exactly does enterprise search do?

Enterprise search software helps company employees find information stored in their corporate networks and PCs in whatever form it's in -- documents, e-mails, spreadsheets, internal Web pages and so forth. Imagine something like Google Desktop or Windows Desktop Search, but indexing an entire company's worth of content.

Large relational database vendors have long argued that stuffing as many of your documents as possible into a database is the way to go. Hence, the ongoing war of words between Oracle and IBM over whose database software provides faster storage and retrieval of XML data.

But enterprise search software from vendors such as Fast, Autonomy or Endeca Technologies lets you go the other way and search for information inside a database, whether it is stored in unstructured binary large object ("Blob") form or, if it's numeric, in individual cells.

Search software is actually faster than running a SQL query to find data in a database, though it can't manipulate or numerically analyze the data, according to Yves Schabes, co-founder and president of Teragram, an enterprise search vendor.

If I can use Google, can I easily learn to use enterprise search software?

Probably. Most software today displays a single search box into which a user can enter keywords joined by Boolean operators such as AND and OR. After getting a set of results, users then look to the side for drop-down menus where they can narrow the search by what Schabes calls "facets," such as information source, country or date.
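To make the keyword-plus-facets workflow concrete, here is a minimal Python sketch. It is not how any of the products named above work internally; the toy corpus, the field names (source, country) and the functions matches, search and facet_counts are all invented for illustration, and the query parser assumes a flat query with no parentheses.

```python
from collections import Counter

# Hypothetical toy corpus; the documents and field names are illustrative only.
DOCS = [
    {"id": 1, "text": "quarterly sales report for europe", "source": "sharepoint", "country": "DE"},
    {"id": 2, "text": "sales forecast spreadsheet", "source": "email", "country": "US"},
    {"id": 3, "text": "engineering design report", "source": "sharepoint", "country": "US"},
]

def matches(text, query):
    """Evaluate a flat query such as 'sales AND report' or 'sales OR design'.

    AND binds tighter than OR; parentheses are not supported.
    """
    words = set(text.lower().split())
    return any(
        all(term.strip().lower() in words for term in clause.split(" AND "))
        for clause in query.split(" OR ")
    )

def search(query, **facets):
    """Return documents matching the query, narrowed by any facet values given."""
    hits = [d for d in DOCS if matches(d["text"], query)]
    return [d for d in hits if all(d.get(k) == v for k, v in facets.items())]

def facet_counts(hits, field):
    """Count how many hits fall under each value of a facet (the side-menu numbers)."""
    return Counter(d[field] for d in hits)
```

With this corpus, search("report") returns documents 1 and 3, and search("report", country="US") narrows that to document 3, which is the same two-step pattern a user follows with the search box and the side menus.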

What kinds of information does search software have difficulty finding?

Enterprise search software tends to be bad at searching information that has already been offloaded to tape archives, according to Schabes. For that, companies still tend to rely on specialized e-discovery and storage management tools.

Enterprise search also has problems handling multimedia such as podcasts, pictures and video files. Metadata is usually scarce or not useful. Those files still need to be transcribed or processed by speech-to-text software to be indexable by enterprise search software.

In addition, enterprise search software isn't good at filtering out multiple versions of the same document, Schabes says. This data cleansing, data de-duplication or master data management is already an established field in the structured relational database realm. But tools are slow to emerge in the unstructured enterprise search arena, he says.
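The simplest form of de-duplication can be sketched in a few lines of Python. This is only an assumption about one basic approach, not a description of any vendor's tool: it hashes whitespace- and case-normalized text, so it catches identical copies stored in different places but not edited versions of a document, which is the harder problem Schabes describes. The function names and sample paths are hypothetical.

```python
import hashlib

def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(docs):
    """Keep the first (path, text) pair seen for each distinct fingerprint."""
    seen = set()
    unique = []
    for path, text in docs:
        fp = fingerprint(text)
        if fp not in seen:
            seen.add(fp)
            unique.append((path, text))
    return unique

# Hypothetical sample: the same memo saved twice, plus a revised version.
docs = [
    ("share/memo.txt", "Budget approved for Q3."),
    ("mail/attachment.txt", "budget  approved for Q3.\n"),
    ("share/memo_v2.txt", "Budget approved for Q3 and Q4."),
]
unique_paths = [path for path, _ in deduplicate(docs)]
```

Here the mail attachment is dropped as a copy of the first memo, while the revised memo survives because its content differs. Recognizing it as a later version of the same document would take some form of similarity matching rather than an exact hash.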

Nothing you've said makes enterprise search sound more difficult than cataloging and searching the entire World Wide Web.

But consider the challenges of looking in every nook and cranny of a corporate network, reading all the various file formats, and handling who has permission to see what. For instance, an enterprise search product might index everyone's private e-mail. But only certain employees should be allowed to search those e-mails -- in fact, those e-mails should not even appear in the search results of unauthorized employees, Schabes says. To enforce that, enterprise search software needs to be tied into group-policy software such as Microsoft's Active Directory -- no easy task.
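The permission requirement can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the USER_GROUPS dictionary stands in for whatever a real directory service would report, the "allowed" field on each indexed item is a hypothetical access list, and search_as is an invented name. The point is only that results are filtered by the searcher's group memberships before anything is shown.

```python
# Hypothetical stand-in for a directory lookup: user name -> group memberships.
USER_GROUPS = {
    "alice": {"hr", "staff"},
    "bob": {"staff"},
}

# Hypothetical index entries, each tagged with the groups allowed to see it.
INDEX = [
    {"id": "memo-1", "text": "holiday schedule", "allowed": {"staff"}},
    {"id": "mail-7", "text": "salary review schedule", "allowed": {"hr"}},
]

def search_as(user, keyword):
    """Return IDs of items that match the keyword and that the user may see.

    Items the user is not entitled to are left out entirely, so an
    unauthorized user cannot even learn that they exist.
    """
    groups = USER_GROUPS.get(user, set())
    return [
        d["id"]
        for d in INDEX
        if keyword.lower() in d["text"].lower().split()
        and groups & d["allowed"]
    ]
```

Searching for "schedule" as alice returns both items, while the same search as bob returns only memo-1. The hard part in practice is keeping the access lists in the index in step with the real directory, which this sketch does not attempt.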

Moreover, corporate documents lack useful metadata to help give context, Schabes claims. "People rarely search for the author of a document, or what business unit it came from, or the date it was created," he says.
