More than a few eyebrows were raised in early January when Microsoft said it would spend US$1.2 billion in cash to buy enterprise search provider Fast Search & Transfer ASA. But Jeffrey Raikes, then president of Microsoft's Business Division, went a step further and claimed that Fast is better than Google when it comes to searching "behind the firewall."
Computerworld decided to investigate that bold claim and to answer all of the other questions that have popped up in our brains since the challenge.
What exactly does enterprise search do?
Enterprise search software helps company employees find information stored in their corporate networks and PCs in whatever form it's in -- documents, e-mails, spreadsheets, internal Web pages and so forth. Imagine something like Google Desktop or Windows Desktop Search, but indexing an entire company's worth of content.
Large relational database vendors have long argued that stuffing as many of your documents as possible into a database is the way to go. Hence, the ongoing war of words between Oracle and IBM over whose database software provides faster storage and retrieval of XML data.
But enterprise search software such as Fast, Autonomy or Endeca Technologies lets you go the other way and search for information inside a database, whether it is stored in unstructured binary large object ("Blob") form or, if it's numbers, even in individual cells.
Search software is actually faster than running a SQL query to find data in a database, though it can't manipulate or numerically analyze the data, according to Yves Schabes, co-founder and president of Teragram, an enterprise search vendor.
If I can use Google, can I easily learn to use enterprise search software?
Probably. Most software today displays a single initial box into which a user can enter keywords joined by Boolean operators such as AND and OR. After getting a set of results, users then look to the side for drop-down menus where they can narrow the search by what Schabes calls "facets," such as information source, country or date.
What kinds of information does search software have difficulty finding?
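To make the keyword-plus-facets workflow concrete, here is a minimal Python sketch. It is not how Fast or any other vendor implements search; the sample documents, the field names (source, country, year) and the helper functions are all hypothetical, and the query parser only understands flat AND/OR expressions without parentheses.

```python
from collections import Counter

# Hypothetical mini-corpus; the facet fields are illustrative assumptions.
DOCS = [
    {"id": 1, "text": "quarterly sales report for europe",
     "source": "file share", "country": "DE", "year": 2007},
    {"id": 2, "text": "sales forecast and budget",
     "source": "e-mail", "country": "US", "year": 2008},
    {"id": 3, "text": "budget review meeting notes",
     "source": "intranet", "country": "US", "year": 2008},
]


def search(query, docs=DOCS):
    """Evaluate a flat query such as 'sales AND budget' or 'sales OR notes'.

    AND binds tighter than OR; parentheses are not supported.
    """
    # Split into OR-branches, each of which is a list of AND-ed terms.
    branches = [branch.split(" AND ") for branch in query.split(" OR ")]
    results = []
    for doc in docs:
        words = set(doc["text"].split())
        # A document matches if all terms of at least one branch are present.
        if any(all(term.strip().lower() in words for term in branch)
               for branch in branches):
            results.append(doc)
    return results


def facet_counts(results, field):
    """Count how many results fall under each value of a facet."""
    return Counter(doc[field] for doc in results)


def narrow(results, field, value):
    """Keep only the results whose facet matches the chosen value."""
    return [doc for doc in results if doc[field] == value]


hits = search("sales OR notes")
by_country = facet_counts(hits, "country")
us_only = narrow(hits, "country", "US")
```

The facet counts are what would populate the drop-down menus beside the results, and narrow() corresponds to the user picking one of those values.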
Enterprise search software tends to be bad at searching information that has already been offloaded to tape archives, according to Schabes. For that, companies still tend to rely on specialized e-discovery and storage management tools.
Enterprise search also has problems handling multimedia such as podcasts, pictures and video files. Metadata is usually scarce or not useful. Those files still need to be transcribed or processed by speech-to-text software to be indexable by enterprise search software.
In addition, enterprise search software isn't good at filtering out multiple versions of the same document, Schabes says. This data cleansing, data de-duplication or master data management is already an established field in the structured relational database realm. But tools are slow to emerge in the unstructured enterprise search arena, he says.
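One common way to spot multiple versions of the same document is to compare overlapping word sequences ("shingles") and group files whose overlap is high. The Python sketch below illustrates that general idea only; it is not a technique attributed to Schabes or to any product named here, and the file names, the three-word shingle size and the 0.5 threshold are arbitrary assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences found in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_versions(docs, threshold=0.5):
    """Greedily group documents that look like versions of one another.

    Each document is compared with the first member of every existing
    group; the threshold is an illustrative assumption, not a tuned value.
    """
    groups = []
    for name, text in docs.items():
        signature = shingles(text)
        for group in groups:
            if jaccard(signature, group["signature"]) >= threshold:
                group["members"].append(name)
                break
        else:
            groups.append({"signature": signature, "members": [name]})
    return [group["members"] for group in groups]


# Hypothetical files: two drafts of one plan and an unrelated memo.
docs = {
    "plan_v1.doc": "The marketing plan for 2008 covers print, radio and online campaigns",
    "plan_v2.doc": "The marketing plan for 2008 covers print, radio, TV and online campaigns",
    "memo.txt": "Reminder: the office is closed on Friday",
}
groups = group_versions(docs)
```

A search front end could then show one representative per group and fold the remaining versions behind it.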
Nothing you've said makes enterprise search sound more difficult than cataloging and searching the entire World Wide Web.
But consider the challenges of looking in every nook and cranny of a corporate network, reading all the various file formats, and handling who has permission to see what. For instance, an enterprise search product might index everyone's private e-mail. But only certain employees should be allowed to search those e-mails -- in fact, those e-mails should not even appear in the search results of unauthorized employees, Schabes says. To enforce that, enterprise search software needs to be tied into group-policy software such as Microsoft's Active Directory -- no easy task.
Moreover, corporate documents lack useful metadata to help give context, claims Schabes. "People rarely search for the author of a document, or what business unit it came from, or the date it was created," he says.
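The permission problem can be pictured as a filter applied to every result before it is shown. The Python sketch below is a simplified illustration under stated assumptions: a plain dictionary stands in for the directory service (it does not call Active Directory or any real API), and the users, groups and documents are invented.

```python
# Hypothetical stand-in for a directory lookup: user -> group memberships.
USER_GROUPS = {
    "alice": {"hr", "staff"},
    "bob": {"staff"},
}

# Hypothetical index entries, each tagged with the groups allowed to see it.
INDEX = [
    {"id": "handbook.pdf", "text": "vacation policy handbook",
     "allowed": {"staff"}},
    {"id": "salary-review.msg", "text": "salary review and vacation payout",
     "allowed": {"hr"}},
]


def search(user, keyword):
    """Return matching document ids the user is permitted to see.

    Documents the user may not read are dropped entirely, so they never
    show up in that user's result list.
    """
    groups = USER_GROUPS.get(user, set())
    return [
        doc["id"]
        for doc in INDEX
        if keyword in doc["text"].split() and doc["allowed"] & groups
    ]


bob_hits = search("bob", "vacation")
alice_hits = search("alice", "vacation")
```

Here the same query returns different results for different users, and an unknown user gets nothing. The difficulty described above lies in keeping such permission data synchronized with the real directory across every content source.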