Large organisations that have built up vast stores of data are finding that having information doesn't necessarily mean being able to use it, or even knowing it exists.

Companies that live and die based on their ability to create innovative technologies, such as new types of aircraft engines, must track all sorts of things, from patent applications to research databases to their own discoveries.

Or consider universities, which catalogue every piece of paper, from parents' letters to class registration forms, but must somehow get quick, aggregate views of a particular student's complete record.

Even though relevant information is in some database somewhere, people don't always find it or benefit from it.

"Occasionally, in our company, we solve the same problem twice or even three times, because people aren't aware of the solutions that have been arrived at before," says John Taylor, manager of research and development at the Aerostructures Group, part of the aerospace division of The BFGoodrich Co. The aerospace division is one of the world's largest suppliers of aircraft systems and components for aviation and space markets.

To keep track of what they know, companies such as BFGoodrich are turning to new search technology. Tools designed for corporate searches can understand the context of a written sentence and compensate for misspellings or bad grammar while searching through various types of files, e-mail messages and their attachments, and external databases.

Signal to noise

Taylor says BFGoodrich is several months into an evaluation of CoBrain from Invention Machine. The software is saving his researchers valuable time, he says, because it can semantically process the sentences in documents it encounters.

For example, BFGoodrich does a lot of anti-icing research and development because the nacelles (housings for external engines on an aircraft) on a wing are subject to icing. A typical keyword search for ice removal would generate "thousands or tens of thousands of hits on Web and patent pages," Taylor says, because the search would simply return every instance of ice or removal it found.

CoBrain, on the other hand, uses a synonym database to search for that information and provide fewer, highly qualified results.

"That's the goal: to give you the 20 or 30 ways you might remove ice from a nacelle," says Taylor.
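The difference Taylor describes can be sketched in a few lines of Python. This is only an illustration: the four sample documents and the synonym table are invented here, and the matching rule is a deliberately simple stand-in, not a description of how CoBrain works internally. The keyword search returns anything containing either query word, while the synonym-aware search requires every query concept to appear in some form.

```python
# Illustrative corpus: invented examples, not BFGoodrich data.
DOCS = {
    "patent-1": "heated inlet lip prevents ice build-up on the nacelle",
    "memo-7": "removal of the old test rig is scheduled for friday",
    "paper-3": "pneumatic boots for de-icing a nacelle leading edge",
    "news-9": "ice hockey results from the weekend",
}

# Hypothetical synonym groups; a real product would use a far larger resource.
SYNONYMS = {
    "ice": {"ice", "icing", "de-icing", "anti-icing"},
    "removal": {"removal", "remove", "prevents", "de-icing", "anti-icing"},
}


def tokens(text):
    """Split text into a set of lower-case, whitespace-separated words."""
    return set(text.lower().split())


def keyword_search(query, docs):
    """Return every document containing ANY of the query words."""
    terms = tokens(query)
    return sorted(name for name, text in docs.items() if terms & tokens(text))


def concept_search(query, docs, synonyms):
    """Return documents in which EVERY query word, or a synonym, appears."""
    groups = [synonyms.get(term, {term}) for term in query.lower().split()]
    return sorted(
        name
        for name, text in docs.items()
        if all(group & tokens(text) for group in groups)
    )


print(keyword_search("ice removal", DOCS))
print(concept_search("ice removal", DOCS, SYNONYMS))
```

On this toy corpus the keyword search picks up the hockey report and the test-rig memo but misses the de-icing paper, because that paper never uses the literal word "ice". The synonym-aware search returns only the two documents that are actually about removing ice.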

BFGoodrich researchers must search patent databases, Federal Aviation Administration reports, internal e-mail messages and their attachments, presentations, white papers, external journals, conference notes and data from commissioned research.

Using keyword searching, "the signal-to-noise ratio is dreadful," says Taylor. Keyword searching turned up thousands of hits, which took hours to vet and slowed the scientists' research. Furthermore, any proposals by the scientists must await further research to ensure that they won't infringe upon existing patents and that the proposed innovations haven't already been discovered by other BFGoodrich scientists. Hence, shortening search times means quickening the pace at which BFGoodrich scientists can develop innovative technology, says Taylor.

Semantic searching can be good for searching through disparate file types and information sources, but if an organisation's environment is more homogeneous, there are more elegant ways to comb through the information.

The University of Wisconsin-Madison has had a document management system from Cypress in place for some time. Cypress works by substituting its own virtual printer drivers for drivers supplied by printer manufacturers. When a user prints a document, the virtual driver makes an electronic copy and adds it to a document repository.

But any useful information that goes into the repository just sits there unless someone looks for it. So the university decided to use KnowledgeBase search software - also from Cypress - to automatically process information, literally cutting documents and forwarding relevant parts to users via e-mail. For instance, when a general budget is published, each department head receives the appropriate excerpt.

"You have to do some organising up front. You have to define your indexing before it goes into the system," says Jerry Gerber, assistant director of applications processing in the information technology division at the University of Wisconsin. So whenever a user creates a new document such as a report-card batch or a student's low-grade warning letter, he uses a template that the IT department created.

All relevant elements of the template are defined for the document management software, so the software can more easily index the information later.
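The idea of defining the index before a document enters the system can be sketched as follows. The field names, the `Repository` class and the sample records are all hypothetical, chosen only to mirror the examples in the article; nothing here reflects the actual Cypress product or its interfaces.

```python
from collections import defaultdict

# Hypothetical template: fields every document must supply so it can be indexed.
INDEXED_FIELDS = ("student_id", "doc_type", "term")


class Repository:
    """A toy document store that indexes on predefined template fields."""

    def __init__(self):
        self.documents = []
        # Maps a (field, value) pair to the positions of matching documents.
        self.by_field = defaultdict(list)

    def add(self, body, **fields):
        """Store a document, rejecting it if a template field is missing."""
        missing = [name for name in INDEXED_FIELDS if name not in fields]
        if missing:
            raise ValueError(f"missing template fields: {missing}")
        position = len(self.documents)
        self.documents.append({"body": body, **fields})
        for name in INDEXED_FIELDS:
            self.by_field[(name, fields[name])].append(position)
        return position

    def lookup(self, **criteria):
        """Return documents matching ALL the given field values."""
        matches = None
        for item in criteria.items():
            found = set(self.by_field.get(item, []))
            matches = found if matches is None else matches & found
        return [self.documents[i] for i in sorted(matches or [])]


repo = Repository()
repo.add("Grades for autumn", student_id="S100", doc_type="report_card", term="autumn")
repo.add("Low-grade warning", student_id="S100", doc_type="warning_letter", term="autumn")
repo.add("Grades for autumn", student_id="S200", doc_type="report_card", term="autumn")

# Everything on one student, with no full-text search involved.
for doc in repo.lookup(student_id="S100"):
    print(doc["doc_type"], "-", doc["body"])
```

Because each document declares its fields on the way in, pulling a student's complete record becomes a lookup on a known field rather than a word match across the whole repository.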

"You can do a keyword system, but that might provide you with information overload," says Gerber. Going beyond matching words to defining all variables in a document means that if an administrator needs to pull up all information on a student, he sees only what he needs. "Cypress allows me to maintain these documents online and simply bring up the information I need to find online," Gerber adds.

The University of California at Los Angeles also needed a way to quickly search administrative information. But first it had to get rid of all of the paper it used, says Jackie Reynolds, manager of campus services for the administrative information systems department at UCLA.

Before, all incoming documents were photocopied and sent to various officials, who then had to coordinate a solution or read the document and forward it to someone else. But as with the University of Wisconsin, UCLA had a fairly homogeneous information environment - officials knew precisely the kinds of documents they were going to track and search.

Three years ago, UCLA chose RetrievalWare software from Excalibur Technologies to fix its paper overflow problem.

"At the time, its searching capabilities were incredible and still are," Reynolds says of RetrievalWare.

Paper removal

But that wasn't why the university purchased the system. UCLA initially installed RetrievalWare to manage the mountain of documents the school receives - from parents' letters questioning their children's performance to transfer requests and complaints - and the subsequent photocopies and routing slips. With so many documents, officials had been unable to clear the piles from their desks - or work away from their desks.

Now, every printed document received by the chancellor's office is scanned with optical character recognition software from Kofax Image Products. The documents are then converted into electronic format and saved into RetrievalWare's document repository. Then, the software e-mails the appropriate officials to notify them that a document is queued for them.

Perhaps more important, RetrievalWare also allows the 200 administration officials and their staffers who use the software to quickly search a knowledge base of information.

If a parent writes a letter saying someone followed his child to the child's dormitory - and asking what the university is doing about security - the staff member who reads the document can immediately search the database for any similar occurrences, says Reynolds. Then, the staffer can forward all necessary information to the dean, the head of the dorm and campus security, while composing a reply to the parents.

A class by itself

Being able to recognise smudged sentences, misspelled words or bad grammar puts new search-engine technology in a class above keyword searching. Good search software can save a consulting team the task of rewording all its implementation notes so they have the same voice (sentences constructed in a certain manner).
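One simple way to tolerate misspellings is to snap each unknown query word to the closest word the index already knows. The sketch below uses the `difflib` module from Python's standard library; the vocabulary, the `correct_query` helper and the 0.8 similarity cut-off are assumptions made for illustration, and commercial products such as those named in this article may work quite differently.

```python
import difflib

# Hypothetical vocabulary: words that already appear in the indexed documents.
VOCABULARY = sorted({"nacelle", "icing", "security", "dormitory"})


def correct_query(query, vocabulary, cutoff=0.8):
    """Replace each unknown word with its closest known word, if one is close enough."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best candidates at or above the cut-off.
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(close[0] if close else word)
    return " ".join(corrected)


print(correct_query("nacele icing", VOCABULARY))
print(correct_query("dorm secuirty", VOCABULARY))
```

The cut-off is a trade-off: set it too low and unrelated words are "corrected" into the wrong term, set it too high and genuine typing slips go unmatched.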

But powerful search software is expensive. Taylor estimates that a full installation of CoBrain would cost a minimum of several million dollars.

"It's not a cheap product," he says, "but the potential benefit is huge." BFGoodrich is halfway through a six-month, full-site trial.

"Essentially, one good idea that comes out of the data this system produces could pay back that whole investment," Taylor adds.

