SAN MATEO (04/18/2000) - Consider the amount of information your company has generated over its lifetime, and the many formats in which that information resides. Now consider having to sift through the backlog of data to encode every page in an effort to build a unified, searchable database for customers and employees.
That task is similar to one faced by the National Library of Medicine (NLM), which, as you can imagine, had a fair amount of information to handle.
"We're not talking about 200-word abstracts. Our largest book is 1,000 printed pages," says Maureen Prettyman, computer specialist and project leader at the NLM, in Bethesda, Maryland. "We're also talking about consumer pamphlets that were anywhere from two to 25 pages and clinical reference guides that are around 600 pages. That's a morass of paper to get through."
The NLM, a federally funded library under the auspices of the National Institutes of Health, gathers a variety of government-sponsored medical information into several databases and publishes that information to the Web, creating what Prettyman describes as the "last resort for information" for medical professionals as well as average citizens.
In 1990, when a Congressional panel first required the information to be made available electronically in full-text form, the Agency for Health Care Policy and Research -- one of the library's largest data providers -- handed NLM "a lump of money" to set about the task of encoding its information.
It quickly became apparent that the task was too daunting for NLM's staff to handle alone.
"If you are building a project like this from scratch, there's a real learning curve to it, so your manpower costs are going to be substantial," Prettyman says. "At first, I was doing most of the encoding myself, writing programs to automate as much as I could and getting our other programmers to help. But we were getting the data in so many formats that we realized we couldn't worry about all the details."
Instead of hiring an army of programmers, Prettyman chose to outsource the task to Data Conversion Laboratory (DCL), in Fresh Meadows, New York. That didn't mean the project went away, but it was, Prettyman says, a fairly painless way of catching up on the backlog, and DCL has been an integral partner in keeping the databases updated.
"It has been our hoped-for policy to make a document available on the Web the same day it is announced to the public as a printed version," Prettyman says.
"That means as little as a six- to eight-day turnaround for documents as large as 800 pages."
Due to DCL's familiarity with the library's systems, that hasn't been a problem.
Outsourcing the library's data conversion has freed Prettyman and her staff for other projects, notably designing a new Web interface, making the transition to an object-oriented database that will improve performance and provide a better way to update text, and building tools so that data providers can submit SGML-encoded data directly to the database.
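The article doesn't show what the SGML encoding itself looks like, but the idea is to wrap a document's structure (titles, sections, paragraphs) in named elements defined by a DTD. The fragment and element names below are purely illustrative, not the NLM's actual DTD, and the validator is a minimal sketch: it uses Python's standard `html.parser` (which tolerates SGML-style markup) to check that a submitted document's tags nest and balance correctly before it is accepted into a database.

```python
from html.parser import HTMLParser

# Hypothetical SGML-style fragment; the element names are illustrative
# and do not reflect the NLM's actual document type definition.
DOC = """
<guideline>
  <title>Managing Acute Pain</title>
  <section>
    <heading>Overview</heading>
    <para>Assess pain early and often.</para>
  </section>
</guideline>
"""

class TagBalanceChecker(HTMLParser):
    """Track open elements to verify the markup nests correctly."""
    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements
        self.errors = []   # mismatched end tags found so far

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if not self.stack or self.stack[-1] != tag:
            self.errors.append(f"mismatched </{tag}>")
        else:
            self.stack.pop()

checker = TagBalanceChecker()
checker.feed(DOC)
if checker.errors or checker.stack:
    print("markup errors:", checker.errors, "unclosed:", checker.stack)
else:
    print("well-formed")
```

A real submission pipeline would validate against the full DTD rather than just tag balance, but a cheap structural check like this is a plausible first gate before heavier processing.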
The results speak for themselves, even if money is not the measure.
"In our case it has nothing to do with making money but with making the best and most current information available. It's a service," Prettyman says. "The first three years saw a significant jump in the number of hits, and there has been a steady climb since then -- up over 2 million a month now."