Editor's Note: Chosen by a panel of Computerworld editors, these organizations are making better business decisions and, in some cases, generating new revenue streams and tapping into new markets. Read on for the successes as well as the new technologies that are driving the pace in the big-data field.
An enhanced search engine brings 100 million instantly searchable documents to R&D teams.
Pharmaceutical research and development projects require collaboration among members of huge distributed R&D organizations with hundreds of millions of documents to parse. Facing such a challenge, AstraZeneca needed a big data system that would foster collaboration among teams, eliminate redundancy, help researchers zero in on relevant data and shorten research times.
"If we can save every user four minutes a day in searches, that equates to [time spent by] 85 people per year that we can save," says Nick Brown, technology services director.
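The arithmetic behind an estimate like Brown's can be sketched as follows. The article does not say what head count or workday length he used, so the 8-hour workday and the user counts here are assumptions for illustration, and `fte_saved` is a hypothetical helper name.

```python
def fte_saved(users: int, minutes_saved_per_day: float, workday_hours: float = 8.0) -> float:
    """Full-time-equivalent people freed up when every user saves a few minutes a day.

    Assumption: one full-time person works `workday_hours` per day. The number of
    working days per year cancels out, so it does not appear in the formula.
    """
    return users * minutes_saved_per_day / (workday_hours * 60)


# Assumed inputs: the 8,000 researchers mentioned later in the article, 4 minutes each.
print(round(fte_saved(8000, 4), 1))
```

Under these assumptions, 8,000 users saving four minutes each comes to roughly 67 full-time equivalents. A figure of 85 would correspond to about 10,200 users on an 8-hour day, or to a shorter assumed workday.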
AstraZeneca deployed the Sinequa Big Data Search and Analytics Platform in the cloud and now provides end users with 100 million unduplicated documents that are instantly searchable. The platform combines external medical data -- including publications, patents, clinical trials and research grants -- with internal content found on departmental file shares, in Microsoft SharePoint, SharePoint Online and Office 365 systems, and via EMC's eRoom collaboration system and Documentum content management system.
Built-in text-mining algorithms let users search documents at the sentence or phrase level. Users get results at 1/50th the volume of a document-level search, and each result is highly accurate, Brown adds.
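To illustrate what searching at the sentence level means, here is a minimal sketch. It is not Sinequa's implementation, which the article does not describe. The naive sentence splitter, the function names and the sample documents are all assumptions.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_search(documents, query):
    """Return (doc_id, sentence) pairs where one sentence contains every query word."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
            if all(term in words for term in terms):
                hits.append((doc_id, sentence))
    return hits


# Made-up sample documents, for illustration only.
docs = {
    "a": "Aspirin reduces fever. It was tested in a trial.",
    "b": "The trial enrolled 40 patients. Aspirin was not used.",
}
print(sentence_search(docs, "aspirin fever"))
```

Returning only the matching sentences, rather than whole documents, is what cuts down the volume a user has to read.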
One of the challenges was to simplify pharmaceutical vocabulary. The team first indexed information and tagged it with a half-million terms in 15 to 20 categories. It then deployed algorithms that could tell, for example, when the term "cat" referred to a feline animal and when it referred to a CT scan. Brown uses technology from SciBite, a startup in Bexhill, England, to add rule-based logic and accurate scientific vocabularies to the platform. The system had been rolled out to 1,000 users by December 2013; it now supports 8,000 researchers.
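The rule-based disambiguation described above might look something like the sketch below. This is not SciBite's technology, whose rules and vocabularies the article does not detail. The cue words, sense labels and function name are hypothetical.

```python
import re

# Hypothetical context rules; a production vocabulary would be far larger.
CONTEXT_RULES = {
    "cat": {
        "imaging": {"scan", "scanner", "tomography", "imaging", "radiology"},
        "animal": {"feline", "pet", "kitten", "veterinary", "fur"},
    }
}


def disambiguate(term, sentence):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when the term is unknown or no cue word appears.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    senses = CONTEXT_RULES.get(term.lower(), {})
    best, best_score = None, 0
    for sense, cues in senses.items():
        score = len(words & cues)
        if score > best_score:
            best, best_score = sense, score
    return best


print(disambiguate("cat", "The patient had a CAT scan of the chest."))
print(disambiguate("cat", "The cat is a feline pet."))
```

A simple cue-word count like this returns no sense at all when the context gives no hint, which is often safer than guessing when tagging scientific text.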