The amount of data in the world doubles every 18 months. Here's a look at eight real-world big data deployments in a variety of industries.
The amount of data in the world is increasing exponentially; it doubles every 18 months. There's much discussion of big data, both in terms of the problems it causes and the potential utility it represents. But some people are doing more than talking. Here are eight real-world big data deployments.
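To make the 18-month doubling claim concrete, here is a minimal sketch of the arithmetic it implies. The function name and the sample time spans are illustrative assumptions, not figures from any of the deployments below. It simply assumes the doubling rate stays constant.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Return how many times larger a quantity is after `months`,
    assuming it doubles every `doubling_months` (constant rate)."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Illustrative time spans only; none of these come from the article.
    for years in (1.5, 3, 5, 10):
        factor = growth_factor(years * 12)
        print(f"after {years} years: about {factor:.1f}x the starting volume")
```

Under that assumption, data volume grows by roughly 59 percent per year and by about a hundredfold over a decade.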
National Oceanic and Atmospheric Administration (NOAA) National Weather Service
NOAA has been in the big data business for 50 years. It now manages 30 petabytes of new data per year, collecting more than 3.5 billion observations per day from satellites, ships, aircraft, buoys and other sensors. It then combines direct measurements of atmospheric, oceanographic and terrestrial data with complex, high-fidelity predictive modeling to supply the National Weather Service (NWS). NWS models generate millions of products per day in the form of weather warnings and guidance provided to public and private sector forecasters, including government agencies like the Department of Defense and NASA.
AM Biotechnologies DNA Sequence Analysis Solution
Based in Houston, AM Biotechnologies is focused on developing a proprietary new technology for producing chemically modified, DNA-based molecular entities called aptamers. Aptamers have uses ranging from the diagnostic quantification of a particular analyte in a blood sample to the targeted delivery of drugs to specific sites in the body. Developing these aptamers requires analyzing up to tens of billions of short DNA sequences. The company uses web-based big data analysis tools from CD-HIT and Galaxy to crunch its data.
NARA Electronic Records Archive
The National Archives and Records Administration (NARA) is the official record keeper of the U.S. It manages 142 TB (and growing) of information, which represents more than 7 billion objects, including records from across the federal agency ecosystem, Congress and several presidential libraries. The digitized records exist in more than 4,800 different formats. NARA is also in the process of digitizing more than four million cubic feet of traditional archival holdings. By 2016, 95 percent of the electronically archived information must be available to researchers. NARA has built the Electronic Records Archive (ERA) as a "system of systems" to perform the various archival and records management functions, which are governed by different legal frameworks.
Vestas Wind Energy Turbine Placement and Maintenance
Danish firm Vestas uses supercomputers and a big data modeling solution to pinpoint the optimal location for its wind turbines to maximize power generation and reduce energy cost. It uses a wind library that combines data from global weather systems with data collected from its existing turbines. The wind library currently holds nearly 2.8 petabytes of data. Current parameters include temperature, barometric pressure, humidity, precipitation, wind direction and velocity from ground level up to 300 feet, and the company's recorded historical data. Vestas plans to add global deforestation metrics, satellite images, historical metrics, geospatial data and data on phases of the moon and tides.
IRS Compliance Data Warehouse
In 1996, the Internal Revenue Service (IRS) initiated a project to upload a single year of tax return data for analysis. The project has resulted in the Compliance Data Warehouse (CDW), which contains more than 1 petabyte of information. Most of the legacy data is structured, but new data from electronically filed tax returns, international tax treaty partners and third parties comes in XML or other semistructured or unstructured formats. The IRS research group runs analytics on the data for jobs ranging from estimating the U.S. tax gap to predicting identity theft, measuring the taxpayer burden and simulating the effects of policy changes on tax behavior.
University of Ontario Institute of Technology (UOIT) Medical Monitoring
UOIT, in conjunction with IBM, has undertaken Project Artemis, an effort to improve medical monitoring technology so that it can detect warning indicators before vital signs reach critical levels. One example is nosocomial infection, which is life-threatening to premature infants and first presents as a pulse that is within acceptable limits but not varying as it should. Project Artemis is based on Streams analytic software, an information processing architecture that enables near real-time decision support through continuous analysis of streaming data.
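The idea of a pulse that stays within limits yet fails to vary can be illustrated with a small sliding-window check over a stream of readings. This is only a sketch of the general technique, not Project Artemis's actual method or the Streams API; the function name, window size, acceptable range and variability threshold are all hypothetical values chosen for illustration and have no clinical meaning.

```python
from collections import deque
from statistics import pstdev
from typing import Iterable, Iterator


def low_variability_alerts(
    readings: Iterable[float],
    window: int = 5,         # hypothetical: number of recent readings to examine
    low: float = 100.0,      # hypothetical lower bound of the acceptable range (bpm)
    high: float = 180.0,     # hypothetical upper bound of the acceptable range (bpm)
    min_stdev: float = 1.0,  # hypothetical: below this, the pulse is "too steady"
) -> Iterator[int]:
    """Yield the index of each reading that completes a window in which every
    value is within [low, high] but the standard deviation is below min_stdev."""
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    for i, bpm in enumerate(readings):
        recent.append(bpm)
        if len(recent) < window:
            continue  # not enough data yet to judge variability
        in_range = all(low <= r <= high for r in recent)
        if in_range and pstdev(recent) < min_stdev:
            yield i


if __name__ == "__main__":
    # Made-up sample stream: varies normally at first, then goes flat.
    stream = [140, 150, 135, 145, 152, 148, 148, 148, 148, 148, 148]
    print(list(low_variability_alerts(stream)))
```

Because the generator consumes one reading at a time and keeps only a fixed-size window, it can run continuously on an unbounded stream, which is the property that near real-time monitoring depends on.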
TerraEchos Perimeter Intrusion Detection
TerraEchos specializes in technology designed to protect and monitor critical infrastructure. One of its clients is the U.S. Department of Energy Labs, which relies on TerraEchos to protect its scientific intelligence, technology and resources. The client needed a technology solution that would detect, classify, locate and track potential threats (mechanical and biological), essentially distinguishing the sound of a whisper from that of the wind from miles away. To do so, the solution uses sensors, analytic software and high-performance computing to continuously consume and analyze massive amounts of information-in-motion, from human and animal movement to atmospheric conditions.
NASA Human Spaceflight Imagery Collection, Archival and Hosting
NASA's Johnson Space Center (JSC) is the hub for the U.S. astronaut corps and home to International Space Station (ISS) mission operations. Since 1959, it has collected more than 4 million still photographs, 9.5 million feet of 16mm film and 85,000 video tapes and files representing 81,616 hours of video in analog and digital formats. The collection is used for media content as well as by the scientific and engineering community. NASA has created an application called Imagery Online (IO), which links imagery file names to all of the metadata associated with them. But the agency still faces a big challenge in making the collection available to the public in both its raw, native form and transcoding it into smaller, more accessible...