Hey says a "fourth paradigm" in science is emerging. For thousands of years, we have had experimental science, he says. Since Newton, we have had theoretical science, in which equations predict experimental results. Then, in the second half of the 20th century, we added simulation science, enabled by fancier equations and supercomputers. Now, Hey says, we are entering the era of "data-centric science."
The essence of data-centric science is to aggregate data, often in large quantities and from multiple sources, and then mine it for insights that would never emerge from manual inspection or from analysis of any one data source. He cites as an example a project called Galaxy Zoo, in which the public was invited to help classify millions of galaxies as either spiral or elliptical based on a million detailed images posted online by the Sloan Digital Sky Survey.
The work behind Galaxy Zoo is simple, boring even, and the goal was just to establish a large-scale inventory that would help scientists derive theories about how galaxies evolve. But a year ago, a strange and wondrous thing happened. A high school teacher and Galaxy Zoo volunteer in the Netherlands discovered what would become known as Hanny's Voorwerp, an enigmatic object of a type never seen before. No one is sure just what the distant green cloud is -- perhaps an extremely rare type of quasar -- and it is now getting intense scrutiny from astronomers.
Mountains of Data
Roger Barga is a researcher at MSR who is developing tools for e-science, which he calls "in silico science -- science done inside the computer." He says two technological developments are driving e-science. "The first is that our ability to capture data -- through bigger machines, bigger colliders, more sensors and so on -- is outpacing our ability to analyze it by conventional means."
The second is the emergence of new tools for pattern recognition and machine learning -- algorithms that improve over time as they deal with more and more data, without human programming -- and other new ways to organize, access and mine vast amounts of data. For the Neptune ocean observatory, MSR is building a "scientific workflow workbench" on top of Microsoft's Windows Workflow Foundation, to save, systematize and catalog all the data. It will help scientists visualize oceanographic data in real time and compose and conduct experiments.