Decades-old data formats an obstacle to information sharing

Scientific community seeking standards

A lack of interoperability between competing data formats has been a serious obstacle to data sharing in the scientific community.

An interoperability wrapper is required to help earth scientists from the Solid Earth and Environment Grid (SEE Grid) community share data across systems.

Earth and environmental scientists have been stymied by the challenge of efficiently combining data from the wide variety of formats in use, some of which are decades old.

Speaking at the SEE Grid conference at the CSIRO Discovery Centre in Canberra yesterday, Andrew Woolf of the UK's Council for the Central Laboratory of the Research Councils (CCLRC) said XML Linking Language (XLink) may be the solution.

XLink is a framework for inserting elements into existing XML documents that define relationships between resources. It can be used to associate cross-format metadata with remote or local links.
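As a rough illustration of the idea, the Python sketch below parses a small XML fragment whose elements carry XLink attributes pointing at external data files. The element names (`dataset`, `variable`), the file locations and the format labels are hypothetical, invented for this example; only the XLink namespace and the `xlink:type`, `xlink:href` and `xlink:title` attributes come from the XLink specification. It is not the CCLRC's implementation.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C XLink specification.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical metadata document: element names and file locations are
# assumptions made up for this sketch.
DOC = """\
<dataset xmlns:xlink="http://www.w3.org/1999/xlink">
  <variable name="temperature" xlink:type="simple"
            xlink:href="file:///archive/temp_1990.nc"
            xlink:title="NetCDF file"/>
  <variable name="pressure" xlink:type="simple"
            xlink:href="http://example.org/data/pressure.grib"
            xlink:title="GRIB file"/>
</dataset>
"""


def extract_links(xml_text):
    """Return (name, href, title) for every element carrying an xlink:href."""
    root = ET.fromstring(xml_text)
    links = []
    for elem in root.iter():
        href = elem.get(f"{{{XLINK_NS}}}href")
        if href is not None:
            title = elem.get(f"{{{XLINK_NS}}}title")
            links.append((elem.get("name"), href, title))
    return links


links = extract_links(DOC)
for name, href, title in links:
    print(name, href, title)
```

The metadata document stays small and format-neutral, while the links record where each piece of data actually lives.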

As a result, users will be able to write a simple script that reads from a variety of file formats, using the appropriate access mechanism for each.
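One way such a script might be organised is sketched below in Python: a small registry maps each file type to its own reader, and a single function picks the right one. CSV and JSON are used here only as stand-ins for the scientific formats discussed in the article, and every function name, file name and field name is an assumption made for illustration.

```python
import csv
import json
import os
import tempfile


def read_csv(path):
    """Read a 'value' column from a CSV file (stand-in format)."""
    with open(path, newline="") as f:
        return [float(row["value"]) for row in csv.DictReader(f)]


def read_json(path):
    """Read a 'values' array from a JSON file (stand-in format)."""
    with open(path) as f:
        return [float(v) for v in json.load(f)["values"]]


# Hypothetical registry: file extension -> access mechanism.
READERS = {".csv": read_csv, ".json": read_json}


def read_values(path):
    """Dispatch to the reader registered for the file's extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in READERS:
        raise ValueError(f"no reader registered for {ext!r}")
    return READERS[ext](path)


def demo():
    """Write two small files in different formats and read both back."""
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "a.csv")
        json_path = os.path.join(tmp, "b.json")
        with open(csv_path, "w", newline="") as f:
            f.write("value\n1.5\n2.5\n")
        with open(json_path, "w") as f:
            json.dump({"values": [3, 4]}, f)
        return read_values(csv_path) + read_values(json_path)


combined = demo()
print(combined)
```

The calling code never needs to know which format a given file uses; supporting another format would mean registering one more reader.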

"The problem with traditional file-centered data management is it focuses on the file as the key artifact of interest," Woolf said.

"The file at the end of the day is a container for the stuff that's important."

But it is not feasible for earth scientists to switch to a unified file format, partly because the files may require specialized programs to read the data.

XLink combined with Geography Markup Language (GML) provides the opportunity to aggregate constructs across multiple files.

"One way of looking at it is that GML provides the conceptual feature skeleton, and the collection of different files provides the flesh," Woolf said.

He explained that the CCLRC is involved in a project known as the Natural Environment Research Council (NERC) Data Grid.

The NERC grid attempts to compile and make available large amounts of environmental information from a variety of sources. It is overseen by the British Atmospheric Data Centre (BADC).

Much of this information is held in aging legacy data storage systems. For example, the European Weather Centre contributes around 2 petabytes of file-based data to the grid.

XLink "enable[s] a powerful scalable mechanism" for accessing this data, Woolf said.

"We've got 40 terabytes of data at the BADC and we can make that conceptually interoperable with kilobytes of GML," he said.

The NERC makes the software required to access the grid publicly available for download.

SEE Grid is a scientific community of environmental and earth scientists from the CSIRO working to establish interoperability standards to facilitate ease of access to vast archives of scientific data.
