Storing and maintaining the mountains of scientific data Australia’s Antarctic researchers generate is not a job for the faint-hearted. But keeping tabs on each and every piece of information while making it accessible to the public is the challenge the Australian Antarctic Division (AAD) Data Centre’s database management team has undertaken.
The aim of the data centre is to fulfil the AAD’s obligations under the Antarctic Treaty, which specifies that scientific information collected from the region be made publicly available.
Meeting this requirement is a central online repository for all of the centre’s information. Based on Oracle’s database suite, the repository sits on a Sun Microsystems server running Solaris 8. In all, the database holds about 0.5TB of data, all stored in-house.
AAD data centre manager Lee Belbin said that, when the central database system was being set up some years ago, an Oracle database was selected on the basis of three key factors: stability, usability and interoperability. At the time it was the only product on the market with an effective Web-based design, he said. At that stage, according to AAD applications programmer/database manager Kirk Mower, users had to connect to each AAD database separately via a Telnet connection and a text-based interface.
The team's decision to base selection on stability and interoperability has borne fruit and there are now some 30 databases covering a plethora of scientific and geographic subjects, including flora and fauna species, marine science, biodiversity, weather and geographic information systems. While most of the data is from Australian researchers or expeditions, a small proportion comes from international sources.
With the central database, almost all of the information is accessible via the Web, Mower said.
“This means that remote users can query data, as well as international users.”
Belbin said this was particularly important in providing real-time, 24/7 access to AAD’s international users.
In addition, other top international data centres use Oracle, he said, and being able to communicate and collaborate with them was a key reason for choosing the proprietary database over freeware solutions.
The data centre’s database strategy has four parts: metadata, data, linkages and analysis, Belbin said.
The first, metadata, refers to the raw data submitted by Antarctic expeditioners and scientists, also known by the data centre as “datasets”. Initially, metadata is recorded in Microsoft Excel or Access format and posted online as independent pieces of information. The collective metadata, however, is catalogued much as books are in a library.
By posting this raw data online, the team provides quicker access for users to information, Belbin said.
The data centre team then regularly reviews these datasets to see whether any of them are “linkable”.
“If we have 10 or more datasets dealing with something similar, we will pull them into one database,” Belbin said.
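The review rule Belbin describes can be illustrated with a minimal sketch. It assumes each catalogued dataset carries a topic label; the record layout, the function name and the sample entries below are hypothetical and are not drawn from the AAD's own system. Only the threshold of 10 comes from Belbin's description.

```python
from collections import defaultdict

# Threshold taken from Belbin's description: "10 or more datasets".
LINK_THRESHOLD = 10


def find_linkable_topics(datasets, threshold=LINK_THRESHOLD):
    """Group dataset records by topic and keep topics with enough datasets to merge."""
    by_topic = defaultdict(list)
    for record in datasets:
        by_topic[record["topic"]].append(record["name"])
    # A topic is a candidate for consolidation once it reaches the threshold.
    return {
        topic: names
        for topic, names in by_topic.items()
        if len(names) >= threshold
    }


# Hypothetical catalogue entries, invented for illustration only.
catalogue = (
    [{"name": f"penguin_survey_{i}", "topic": "penguins"} for i in range(12)]
    + [{"name": f"ice_core_{i}", "topic": "ice cores"} for i in range(3)]
)

linkable = find_linkable_topics(catalogue)
print(sorted(linkable))
```

In this sketch only the topic with 12 datasets is flagged; the topic with three stays as independent datasets until more arrive.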
The team has also developed linkages to cross-reference and query data across all its databases, mostly using the ColdFusion Web application language.
Belbin said more recent technological changes to the centre’s central database system have included moving all of its front-end database requirements onto a ColdFusion platform. The back end of its database system remains on Oracle.
One of the key reasons for the transition across to ColdFusion was its quicker development time, Belbin said.
“It [ColdFusion] is faster than PL/SQL, which is what Oracle uses, and is faster and more robust than Java,” he said.
“Java still has problems – it’s a bit flaky. We have used Tomcat on the Java/Web component, but it falls over regularly, so [now] we mainly only use it for the graphics component.”
The newer version, ColdFusion MX, also includes Java support, allowing the team to write applications in the ColdFusion programming language and have them translated into Java, he said.
But while the database system appears well established, Belbin said his team continues to try to improve its usability and usefulness. This has prompted the data centre’s latest major initiative: data mining.
Using a statistical software solution such as S+, Statistica or Interactive Data Language (IDL) alongside its database computing power, AAD’s data centre has been able to research and release several recent papers dealing with data not previously linked together or compared.
To cite an example, Belbin said catch data recorded on whaling expeditions in the previous century has recently been used to gauge the amount of change occurring across ice sheets off the coast of Antarctica. By comparing where whales were caught 100 years ago with whale migration patterns today, as well as with current geographical information on the area, the researcher could make comparative studies of the amount of climatic change in Antarctica.
“Those fishermen recording their catch information would have had no idea that they could contribute to studying climatic change,” he said.
Belbin said the next step for the data centre would be to publish some of its analysis on its Web site.
In addition, the team is currently working on both a new GIS and a field trip database system. According to Mower, the field trip database will allow expeditioners to post information about the condition of the various base stations while on site. This will include data on repairs or clean-up requirements, and even an inventory of “toilet paper” stock.
“Then we can track the human footprint in Antarctica,” he said.