National Archives prepares to go digital

With the unenviable responsibility for housing potentially petabytes of government information that's both public and 'private', the National Archives of Australia (NAA) in Canberra is set to launch its digital archiving service on new infrastructure.

Lola McKinnon, acting director of the National Archives' digital records projects and operations, said the rate at which digital information is being created has spurred a set of "e-permanence" products and guidelines, which form the framework for developing and maintaining the e-records management system.

"Digital records are subject to the same constraint as paper, which may be made available to the public," McKinnon told Computerworld. "But most records are kept 'private' for 30 years, which makes managing the two types of information a challenge."

With that objective, National Archives designed and built a digital records prototype system with in-house developed applications.

The infrastructure consists of two separate systems, one running Windows 2000 with 6.5TB of EMC CX500 disk and the other running Red Hat Enterprise 3.0 with 5.5TB of Apple Xserve storage.

David Pearson, NAA repository manager, said having two repositories with different technologies and vendors results in a "whole lot of redundancy" in the long term, but keeping them synchronized will be a challenge.
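One common way to check that two independent copies of a repository still agree is to compare checksums of every stored file. The sketch below is illustrative only and is not the NAA's tooling: the function names and the assumption that each repository is visible as a directory tree are hypothetical.

```python
import hashlib
from pathlib import Path


def checksum_index(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    index = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            index[path.relative_to(root).as_posix()] = digest
    return index


def compare_repositories(primary: Path, secondary: Path) -> dict[str, list[str]]:
    """Report records missing from either copy or whose contents differ."""
    a, b = checksum_index(primary), checksum_index(secondary)
    return {
        "missing_in_secondary": sorted(a.keys() - b.keys()),
        "missing_in_primary": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

A report with three empty lists would indicate the two copies hold the same files with identical contents. Reading whole files into memory keeps the sketch short; a multi-terabyte store would need to hash in chunks.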

A new 12-rack capacity computer room was built in NAA's Mitchell building in Canberra for the project and came online in March this year. The room and hardware cost about $1.2 million. NAA will look at upgrading storage for this facility within six months.

The application software, developed in-house, has been open sourced by the NAA. It comprises Xena, a Java application that converts electronic records into a standard XML format; Digital Preservation Recorder, which tracks the actions taken on a record; and a Quest database that holds information about objects in a repository.
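To give a rough idea of what converting a record into an XML wrapper can involve, the sketch below embeds a file's bytes in an XML element alongside a checksum. It does not reproduce Xena's actual output format: the element names (`record`, `sha256`, `content`) and the function names are assumptions made for illustration.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_record(name: str, payload: bytes) -> str:
    """Wrap raw bytes in a small XML document (hypothetical schema)."""
    record = ET.Element("record", {"name": name})
    ET.SubElement(record, "sha256").text = hashlib.sha256(payload).hexdigest()
    content = ET.SubElement(record, "content", {"encoding": "base64"})
    content.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(record, encoding="unicode")


def unwrap_record(xml_text: str) -> bytes:
    """Recover the original bytes and verify them against the stored checksum."""
    record = ET.fromstring(xml_text)
    payload = base64.b64decode(record.findtext("content") or "")
    if hashlib.sha256(payload).hexdigest() != record.findtext("sha256"):
        raise ValueError("checksum mismatch")
    return payload
```

Storing the checksum next to the content lets a later reader confirm that the record has not been altered since it was wrapped.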

Pearson said the project has become one of digital preservation.

"Traditionally in the paper world, we take about 5 percent of paper from [government departments and] agencies, but how much we will take in digital we don't know, it may be 5 or 10 percent," Pearson said, adding that because no agency tells the NAA how much digital data it will submit, capacity planning is "quite frustrating".

McKinnon said the digital archive will eventually get to petabyte scale. Agencies have to recognize this, she said, and cited as an example one unnamed agency which has 35TB of e-mail alone.

"A number of agencies are keen; for a lot it's a change in mindset from paper to digital," she said. "That's something to address."

The NAA estimates the cost of digitizing its existing paper content would be "in the billions", but it will continue offering "digitization on demand".

"What we keep inside the digital archive is born digital that's been given to us for preservation," Pearson said, adding that paper content won't be stored in the new digital storage system unless there is a good reason.

"When we have multiple terabytes and get beyond prototype we would like to have off-site backups and may completely forget about any tape backups," he said, adding that conservation is imperative.

"The record is king and everything else is secondary."

When the NAA has a large quantity of digital records in the "open period" (available to the public), it will store them in a Web-based access repository managed by internal IT rather than by the archives staff.

"All the paper in a building could be 5TB so we could store it on a couple of RAID boxes," Pearson said. "It's quite amazing when you look at the benefits and in the long run it will be cheaper to manage. From a risk perspective, it's a good thing."

From what's traditionally been a paper paradigm, McKinnon described going digital as "a brave new world for archiving".
