Recent changes to Australia's Copyright Act will allow the National Library of Australia to significantly broaden its efforts to preserve the country's digital cultural heritage.
The Civil Law and Justice Legislation Amendment Bill 2014, passed in June, was an omnibus bill that included among its provisions a number of amendments to the Copyright Act 1968.
"It's a really significant event for the National Library of Australia and for Australian libraries in general," said Alison Dellit, the NLA's director, Australian Collection Management.
"What the amendment to that act has provided us with is the basis to collect electronic publications in the same way that the National Library collects print publications," Dellit said.
Legal deposit provisions already placed a requirement on Australian publishers to lodge a copy of each publication they release with the National Library.
"We've had a lack here of capacity to comprehensively collect electronic publications because we haven't had the amendments [to the Copyright Act]," Dellit said.
Over the last two decades there have been amendments to copyright laws in a number of international jurisdictions to enable national libraries to collect ebooks and website material in a similar fashion to print publications.
"Now the legislation's come through, it will enable the National Library to collect publications which are only published digitally," Dellit said.
"That includes large-scale website harvesting as well as the ebook imprints that are produced in Australia."
"Australia has quite a strong market for digital-only ebooks, particularly in the romance genre," Dellit said.
"That's material we haven't been able to represent through the collection effectively until now."
In addition to expanding legal deposit provisions to encompass ebooks, the amendment will allow the NLA to issue a request for electronic material that it considers belongs in its Australian collections, whether that material first appeared on a local website or one hosted overseas.
"Obviously the legislation only impacts on those people who are subject to Australian law, but it does mean that we are focussed on bringing things in to the collection that we think reflect the Australian experience, and we think the legislation gives us the ability to do that," Dellit said.
"What we want in the Australian collection, what we consider Australian here, is anything that was published within the terrain of Australia, anything that reflects on Australian life, and anything that was created by an Australian author even if they're resident overseas, and that's because we want to reflect the expat experience of Australians as well as the local experiences."
The legislation covering legal deposit requires publishers of what are dubbed 'physical format digital publications', such as publications issued on a CD, DVD or USB drive, to lodge a copy with the library within 30 days. That is in line with the requirements for print publications.
'Online' electronic material only needs to be delivered to the library on request.
"The nature of the legislation means that that request could be an automated request," Dellit said.
The library has operated a web archiving program since 1995. ("It's a world of brightly coloured text and rotating GIFs," Dellit said.)
That harvesting has occurred on a voluntary basis through negotiations with publishers. The library will now be able to increase the scope of its web harvesting and therefore the material available to researchers.
"We'll now be in a position where we'll be able to go out and collect that material via a web harvest process," Dellit said.
Collecting ebooks is more likely to take place through a process of negotiation, she added.
"We're developing systems that should work with publishers' systems to enable [publications] to come through in batches. There will also be a web form for people to upload an ebook that they've produced if we request it as well."
Preserving the Australian web
The library currently has three main web harvesting activities. The longest-running is the Pandora Archive, which is managed by the National Library but run in collaboration with the state libraries and a number of other cultural institutions.
Pandora was set up in 1996. Its name is an acronym for 'Preserving and Accessing Networked Documentary Resources of Australia'. It is a selective archive with a particular focus on websites that relate to public policy.
The underlying infrastructure for Pandora was revised around five years after the archive was launched. The library is now in the process of shifting the archive to Heritrix, the Internet Archive's open source website crawling software.
The second archive is the Australian Government Web Archive (AGWA), which was launched last year by the National Library and uses Heritrix.
That archive collects material published to .gov.au domains, excluding state government material that the library doesn't have permission to make available.
The library carries out four or five harvests a year for the AGWA, Dellit said.
"Because of the early nature of our program and our partnership with the Internet Archive, we've now got material from 1996 through 2015 in that archive," she added.
The third of the National Library's web harvesting activities is a whole of .au harvest.
"In 2004 we started doing a once a year snapshot of the whole of the .au domain," Dellit said.
The library has so far been unable to make that material available to researchers. Due to Copyright Act restrictions, it may be unable to make the material openly available on the web.
However, Dellit said the NLA believes it will be able to make it searchable, interrogatable and browseable and provide tools for people to perform data analysis on-site at the National Library.
The National Library of Australia is halfway through a project to replace the underlying infrastructure used to collect digital material such as ebooks and websites.
The NLA's Digital Library Infrastructure Replacement Project is due for completion in 2017, Dellit said.
"It involves the complete replacement of the underlying software behind our web harvesting as well as the technology that we'll need to bring in ebooks and in fact our storage, preservation and delivery systems for digital content," she said.
"We're expecting to have the collection elements of the infrastructure rolled out by the end of 2016," Dellit added.
The ability to collect and make accessible Australian ebooks should be available before the overall project is completed.