Unis to build distributed storage grid
- 25 July, 2006 12:40
Researchers from four universities have started work on the Australian Research Council (ARC) funded data storage infrastructure for e-research (DSIeR) project to improve data accessibility.
The DSIeR Project is a collaboration between the University of Queensland, Australian National University (ANU), Monash University, and James Cook University. The project initially received $800,000 in funding to "provision and explore" issues surrounding a data storage backbone physically distributed between Townsville, Brisbane, Canberra, and Clayton in Victoria.
In a statement announcing the project, the ARC said that while the underlying computing and network requirements are well resourced, data storage capacity has been "relatively neglected" in Australia.
"A continuing problem in Australian research communities is the absence of coordinated digital storage resources [and] computational and experimental data are [often] stored on ad hoc resources [like] local university servers, PC disc drives, CDs, or DVDs, and are not generally accessible," the statement read. "The central goal of this proposal is to provide a long-term, integrated scientific data storage capacity for Australia."
The Monash component of the project recently commissioned 60TB of disk capacity and plans to add 160TB of tape robot storage later this year. At the ANU, the project is being run in conjunction with the Australian Partnership for Advanced Computing (APAC).
APAC's information infrastructure coordinator, Dr Ben Evans, said the project will look at better ways to manage and coordinate data for accessibility and high-speed transfers, and provide opportunities for further research and analysis.
This will be achieved by providing services and methods for data exchange and incorporating workflow practices like grid-enabled applications. The group will use hierarchical storage management (HSM) systems with disk and tape components in what Evans describes as a "large managed migrating filesystem".
"When data is written to the host with HSM capabilities, data is written to a disk cache, and then migrated to other slower media based on policy settings," he said. "In ANU's case this capability is being combined with data intensive computational needs, so we are building infrastructure that will allow the data to migrate between different speeds of global filesystems, high-speed disk, low-speed disk, and tape as needed by the project."
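The policy-driven migration Evans describes can be sketched roughly as follows. This is a hypothetical illustration only, not the DSIeR implementation: the tier names, idle-time thresholds, and data structures are all assumptions made for the example.

```python
import time
from dataclasses import dataclass

@dataclass
class FileRecord:
    """A file tracked by the HSM layer (illustrative, not DSIeR's schema)."""
    name: str
    size_gb: float
    last_access: float  # seconds since the epoch
    tier: str = "disk-cache"

# Media ordered fastest to slowest; new writes land on the disk cache.
TIERS = ["disk-cache", "slow-disk", "tape"]

# Hypothetical policy: files idle longer than the threshold for their
# current tier are demoted one tier toward slower media.
IDLE_THRESHOLDS = {
    "disk-cache": 7 * 86400,   # a week untouched -> slow disk
    "slow-disk": 90 * 86400,   # three months untouched -> tape
}

def migrate(files, now=None):
    """Apply the idle-time policy, demoting each stale file one tier."""
    now = time.time() if now is None else now
    for f in files:
        threshold = IDLE_THRESHOLDS.get(f.tier)
        if threshold is not None and now - f.last_access > threshold:
            # Move one step toward slower media; tape is the final tier.
            f.tier = TIERS[TIERS.index(f.tier) + 1]
    return files
```

A real HSM system would also recall files back to fast disk on access and consider size and project quotas in its policy settings; the sketch shows only the one-way migration Evans outlines.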
Other services like relational databases, Web technologies, and data analysis software are layered over the infrastructure.
Evans said the total capacity of the storage grid is a moving target, increasing two to three times each year in raw capacity. The ANU's system currently has a nearline capacity of 1.2 petabytes, with online disk capacity increasing to over 100TB by the end of the year.
"There are multiple levels of development - from the grid technology level to all sorts of data management apps, including workflow tools," he said. "Another part of this project is special Web development work to provide portal access so people can get in and view their data. A lot of emerging data services are being developed as part of this."
Evans said the data grid services are reaching an operational state and the advanced level services will continue to be developed as new research communities take advantage of APAC's services.