The University of California at Berkeley is home to a storage project that has a breathtaking vision. OceanStore provides a consistent, highly-available, and durable storage utility atop an infrastructure comprised of untrusted servers.
The project has been under way for some years and has built up a body of research studies, papers, and downloadable code.
The project overview says that any computer can join the infrastructure, contributing storage or providing local user access in exchange for economic compensation. Users need only subscribe to a single OceanStore service provider, although they may consume storage and bandwidth from many different providers. The providers automatically buy and sell capacity and coverage among themselves, transparently to the users. The utility model thus combines the resources from federated systems to provide a quality of service higher than that achievable by any single company.
OceanStore caches data promiscuously; any server may create a local replica of any data object. These local replicas provide faster access and robustness to network partitions. They also reduce network congestion by localizing access traffic.
We must assume that any server in the infrastructure may crash, leak information, or become compromised. Promiscuous caching therefore requires redundancy and cryptographic techniques to protect the data from the servers upon which it resides.
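To make that concrete, here is a minimal Python sketch of one such cryptographic technique: naming a block by a secure hash of its contents, so that a client can check whatever an untrusted server hands back. The function names and the choice of SHA-256 are assumptions made for illustration; the project overview does not say which mechanisms OceanStore uses. The sketch covers integrity only. Keeping the data secret from the server would need encryption as well.

```python
import hashlib


def block_id(data: bytes) -> str:
    """Name a block by the SHA-256 digest of its contents (illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def verify_replica(expected_id: str, data: bytes) -> bool:
    """Return True only if the bytes served by a replica match the expected name."""
    return block_id(data) == expected_id


# Hypothetical object and two replies: one honest, one altered by a bad server.
original = b"archived object, version 1"
name = block_id(original)

honest = verify_replica(name, original)
tampered = verify_replica(name, original + b"!")
```

Because the name is derived from the contents, it does not matter which server supplied the replica: a copy that fails the check is simply discarded and fetched from elsewhere.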
OceanStore employs a Byzantine-fault tolerant commit protocol to provide strong consistency across replicas. The OceanStore API also allows applications to weaken their consistency restrictions in exchange for higher performance and availability.
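The arithmetic behind Byzantine fault tolerance can be sketched in a few lines of Python. The sketch assumes the classic bound used by protocols of this family: tolerating f arbitrarily faulty replicas takes at least 3f + 1 replicas, and a value is accepted once 2f + 1 of them agree. The overview does not name the specific protocol OceanStore uses, so the function names and the voting model here are illustrative only.

```python
from collections import Counter


def replicas_needed(f: int) -> int:
    """Minimum replicas to tolerate f Byzantine (arbitrarily faulty) replicas."""
    return 3 * f + 1


def quorum(f: int) -> int:
    """Matching votes required before a value counts as committed."""
    return 2 * f + 1


def decide(votes, f):
    """Return the committed value, or None if no value reaches a quorum."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= quorum(f) else None


# Four replicas tolerate one faulty member (f = 1).
agreed = decide(["v2", "v2", "v2", "bogus"], f=1)
split = decide(["v2", "v2", "bogus", "bogus"], f=1)
```

In the first case three of four replicas agree, so the update commits despite the rogue vote; in the second no value reaches the quorum of three and nothing is committed.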
A version-based archival storage system provides durability that exceeds today's best by orders of magnitude. OceanStore stores each version of a data object in a permanent, read-only form, which is encoded with an erasure code and spread over hundreds or thousands of servers. A small subset of the encoded fragments is sufficient to reconstruct the archived object; only a global-scale disaster could disable enough machines to destroy the archived object.
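A back-of-envelope calculation shows why erasure coding buys so much durability. The Python sketch below compares keeping two full copies of an object with splitting it into 32 fragments, any 16 of which suffice to rebuild it; both schemes cost twice the original storage. The fragment counts, the 10 per cent failure rate, and the assumption that servers fail independently are all invented for illustration and are not OceanStore's published parameters.

```python
from math import comb


def loss_probability(n: int, m: int, p_fail: float) -> float:
    """Chance that fewer than m of n fragments survive (object is lost).

    Assumes each server fails independently with probability p_fail.
    """
    p_ok = 1.0 - p_fail
    return sum(
        comb(n, k) * p_ok**k * p_fail ** (n - k)  # exactly k survivors
        for k in range(m)  # k = 0 .. m-1 is too few to rebuild
    )


P_FAIL = 0.1  # hypothetical per-server failure probability

two_copies = loss_probability(n=2, m=1, p_fail=P_FAIL)
coded = loss_probability(n=32, m=16, p_fail=P_FAIL)
```

With these made-up numbers the two-copy scheme loses the object about one time in a hundred, while the coded scheme's loss probability is many orders of magnitude lower for the same storage bill.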
The OceanStore introspection layer adapts the system to improve performance and fault tolerance. Internal event monitors collect and analyse information such as usage patterns, network activity, and resource availability. OceanStore can then adapt to regional outages and denial of service attacks, pro-actively migrate data towards areas of use and maintain sufficiently high levels of data redundancy.
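As a toy illustration of that monitor-then-adapt loop, the Python sketch below counts where requests for an object come from and suggests the busiest region that does not yet hold a replica. The region labels, the log format, and the function name are hypothetical; the real introspection layer is described in the project's papers and is far more elaborate.

```python
from collections import Counter


def choose_replica_region(access_log, current_regions):
    """Pick the most frequently requesting region that lacks a replica.

    access_log: iterable of region labels, one per observed request.
    current_regions: set of regions that already hold a replica.
    Returns a region label, or None if every requesting region is covered.
    """
    counts = Counter(access_log)
    for region, _hits in counts.most_common():
        if region not in current_regions:
            return region
    return None


# Hypothetical usage pattern: most requests come from "eu", replica sits in "us".
log = ["eu", "us", "eu", "asia", "eu"]
target = choose_replica_region(log, current_regions={"us"})
covered = choose_replica_region(log, current_regions={"eu", "us", "asia"})
```

Here the sketch proposes migrating a copy towards "eu", the area of heaviest use, and proposes nothing once every requesting region is already served locally.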
Many components of OceanStore are already functioning in isolation. A complete prototype is currently under development.
PowerPoint presentations and many technical papers are available for inspection from the project's URL. A key aspect of the project is global-scale persistent storage with local access. This will be achieved through:
- A “planetary-scale” information utilities infrastructure which is transparent and always active
- Extensive use of redundancy of hardware and data, and devices that negotiate their interfaces automatically
- Elements that tune, repair, and maintain themselves. (Shades of IBM's autonomic computing idea and HP's adaptive infrastructure here)
The project's concept involves ideas such as:-
- Computing everywhere: Desktop; Laptop; Palmtop; Cars; Cellphones; Shoes? Clothing? Walls?
- Services provided by interior of network with incredibly thin clients on the leaves
- Mobile society: people move and devices are disposable
This might strike you as a collection of glib ideas; it could be a Sun presentation. The glibness continues with ideas such as the 20th-century tie between location and content being outdated; wide-scale disaster recovery; and transparent computing being the ultimate goal, with computers disappearing into the background.
In the storage context, we:-
- Don’t want to worry about backup
- Don’t want to worry about obsolescence
- Need lots of resources to make data secure and highly available, but don’t want to own them
- See outsourcing of storage already becoming popular (oh, yes?)
- Pay a monthly fee and our “data is out there”
The scope and scale of the concepts involved could come out of a science fiction book, but Berkeley has produced the BSD Unix distribution and does have academic street cred.
The OceanStore concept is staggering in two senses: staggering in the breadth and depth of its ideas, and staggering, as in wobbling from side to side, under the weight of its impracticality and unreality.
There are research papers galore with wall-to-wall academic organisation credentials. It is hard to know how serious this project is, or how influential. Certainly it pushes many buttons, but the idea of setting up a global storage utility infrastructure much like the Internet, while superficially attractive, is fraught with obvious implementation difficulties. Perhaps we need the US DoD to fund a DARPA-style pilot project on which a working infrastructure could be developed. Any such effort is a decade or more away from fruition.
OceanStore might be a straw in the wind but it is confirmation of the way the storage wind is blowing stronger and stronger.