The warp and woof of data storage
- 31 January, 2001 11:00
We all know the future of computing storage. Well before Amazon.com Inc. common stock pays a dividend, there'll be essentially three kinds of computer storage. First, tiny holographic biogelled libraries the size of a dime will be able to hold your entire digitized life history under your skin, in the dashboard of your personal mover, or even in your toaster. Really large projects will offload to storage-over-next-next-generation-Internet-Protocol. Backup will be as much of a utility as potable water, with pro-quality encryption and super-high availability assumed. Finally, there'll be a few magtape readers and writers, because ... well, just because there'll always be magtape.
There should be nothing startling about those predictions. All the technology already exists in the laboratory. We can expect it to appear on the market as soon as other parts of the system catch up; optical switching and scalable management services are the current bottlenecks.
Where are we now? Access speeds are slightly better than they were 20 years ago, reliability has skyrocketed, vendors are more creative in their subterfuge about true product performance, and price per bit is on its way to zero. At the retail level, where I type these words and you read them, we're launching applications from internal buses such as IDE, or often SCSI, despite the headaches both of those black arts have caused for over a decade. Individual hard drives are so inexpensive that consumers habitually ask for "the biggest one" with little concern for sticker price. The big questions have to do, as in so many areas of computing, with manageability, availability, and scalability.
So what mass-storage technologies will we be using in five years? We'll see that there's still plenty of work to do before we arrive in the science-fiction worlds that swim in data. This installment of Future Computing aims mostly at the middle ground of tracking the market expansion of several technologies that, in principle, are already available.
The major transition that datacenters are currently undergoing is the loss of the ability to say, "This hard disk is attached to that computer." Personal computer storage topologies have been simple: controllers plug into the main bus and manage hard disks. SCSI "fans out" the complexity slightly. That's hardware jargon for the fact that various SCSI standards allow a small number of drives to connect to one host, and one host -- or, in unusual cases, two -- to connect to a single drive.
Now processing power and mass storage increasingly need many-to-many interconnections, or a fabric of computing and storage. Advanced SCSI standards supply limited sharing -- so, for example, a hot backup can access live data for failover.
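The difference between point-to-point attachment and a storage fabric is really a difference in connectivity, and it can be sketched as a toy model. The code below is purely illustrative -- the host and volume names are hypothetical, and no real storage protocol is involved; it only shows which hosts can reach which volumes under each topology.

```python
# Toy connectivity model: which hosts can address which storage volumes.

def reachable(links, host):
    """Return the set of volumes a given host can address."""
    return {vol for (h, vol) in links if h == host}

# Direct-attached storage (IDE, basic SCSI): each volume is wired to
# exactly one host, so a link set pairs each disk with its one owner.
direct = {("host1", "disk_a"), ("host2", "disk_b")}

# Fabric (SAN-style): a many-to-many interconnect in which, in principle,
# every host can address every volume.
hosts = ["host1", "host2", "host3"]
volumes = ["disk_a", "disk_b", "disk_c"]
fabric = {(h, v) for h in hosts for v in volumes}

print(reachable(direct, "host1"))  # host1 sees only its own disk
print(reachable(fabric, "host1"))  # host1 sees every volume in the fabric
```

The failover case mentioned above falls out of the model: in the fabric, a hot-backup host already has a link to the live data, so taking over requires no rewiring.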
Ways to weave a storage fabric
Network-attached storage (NAS) is somewhat more scalable. Storage appliances that serve up Network File System (NFS) mounts over TCP/IP connections are a common example of NAS. NAS can be physically distributed over much greater distances than SCSI allows, which eases equipment placement and disaster planning.
However, NAS lacks standards for replication and large-scale management. In principle, storage-area networks (SANs) supply some of those requirements. A SAN is a dedicated network of block-addressable storage resources, typically wired with Fibre Channel.
Although vendors have been hawking them for almost 10 years, SANs aren't mature yet. A lot of "craft" work is still necessary to assemble a working SAN; elements from different manufacturers frequently don't interoperate. Too many sites are finding that they've bought increased fragility, not uptime, with their fancy advanced storage architectures. Moreover, Fibre Channel remains expensive.
Miniaturization of mass storage seems to be a "democratic" technology, like Linux or Apache. Low-level managers have quickly seen the advantages of each new generation of less expensive hard drive. Economies of production and use kick in quickly as local administrators and hobbyists choose cost-effective solutions for local problems.
Networked storage, though, is more like a high-end content management system or elaborate class library, selected at a high organizational level and with a long payback period. Once datacenter specialists work out techniques for daily operation -- sometime in the next few years -- Fibre Channel and its friends will spread to more casual sites, and prices will drop.
Standards battle over SAN engines
Two new standards, with their own arcane labels, should proliferate during the next year. It's unclear whether they'll provide healthy competition for NAS and SAN, or just complicate purchasing decisions. Cisco is working with IBM and several smaller companies on "SCSI-over-IP" (sometimes called "IP storage"), a way to build SANs with familiar SCSI block protocols.
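The core idea of SCSI-over-IP is simply encapsulation: frame a familiar SCSI block request and carry it across an ordinary IP byte stream to a remote target. The sketch below is a toy illustration of that idea only -- the framing format is invented for the example and is not the real wire protocol; only the SCSI READ(10) opcode value is genuine.

```python
# Toy illustration of the SCSI-over-IP concept: a block-level read
# request framed and carried over an IP-style byte stream.
# The frame layout here is hypothetical, not any standard's format.
import socket
import struct

# Invented framing: opcode (1 byte), logical block address (8 bytes),
# block count (4 bytes), all in network byte order.
FRAME = struct.Struct("!BQI")
OP_READ = 0x28  # the actual SCSI READ(10) opcode, reused as a label

def encode_read(lba, blocks):
    """Pack a block-read request into our toy frame."""
    return FRAME.pack(OP_READ, lba, blocks)

def decode(frame):
    """Unpack a toy frame back into (opcode, lba, blocks)."""
    return FRAME.unpack(frame)

# An "initiator" and a "target" at either end of a byte-stream
# connection; socketpair() stands in for a real TCP link.
initiator, target = socket.socketpair()
initiator.sendall(encode_read(lba=2048, blocks=16))
op, lba, blocks = decode(target.recv(FRAME.size))
```

The appeal to vendors is exactly what the toy shows: the block-request vocabulary stays SCSI, while the transport becomes commodity IP networking instead of dedicated Fibre Channel.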
Meanwhile, Intel and the usual server vendors -- Sun, HP, Compaq, and IBM -- are promoting InfiniBand as a replacement for the PCI family of bus standards. InfiniBand complements SCSI-over-IP on a technical level, because it specifies the physical layer. It's no secret, though, that Intel wants to see InfiniBand become so successful as an interconnect that it eliminates the benefits of SCSI-over-IP.
A few other encapsulations from specialized startups have comparable technical merit. When Cisco and Intel collide, though, the smaller players will need to retreat to the sidelines for safety. Sometime during the next eighteen months, IP storage or InfiniBand, or both, will break through and become known as a "safe" buy. Fibre Channel, instead of being an expensive and rather touchy purchase, will become an easy decision.
One theme Future Computing will examine all year is the "management problem." Many technologies appear to be ready "in principle," but lack the management tools they need to be practical. That appears to be SAN's predicament: the idea has plenty of merit and the pieces work, but assembling and maintaining a SAN requires too much expertise. Several of the interviews lined up for the ITworld.com Interviews Forum during the next few months address that need.
So, what are your projects and prospects in this area? How and why does networked storage matter in your future computing?