The surly Irish playwright George Bernard Shaw once observed, "It's just as unpleasant to get more than you bargain for as to get less." He must have been thinking about mass storage systems.
At all levels (disk, array, NAS and SAN), data storage devices are expanding their capacities faster than most of us can keep up. Today, business laptops from Dell Inc. come equipped with 40GB drives as standard. In 1999, the standard configuration averaged 4.8GB.
Since 1990, disk drive storage capacities have outpaced Moore's Law, doubling every year rather than every 18 months, the pace commonly cited for chips. IBM Corp. researchers currently peg 20TB as the theoretical storage limit of a single drive. But many others, as noted below, believe that limit will easily be smashed.
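To see how much a 12-month versus 18-month doubling period matters, here is a small Python calculation. The 15-year window is an assumption chosen to match the "nearly 15 years" of doubling cited later in the column; the function name `growth_factor` is purely illustrative.

```python
# Compare growth under two doubling periods over the same span of years.
# Assumption: a 15-year window, matching the column's "nearly 15 years."

def growth_factor(years: float, doubling_period_years: float) -> float:
    """Return how many times larger a quantity becomes if it doubles
    every `doubling_period_years` years for `years` years."""
    return 2 ** (years / doubling_period_years)

YEARS = 15
disk = growth_factor(YEARS, 1.0)   # disk capacity: doubling every 12 months
chips = growth_factor(YEARS, 1.5)  # chips: doubling every 18 months

print(f"Disk capacity growth:    {disk:,.0f}x")
print(f"Moore's Law-style growth: {chips:,.0f}x")
print(f"Gap after {YEARS} years:     {disk / chips:,.0f}x")
```

Over 15 years, yearly doubling compounds to 2^15 (32,768x) while an 18-month period yields 2^10 (1,024x), so the faster curve ends up 32 times ahead.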
Prices are tumbling, too. According to American Scientist, "At a few tenths of a cent per megabyte," digital data storage has become much cheaper than paper as a medium on which to create and hold information.
So cheap, in fact, that more of us are likely to follow the lead of Microsoft Corp. researcher Gordon Bell by cramming our entire lives, everything from the books we read to every e-mail we send or receive, onto a single portable disk drive. In his MyLifeBits project, he estimates that a life filmed 24 hours a day for 100 years could fit on a single 100TB drive, a capacity he and others believe we have "every prospect of reaching."
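The estimate doesn't state a video bit rate, so the sketch below works backward from the quoted figures to see what average rate would fill 100TB in 100 years of nonstop recording. It assumes decimal terabytes (10^12 bytes), 365.25-day years and a single stream with no overhead; `implied_bitrate_kbps` is a hypothetical helper, not part of MyLifeBits.

```python
# Back-of-the-envelope check: what average bit rate fits 100 years of
# round-the-clock video into 100TB?
# Assumptions: decimal terabytes (10**12 bytes), 365.25-day years,
# one continuous stream with no container or filesystem overhead.

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def implied_bitrate_kbps(capacity_tb: float, years: float) -> float:
    """Average bit rate, in kilobits/sec, that exactly fills `capacity_tb`
    terabytes when recording continuously for `years` years."""
    total_bits = capacity_tb * 10**12 * 8
    total_seconds = years * SECONDS_PER_YEAR
    return total_bits / total_seconds / 1000

rate = implied_bitrate_kbps(100, 100)
print(f"Roughly {rate:.0f} kbit/sec on average")
```

The answer lands at roughly a quarter of a megabit per second, modest-quality video, which is why the 100TB figure is plausible at all.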
Although I'm certain Bell's life is worth chronicling, most of you will be storing corporate, not personal, histories in enormous detail, not only because it's possible and cheap, but also because the government mandates that you save more and more company data. Uncle Sam also wants you to access it on demand. But that could be a problem.
Setting aside the database and data management issues raised by the humongous data stores we'll face, consider the simple matter of physical access to the stored data. According to Jim Gray, another noted Microsoft researcher, drive capacity has been doubling for nearly 15 years, while physical access rates have improved by a mere 10 percent annually over the same period. As a result, I/O is once again becoming a bottleneck for applications that depend on ever-larger data stores.
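Gray's two growth rates compound very differently, and the ratio between them is what turns I/O into a bottleneck. The Python sketch below assumes "access rate" can be read as the speed at which a drive reads back its contents, so capacity divided by access rate approximates how much longer a full scan of the drive takes; `scan_time_factor` is an illustrative name.

```python
# Capacity doubling every year vs. access rates improving 10% a year.
# Assumption: "access rate" is treated as the speed at which a drive reads
# back its contents, so capacity / access approximates full-scan time.

CAPACITY_GROWTH_PER_YEAR = 2.0   # doubling annually
ACCESS_GROWTH_PER_YEAR = 1.10    # +10 percent annually
YEARS = 15                       # "nearly 15 years"

def scan_time_factor(years: int) -> float:
    """How many times longer a full read of the drive takes after `years`
    years, relative to the starting drive."""
    capacity = CAPACITY_GROWTH_PER_YEAR ** years
    access = ACCESS_GROWTH_PER_YEAR ** years
    return capacity / access

for y in (1, 5, 10, YEARS):
    print(f"after {y:2d} years: full scan takes {scan_time_factor(y):,.0f}x as long")
```

After 15 years, capacity has grown about 32,768x while access rates have grown only about 4.2x, so reading the whole drive takes several thousand times longer than it did at the start.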
Luckily, you can design your systems to bridge the gap until access speeds catch up with capacity, if they ever do. One way is to put more memory in the application servers that retrieve data from these massive, but lollygagging, drives. RAM is relatively cheap, so in many cases it will be the right solution. Happily, 64-bit operating systems and chip sets will become more widespread in the coming years, giving us headroom for cache expansion beyond the 4GB address-space limit of the 32-bit systems most of us run today.
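Adding RAM to application servers pays off through caching: frequently read data is served from memory instead of from the slow drive. The Python sketch below shows one common form, a least-recently-used (LRU) read-through cache. `ReadThroughCache` and `fetch_from_disk` are hypothetical names, and the "disk" read is simulated; this is an illustration of the idea, not a production design.

```python
from collections import OrderedDict
from typing import Callable, Hashable

class ReadThroughCache:
    """Minimal in-memory LRU cache in front of a slow backing store.

    Hypothetical sketch: `fetch` stands in for whatever slow read path an
    application server uses; it is not a real storage API.
    """

    def __init__(self, fetch: Callable[[Hashable], bytes], capacity: int) -> None:
        self._fetch = fetch
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)       # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._fetch(key)                 # slow path: go to the drive
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)    # evict least recently used
        return value


def fetch_from_disk(block_id: int) -> bytes:
    """Hypothetical slow read; a real system would issue disk I/O here."""
    return f"block-{block_id}".encode()


cache = ReadThroughCache(fetch_from_disk, capacity=2)
for block in (1, 2, 1, 3, 2):
    cache.get(block)
print(cache.hits, cache.misses)
```

The `capacity` parameter is where the 32-bit ceiling bites: a 32-bit process can address at most 2^32 bytes (4GB), so its cache can never grow past that, while a 64-bit address space removes the limit for practical purposes.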
This approach also lets companies keep vital applications on centralized big iron that can continue to serve lots of users. That makes for a more secure computing environment and a data center that's easier and less costly to manage.
You could also break your I/O logjam by moving today's centrally managed applications farther out to the edge of your network, or even down to the desktops themselves, probably via Web services. This technique improves performance because data reads and occasional writes happen closer to users, sometimes literally on their laps. However, it also raises more complex security, data management and replication issues.
Seemingly endless storage capacity coupled with limited access speed also wreaks havoc with backup and retention. To restore order, in the not-too-distant future you'll switch from removable storage to real-time disk-to-disk backups. Instead of sending boxes of tapes to off-site storage facilities, you'll send small, prepackaged and inexpensive fixed-disk appliances.
The real challenge, of course, will be predicting when and where on your network these bottlenecks will crop up. Anything to do with multimedia is clearly a potential trouble spot. It takes 1 million 1MB documents to consume 1TB of storage, but only about 1,500 hours of 1.5Mbit/sec. streaming video.
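The document-versus-video comparison is simple arithmetic, sketched below in Python. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a constant 1.5 megabit/sec. stream with no container overhead; at that rate, a terabyte holds roughly 1,480 hours of video.

```python
# How fast different workloads fill a terabyte.
# Assumptions: decimal units (1TB = 10**12 bytes, 1MB = 10**6 bytes) and a
# constant 1.5 megabit/sec video stream with no container overhead.

TB = 10**12
MB = 10**6

documents_per_tb = TB // MB                  # number of 1MB documents per TB
video_bytes_per_sec = 1.5 * 10**6 / 8        # 1.5 Mbit/sec -> bytes/sec
video_hours_per_tb = TB / video_bytes_per_sec / 3600

print(f"1MB documents per TB:              {documents_per_tb:,}")
print(f"Hours of 1.5Mbit/sec video per TB: {video_hours_per_tb:,.0f}")
```

The contrast is the point: a million documents is a large corpus to produce, while two months of a single continuous video stream fills the same terabyte.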
And once you determine which applications need special attention to get around the I/O bottleneck, you'll have to figure out which approach best solves it.
It's generally pleasant to have all that capacity, albeit on relatively slow disks. But as Shaw said, getting more than you bargain for has its downside as well.
Mark Hall is a Computerworld editor at large.