John Parkinson, vice president and chief technologist at Cap Gemini Ernst & Young, provides an insight into how the latest trends in storage are changing the way enterprises manage their data.
What is driving the storage market today?

John Parkinson: There are two factors converging around the storage issue. The first is the supply factor. The technology is getting better, storage is getting radically cheaper on a cost-per-bit basis, and the units of storage are getting radically larger. If you look back five or six years, the whole of the western economy probably had one petabyte (1,024 terabytes; one terabyte equals 1 trillion bytes) of data in storage available. These days, half of that is what you need to develop a single oil field.
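The unit arithmetic quoted above (a petabyte as 1,024 terabytes, a terabyte as roughly a trillion bytes) can be checked with a short sketch using binary prefixes:

```python
# Storage unit arithmetic behind the figures quoted above.
# Binary prefixes: 1 TiB = 2**40 bytes; 1 PiB = 1,024 TiB.
TIB = 2**40          # ~1.1 trillion bytes, loosely "1 trillion bytes"
PIB = 1024 * TIB     # the "one petabyte" the western economy once held

print(f"1 PiB = {PIB:,} bytes")                    # 1,125,899,906,842,624 bytes
print(f"1 TiB = {TIB / 1e12:.2f} trillion bytes")  # 1.10 trillion bytes
```

The small gap between 2**40 and 10**12 is why "one terabyte equals 1 trillion bytes" is only approximate.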
So there's been a huge push in terms of the available technology. And the amount of storage being shipped increases almost every quarter, even though the unit costs of storage have gone down so much that the people who make storage aren't making as much on each unit as they used to.
The second factor is the demand side. Ever since we discovered the Internet and the World Wide Web and electronic business, the amount of data available to be stored has grown geometrically. It's not just in industry verticals like energy exploration, where better technology, better instrumentation and better mathematics have increased the demand for data to be stored and analyzed -- although that's a big factor -- it's that with 100 million people on the Web, every business needs to keep track of the people who shop on its websites. Their web log files have grown. We've got these two converging factors -- lots of cheap storage from the storage makers and lots of data to store from the Internet experience. That's why storage is an issue.
What happens as these two factors converge?

The first thing it has done is make it quite apparent that traditional storage architectures, which you can think of as hanging disks off of servers, are a very expensive way to go, because you need a lot of servers to hang the disks off. Server-attached architectures limit the number of disks you can put on any one server box, so server boxes started to proliferate, and that's a very hard management problem. In the mid-'90s, we saw the development of new technologies for managing large amounts of storage: network-attached storage, which is basically disks attached via an Ethernet network, and storage area networks, which are essentially sets of disks attached via fiber loops. Both have their proponents, both have their place in a storage architecture, and together they've become the dominant storage approach.
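The server-proliferation problem Parkinson describes is simple arithmetic: if disks can only hang off servers, the per-box disk limit dictates how many servers you must buy. A minimal sketch, using hypothetical capacity numbers (the figures below are illustrative, not from the interview):

```python
import math

def servers_needed(total_tb: float, disks_per_server: int, tb_per_disk: float) -> int:
    """Server boxes required when disks can only be attached directly to servers."""
    return math.ceil(total_tb / (disks_per_server * tb_per_disk))

# Hypothetical: 200 TB of enterprise data, 12 disk bays per server, 0.5 TB disks.
# You end up buying 34 server boxes just to host the disks,
# which is the management problem that NAS and SANs were built to avoid.
print(servers_needed(200, 12, 0.5))
```

Doubling the data doubles the server count, independent of any compute need, which is why the architecture breaks down as storage volumes grow.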
The second thing that happened is that once you get out of server-attached storage architectures, you also get out of people's comfort zones with data administration tasks like backup and recovery. And you can argue that because of this rapid growth in the amount of storage being deployed, backup and recovery strategies, which basically grew up in the days when storage was expensive and available only in small quantities, haven't kept pace with the architectural demands of the amount of storage we have in corporations today.
What's in the future for backup?

Backing things up isn't that hard. It's a challenge if you want continuous availability of the data, because you can't just stop things, make a copy and start them again anymore, like in the batch days. So you have to develop technology that allows us to take snapshots of parts of large data sets, or to invest in creating mirrors so there are always two exact copies of the data available.
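The snapshot idea mentioned above is commonly implemented as copy-on-write: a snapshot freezes a map of references to data blocks rather than copying the blocks themselves, so a consistent view exists without stopping writes. A toy sketch of the principle (a simplified illustration, not any vendor's actual implementation):

```python
class SnapshotStore:
    """Toy copy-on-write store: a snapshot is a frozen block map, so a
    consistent copy exists without halting writes to the live data."""

    def __init__(self):
        self.blocks = {}      # live data: block id -> bytes
        self.snapshots = []   # each snapshot: a frozen block map

    def write(self, block_id, data):
        self.blocks[block_id] = data        # old snapshots keep their view

    def snapshot(self):
        # Copies only the mapping, not the data blocks themselves,
        # so taking a snapshot is cheap even for large data sets.
        self.snapshots.append(dict(self.blocks))
        return len(self.snapshots) - 1

    def read_snapshot(self, snap_id, block_id):
        return self.snapshots[snap_id].get(block_id)

store = SnapshotStore()
store.write("a", b"v1")
snap = store.snapshot()
store.write("a", b"v2")                # live data moves on...
print(store.read_snapshot(snap, "a"))  # ...but the snapshot still reads b'v1'
```

A real system would also share and garbage-collect blocks between snapshots; the sketch only shows why a snapshot gives you a stable copy to back up while writes continue.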
The harder problem is recovery because the closer we get to storing data on a real-time basis, the harder it is to find time to recreate the data set if we had to.
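Why recovery is the harder problem is again arithmetic: restoring a large data set is bounded by restore throughput, and at modern volumes the window runs to days. A back-of-the-envelope sketch with hypothetical numbers (the data size and throughput below are assumptions for illustration):

```python
def restore_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours to stream a full data set back at a given restore throughput."""
    megabytes = data_tb * 1_000_000        # decimal TB -> MB
    return megabytes / throughput_mb_s / 3600

# Hypothetical: recreating 100 TB at a sustained 200 MB/s takes ~139 hours,
# nearly six days -- far outside any real-time availability window.
print(round(restore_hours(100, 200), 1))
```

The closer the business gets to real-time data, the less tolerable that window becomes, which is the gap Parkinson points at.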
How much of this storage transformation is going on?

What we know is that the amount of storage shipped in the two categories (server-attached and not server-attached) tipped in 2001, so that the majority, as much as 60 percent of the total shipped, went to not-server-attached storage, split between network-attached and storage area network architectures.
What people are spending money on is the network-attached stuff. However, there was a huge amount of server-attached storage in place when that happened, so it's really hard to know what the installed balance is today. My guess is that what's being installed, particularly in storage area networks, are really huge storage volumes -- hundreds of terabytes. The total, in terms of core storage in the enterprise, is probably now 50:50, or slightly in favor of network-attached storage and storage area networks. What's interesting, however, is that the majority of the data in many enterprises is still on PC hard drives, because there are so many of them. And it's not very well managed, not very well coordinated and not very well backed up. If you think about the enterprise, there are really three sets of storage: the aggregate of PC hard drives, server-attached storage in the network, and network-attached storage of the two kinds we've talked about.
Is there a proper mix of storage options a business should attempt to achieve?

In an ideal world, everything would be network-attached, so the stuff you had on your PC would be a nonpersistent mirror of enterprise data, and servers would be computing engines. They would have as much local storage as they needed to complete their computing tasks, but everything would eventually be stored in the network. If you do the math on the technologies we have today, compute cycles are becoming somewhat expensive in comparison to bandwidth and stored bits. It probably won't last forever, but that's roughly how the economics work out today. You would not want servers to be doing data management tasks; you want the storage networks to be doing that. And you'd have a lot of cheap bandwidth to tie everything together, so moving the bits around would be more efficient. Storage connectivity seems to be the optimum architecture.