It's been four years since IBM announced its Storage Tank platform, a SAN-based file system designed to enable multiple application servers to access and share files in a single name space across multiple disk arrays. The company's forthcoming line of Storage Tank products -- which is expected to include a global file system, storage resource management and hardware virtualization or disk array pooling -- will launch in 2003 and will include policy-based automation. In a recent interview with Computerworld, Mike Zisman, IBM's vice president of corporate strategy, said the first product will be a block virtualization appliance that launches soon, with file virtualization coming along in the latter half of the year.
IBM is also working with CERN, the European Organization for Nuclear Research, to build the world's largest data grid. IBM's storage virtualization software will allow 10 petabytes of data to flow across the grid, enabling scientists to digitally re-create the initial moments of the universe's formation and help answer fundamental questions about the nature of matter.
Q: What's your approach to virtualization?
We'll be talking about the three products: ... virtualization at the file level, virtualization at the block level and then the standards-based tools to manage all that. You can much more easily manage a pool of storage. We think that will get even more important in the future as another area of the industry begins to develop, which is things like ATA-based drives. So you have different types of storage in your network: Very high performance storage for transaction processing and very large ... ATA-disk based systems for reference data. That stuff will be shipping in the very near future.
Most people who deal with storage don't deal with data at the block level. They deal with it at the file level. So applications deal with files and, more often than not, database systems are layered on top of file systems. So, there's a need to virtualize at the file level. You want to virtualize across different operating systems ... and provide a file system that makes sense in a SAN [storage-area network] environment where you have very high Fibre Channel access to lots of different storage systems. So Storage Tank is very much about allowing an application to access files through a single interface independent of where those files are.
Q: So where does it go after that?
We've indicated that's the beginning and from there we move to a federated file system, which is a much more geographically distributed file system. ... That's one of the areas we're working with CERN to extend it -- in the area of grids. We want to be independent of on-demand.
Q: So are we talking about a form of object-based storage?
Today, the focus of Storage Tank, in terms of what we're going to be delivering this year, is [a] SAN-based file system. The primary value-add we're providing is a metadata server, and that's where we keep all the knowledge about where the files are. The advantage of the Storage Tank approach is when an application server says, "I want to access a file," it goes to the metadata server unopened, and if that person wants exclusive access, to have access control to it, the metadata server gives back to the application server all the information it needs to access the file and then gets out of the way. [That] is a different approach than NAS [network-attached storage], where, as you know, every I/O goes through the NAS system itself, which has its scalability issue.
I think object store will develop over the next several years in storage systems themselves. Grid protocols will mature to the point where there are very clear protocols for distributed object access. Obviously, products like Storage Tank and other federated or global file systems that people are working on will take advantage of that.
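The out-of-band metadata approach described in this answer can be illustrated with a minimal Python sketch. It is not IBM's implementation: the class and function names (`MetadataServer`, `StorageDevice`, `Extent`, `read_file`), the extent-based layout and the simple exclusive-lock rule are all assumptions made for illustration. The point it shows is that the application server contacts the metadata server once per open, then reads the data directly from the storage devices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical record of where one piece of a file lives."""
    device: str   # name of the disk array holding the data
    offset: int   # byte offset on that device
    length: int   # number of bytes in this piece


class StorageDevice:
    """Stand-in for a SAN-attached disk array (assumption, in-memory only)."""

    def __init__(self, size):
        self.blocks = bytearray(size)
        self.reads = 0  # counts direct reads from application servers

    def write(self, offset, data):
        self.blocks[offset:offset + len(data)] = data

    def read(self, offset, length):
        self.reads += 1
        return bytes(self.blocks[offset:offset + length])


class MetadataServer:
    """Keeps the knowledge of where files are; never touches file data."""

    def __init__(self):
        self.layout = {}    # path -> list of Extent
        self.locks = {}     # path -> client holding exclusive access
        self.requests = 0   # counts metadata requests only

    def register(self, path, extents):
        self.layout[path] = list(extents)

    def open(self, path, client, exclusive=False):
        self.requests += 1
        holder = self.locks.get(path)
        if holder is not None and holder != client:
            raise PermissionError(f"{path} is held exclusively by {holder}")
        if exclusive:
            self.locks[path] = client
        # Hand back the layout, then get out of the way.
        return self.layout[path]

    def close(self, path, client):
        if self.locks.get(path) == client:
            del self.locks[path]


def read_file(mds, devices, path, client, exclusive=False):
    """One metadata round trip, then direct I/O to each device."""
    extents = mds.open(path, client, exclusive)
    try:
        return b"".join(
            devices[e.device].read(e.offset, e.length) for e in extents
        )
    finally:
        mds.close(path, client)
```

In this sketch a file spread over two arrays costs one request to the metadata server and one direct read per array, which is the contrast with NAS drawn above, where every I/O would pass through a single system.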
Q: So when will these products be available?
I think what we've said is that the first products to ship [will be] block virtualization products ... and Storage Tank will be available soon after that. It's reasonable to conclude that the block level product is a first-half [of 2003] offering and StorageTank is a second half offering.
Q: So will you use API swaps to work with competitor platforms or work through standards?
Our view has always been that the only long-term solution to these kinds of problems [is to use] industry standards. Swapping APIs is a useful tactical solution on the way to that strategy, but you can't lose sight of that strategy.
Q: What's held up the launch of StorageTank? And what's happened in its development up to this point?
It's a very complex development effort for two reasons: StorageTank only has value when it supports many different types of application servers. If StorageTank says, "I'm a file system for Unix or a file system for Windows or HP-UX," that wouldn't be of value. The customer pain we're trying to address is the multiple application servers that they have in their environment for a good reason. Instead of having to manage separate file systems in each of their servers, they'd like to have one.
In one customer location, a financial services firm, they have 10,000 application servers. They have 8,000 Unix servers and 2,000 Windows servers. That means today they have 10,000 file systems. Each of those application servers may share physical storage on a SAN, but they have separate file systems, which means they each have to be backed up and managed. Do I want 10,000 file systems or do I want to abstract that out, just like we do in storage today, and have one single file system?
Q: But HP's been shipping a virtualization product for more than a year.
HP has no products similar to StorageTank. They have block virtualization products. They got VersaStor from Compaq. At the highly SAN-driven storage system, none of the other major vendors has a product that they've announced. The multivendor approach makes that very difficult.
There are numerous block virtualization products out there, but for the major vendors -- HP, EMC, Hitachi -- ... at the file virtualization level, none [has] expressed an interest.
Q: When we talk standards, are we talking about SMI-S -- the Storage Management Initiative Specification from the Storage Networking Industry Association?
Yes. Obviously, in the early days, the proprietary [management software] is more functional because it's been around longer, so you have to augment these things and drive toward complete standards.
We also recognize that tactically we need to do things, so we have announced API swaps with Hitachi [Data Systems Inc.] and announced some things with Hewlett-Packard. But at the end of the day, the bulk of our energy is focused on standards.