Virtualization is an abstraction of physical storage. It masks the complexity of underlying networked storage by building a logical view of storage that is isolated from physical devices.
Virtualization software collects data from different types of devices -- storage-area network, network-attached, and server- or direct-attached -- and gathers it into a common pool that can be managed, monitored and administered from a single console.
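The pooling idea can be illustrated with a minimal sketch. All names here are hypothetical, not any vendor's API: the point is simply that devices of different types contribute capacity to one logical pool that is managed in one place.

```python
# Minimal sketch of storage pooling; classes and names are hypothetical.
from dataclasses import dataclass

@dataclass
class PhysicalDevice:
    name: str         # e.g. "san-array-1", "nas-filer-2", "das-disk-3"
    kind: str         # "SAN", "NAS", or "DAS"
    capacity_gb: int

class StoragePool:
    """Presents devices of different types as one logical capacity pool."""
    def __init__(self):
        self.devices = []

    def add(self, device):
        self.devices.append(device)

    def total_capacity_gb(self):
        # The console sees one number, regardless of where the disks live.
        return sum(d.capacity_gb for d in self.devices)

pool = StoragePool()
pool.add(PhysicalDevice("san-array-1", "SAN", 500))
pool.add(PhysicalDevice("nas-filer-2", "NAS", 250))
pool.add(PhysicalDevice("das-disk-3", "DAS", 72))
print(pool.total_capacity_gb())  # 822
```

The single `StoragePool` object stands in for the "single console" view the article describes: the administrator works against the pool, not the individual devices.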
That sounds great in theory, but how close to reality is true storage virtualization? Today, the much-hyped technology only partially realizes its ambitious goal of unifying different storage devices.
Different vendors are approaching virtualization in different ways: Some implement virtualization on only their storage devices; others virtualize a variety of devices.
But none give users total virtualization -- the ability to group all storage devices and hosts under a scalable and open virtualization engine.
"We've got to the point now where if there's a nail, you hammer it," says Jamie Gruener, an analyst with The Yankee Group. "Everyone says they have virtualization, but what does it mean?"
Storage virtualization is implemented in three ways: on the host computer or server, on an appliance, or on the storage array. Within those classifications, vendors provide symmetric, or in-band, virtualization, which sits in the data path, and asymmetric, or out-of-band, virtualization, which sits outside it.
In in-band implementations, a device sits in the data path between the server and the storage devices, passing both data and intelligence through to the arrays attached to it. In out-of-band implementations, the mapping intelligence sits outside the data path, so servers consult it only to learn where data lives, then move the data directly to and from the storage devices.
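The distinction can be sketched in a few lines. This is a toy model under assumed names (the `mapping` table and `arrays` dictionary stand in for the virtualization metadata and the physical arrays): in-band, the engine resolves the mapping and carries the data itself; out-of-band, the host asks only for the location, then performs the I/O directly.

```python
# Toy contrast of in-band vs. out-of-band virtualization.
# "mapping" is the logical-to-physical table; "arrays" are the devices.
mapping = {0: ("array-a", 100), 1: ("array-b", 7)}   # logical block -> location
arrays = {"array-a": {100: "alpha"}, "array-b": {7: "beta"}}

def read_in_band(logical_block):
    """In-band: the appliance sits in the data path. It resolves the
    mapping AND forwards the data, so every byte passes through it."""
    device, phys = mapping[logical_block]
    return arrays[device][phys]          # data flows through the engine

def locate(logical_block):
    """Out-of-band metadata service: answers WHERE, never carries data."""
    return mapping[logical_block]

def host_read_out_of_band(logical_block):
    """Out-of-band: the host does a cheap metadata lookup, then reads
    the device directly -- the engine never touches the data path."""
    device, phys = locate(logical_block)   # control traffic only
    return arrays[device][phys]            # host <-> array, directly

print(read_in_band(0), host_read_out_of_band(1))  # alpha beta
```

The trade-off the article returns to later follows directly from this shape: the in-band engine sees every I/O (simple, but a potential bottleneck), while the out-of-band scheme scales better at the cost of software on each host.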
For the past year, nearly every storage and systems vendor has touted a form of storage virtualization. Many have created virtualization software of their own; others have adopted software from other vendors; and some still are working diligently on their virtualization plans, hoping to bring out products later this year.
Virtualization is a market every vendor wants a piece of because it promises to make data easier to access and simpler to administer.
Virtualization of SAN or NAS devices is most common; some virtualization software claims to throw both into the pool at once. And vendors most often take an approach that depends on the type of software or hardware they manufacture. For instance, EMC and Network Appliance say they each virtualize the disks that reside in their storage arrays, but not across product lines.
Of the seven largest storage system and storage vendors -- Hewlett-Packard Co., Compaq Computer Corp., EMC Corp., Sun Microsystems Inc., Network Appliance Inc., IBM Corp. and Hitachi Data Systems Ltd. -- only a few have completely spelled out their virtualization strategies.
-- HP offers virtualization at all three levels. The company, which already offered server- and array-based virtualization with its OpenView Storage Allocator and Virtual Array products, acquired start-up StorageApps last year and now offers appliance-based virtualization called SANlink.
-- Compaq, as part of its Enterprise Network Storage Architecture 2, plans to make VersaStor appliances that compete with HP's SANlink.
-- EMC, known for its powerful enterprise storage hardware, employs only array-based virtualization. With its Automated Information Storage strategy, however, the company is headed toward array- and server-based virtualization that will automatically and dynamically move data within the pool of storage.
-- Sun last month unveiled a virtualization array, the StorEdge 6900, which uses software from Vicom.
-- Network Appliance last month signed an agreement with storage start-up NuView to pool Common Internet File System (CIFS) storage, the file-sharing protocol used on Windows NT/2000 networks. The company declined to comment on other plans.
-- IBM offers a future vision of virtualization called Storage Tank, as well as array-based virtualization and appliance technology it obtains from DataCore, a start-up virtualization vendor.
-- Only Hitachi would not disclose its virtualization plans.
Beyond the Big Seven, a number of start-ups have come out with virtualization software. Three of the most successful -- DataCore Software Corp., FalconStor Software Inc. and Vicom Systems Inc. -- offer virtualization software that is installed on industry-standard Intel servers, and sold to systems and hardware vendors for redistribution.
IBM and Fujitsu-Softek offer DataCore's SANsymphony; StorageTek and MTI use FalconStor's IPStor software, while Sun uses Vicom's Storage Virtualization Engine in its new StorEdge 6900.
A variety of other established storage vendors and a few start-ups such as TrueSAN Networks Inc. and LeftHand Networks Inc. also offer virtualization software.
Storage virtualization is hot territory. It promises to make the management and acquisition of storage simple for IT, letting users shift storage within the pool to where they need it while maximizing their investment.
Visions of virtualization
The ability to pool all your storage into one virtual view is inviting, but vendors and analysts alike recommend that you slowly get acclimated before jumping in.
Virtualization, which can be deployed in the server, network or storage array, is still an emerging technology with a meaning that changes depending on which vendor you talk to.
Some vendors only pool data residing on their disk drives; others will pool any device's data; and yet others pool data and offer applications -- such as mirroring, data replication and snapshot backups -- that analysts say are as important to storage management operations as the virtualization of data itself.
While interest in virtualization is high among end users, even the boldest are still in the early pilot phase. Nonetheless, in spite of the vagaries surrounding virtualization, there are issues IT managers should be aware of when deciding to virtualize data.
-- Server/array-based virtualization: First is not always best.
Server- and array-based virtualization, in which the software and data-pooling intelligence reside on the server or storage array, were the first virtualization attempts. Vendors with a proprietary interest in servers or storage arrays manufactured virtualization software that ran on each of these devices.
For example, EMC virtualizes data across the drives of individual Symmetrix arrays.
Because array- or server-based virtualization doesn't put additional devices in the direct path of the data, it scales better than network-based virtualization, The Yankee Group's Gruener says.
And deploying virtualization on the host server doesn't burden the other devices in the network, such as Fibre Channel switches or storage arrays. Of course, this method puts an extra burden on the server that has to process the extra virtualization tasks, so it could cause server-based latency.
Vendors argue that array-based virtualization lets them fine-tune all the virtualization capabilities because they are already familiar with the inner workings of the array.
But Steve Duplessie, an analyst with Enterprise Storage Group Inc., says he "doesn't see any advantages [to array-based virtualization] other than we expect RAID boxes to be smart appliances, and that should continue."
-- Network-based virtualization: Lots of interest, but watch out for bottlenecks and latency woes.
Most vendors seem to be lining up behind in-band virtualization software that sits on an industry-standard Intel server running Windows NT/2000 or Linux.
Network-based virtualization suffers from the same latency problem that server-based virtualization does: It puts a burden on the other network servers to always have to look to the virtualization server for information on where their data is. And it exacts a certain performance penalty on the server performing the virtualization.
Analysts also have concerns about the hardiness of the server deployed in network-based virtualization. "The knock against current in-band [virtualization] methodologies is that at some point they will become a bottleneck because they reside on Intel servers," Duplessie says.
He adds that while vendors such as DataCore and FalconStor take various approaches such as caching to mitigate potential latency problems, the point remains that the server the virtualization software is deployed on is often less powerful than the servers managing data and I/O.
"At the end of the day, there is an NT box [whose bus structure is not optimized for I/O performance] in the middle of the road," Duplessie says. "Large companies are very leery about NT being the core of their enterprise virtualization infrastructure."
If latency is a concern, Wayne Lam, vice president of engineering for FalconStor, suggests users try his company's IPStor product, which runs on Sun Solaris as well as Intel servers.
Another concern is that the typical Intel-based server's I/O is not particularly well-suited to configurations where snapshot backups, data replication or caching take place, analysts say.
Compaq and Sun say their upcoming virtualization technologies will overcome these issues. The two companies will introduce hybrid network virtualization devices that exist outside the data path and do not affect the transfer of data between host servers and storage devices.
Duplessie agrees that out-of-band virtualization schemes scale better than in-band. "There's never a scale problem; therefore large distributed enterprises have less of an issue with this approach. However, the downside is putting software on each and every host in the storage network," he says.
An advantage of in-band, appliance-oriented virtualization is simplicity: users need no code on the host servers, and because every I/O request and response passes through the virtualization engine, nothing else is required. Appliance-oriented installations are simple to set up and easy to maintain, analysts say.
-- What's next: specialized virtualization switches.
Ask analysts where they think virtualization needs to go to benefit users, and they express a common theme.
"The concept of storage abstraction, or virtualization, is here to stay, and users will find more benefit than problems in today's world," Duplessie says.
"While the choice you make today may not ultimately be the best one, you have to start somewhere," he says. "The ability to manage disparate devices under a unified virtualization schema outweighs the fact that you may move to a different architecture down the road."
Duplessie looks forward to a new breed of purpose-built switches from vendors such as Pirus Networks and Maranti Networks that feature virtualization running at wire speed with no latency.
And Gruener says users considering virtualization software shouldn't consider any scheme that doesn't support a high-availability configuration or additional components that protect the storage network.
"On top of it you will have all sorts of services that are woven into the virtualization tool set - mirroring, capacity on demand, snapshot backup, data replication," Gruener says.
"The biggest challenge right now is that virtualization as a whole isn't necessarily going to be a separate feature longer term. It's going to be a component of a larger management package," he adds.
Plunging into virtualization
Bill Manning knew he had a problem keeping up with the storage demands of his network. Manning, associate director of technical services for the Plumbers & Pipefitters National Pension Fund, wanted to expand storage without facing the challenges associated with traditional RAID migrations. He accomplished that by rolling out virtualization software from DataCore that groups all of his storage into a common pool that can be managed from a single console.
Manning administers more than 1.5 terabytes of employment, retirement, pension, financial and eligibility data for 250,000 members of the US$4.5 billion fund in Alexandria, Va. The organization's storage needs have quadrupled during the past four years because of a conversion of paper documents to digital images. Each time Manning expanded his storage, he had to do it with identically sized disk drives. If he had 9G-byte drives installed, he'd have to find 9G-byte drives even though the industry had moved on to 18G-, 36G- and 72G-byte drives.
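Manning's matching-drive constraint, and the way a virtualized pool removes it, can be shown in a toy comparison. The RAID rule sketched here (all members the same size) is the constraint the article describes; the functions and numbers are illustrative, not any product's behavior.

```python
# Toy illustration: classic RAID expansion forces matching drive sizes,
# while a virtualized pool absorbs whatever capacities it is given.

def raid_can_add(existing_gb, new_gb):
    """Traditional RAID set: a new member must match the existing drives."""
    return all(size == new_gb for size in existing_gb)

def pool_capacity(drives_gb):
    """Virtualized pool: mixed sizes each contribute their full capacity."""
    return sum(drives_gb)

print(raid_can_add([9, 9, 9], 36))    # False: a 36G-byte drive is rejected
print(raid_can_add([9, 9, 9], 9))     # True: only another 9G-byte drive fits
print(pool_capacity([9, 9, 9, 36]))   # 63: the pool takes the bigger drive
```

This is exactly the economics Manning cites: without the pool, each capacity bump meant hunting down obsolete 9G-byte drives or rebuilding the array.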
"Every 15 to 18 months I was going to be rebuilding RAID arrays, which was an expensive and time-consuming process that caused downtime," Manning says. "As the volumes of data become larger, the time to restore them from tape will get proportionately longer. At some time we would be faced with having the company down for four to five days."
He anticipated even greater storage manageability headaches when the fund began moving more paper files to digital format. What's more, two of his Windows NT servers were running out of space, while two other servers used only 50 percent of their disk space.
Realizing he could save more than $300,000 by not having to rebuild the two overcrowded RAID arrays, Manning put a stop to buying RAID in a willy-nilly fashion. About a year and a half ago, he implemented a storage-area network (SAN) using Gadzoox Capellix Fibre Channel switches and DataCore SANsymphony virtualization software.
The rollout cost $500,000, but saves time and trouble whenever IT needs to expand storage. "I can buy whatever disk is available and reconfigure the array on the SAN without taking the network down," he says.
Manning uses SANsymphony to dynamically allocate the pool of storage wherever it's needed. He monitors and administers the virtualization process from a single Web-based console, greatly reducing the complexity of managing his network. This also allows his staff to spend more time addressing other problems.
Installed on industry-standard Intel servers called Storage Domain Servers, SANsymphony supports Windows NT/2000, Unix, NetWare, Macintosh and Linux operating systems. The fund has two mirrored Storage Domain Servers for fault-tolerance. If one fails, the other takes over.
IT also has set thresholds that warn staff when data capacity reaches 60 percent utilization. When a threshold is exceeded, SANsymphony sends an e-mail warning or pager message to IT workers so they can plan and budget for more disk drives. "By virtualizing the storage space, I can better plan now how storage will affect my budget operations because I can see how that space is going to be used," Manning says.
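The threshold logic amounts to a simple utilization check. This sketch uses the 60 percent figure from Manning's setup; the `notify` hook is a hypothetical stand-in for SANsymphony's e-mail and pager alerts, not its actual interface.

```python
# Hedged sketch of capacity-threshold alerting. The 60% threshold is
# from the article; the notify callback is a hypothetical stand-in
# for the e-mail/pager alerts SANsymphony sends.

THRESHOLD = 0.60  # warn at 60 percent utilization

def check_capacity(used_gb, total_gb, notify):
    """Compute pool utilization and fire an alert if it crosses the line."""
    utilization = used_gb / total_gb
    if utilization >= THRESHOLD:
        notify(f"Storage pool at {utilization:.0%} -- plan for more disk")
    return utilization

alerts = []
check_capacity(950, 1500, alerts.append)   # ~63% utilized -> warning fires
check_capacity(600, 1500, alerts.append)   # 40% utilized -> no alert
print(alerts)
```

The payoff Manning describes is the lead time: the alert arrives while there is still headroom, so new drives can be budgeted and ordered before anything fills up.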
When new disks arrive, IT plugs them into the cabinets, and formats and initializes them. The DataCore software recognizes the additional space and queries the administrator on how the space should be apportioned. Manning's group normally adds space over the weekend when it won't disrupt the workflow.
"[SANsymphony] lets us drag and drop storage from one server to another," Manning says. "It makes moving storage really easy."
Assisted by Selenetix, a systems integrator and reseller, Manning set out to deploy a SAN and virtualization software over a weekend. Accustomed to week-long upgrades to the RAID system, Manning was surprised that everything went as smoothly as it did. Users see data no differently than before.
Next up on Manning's to-do list: increasing fault-tolerance on the SAN and integrating two Solaris servers that are used for management and fund-tracking applications. These servers will add as much as 600G bytes of data to the SAN.
Plumbers & Pipefitters Pension Fund is also taking advantage of SANsymphony's snapshot back-up capability. "We are moving to a disk-to-disk backup, in which we use snapshots to back up to another disk in real time and then to tape from the second copy," Manning says. "The integration of features such as mirroring and [snapshot backup] are extremely important to us."
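The disk-to-disk-to-tape flow Manning describes can be sketched abstractly: take a point-in-time snapshot to a second disk, then write tape from that copy, so the primary volume keeps serving I/O during the slow tape run. Everything below is a hypothetical stand-in, not SANsymphony's API.

```python
# Sketch of snapshot-based disk-to-disk-to-tape backup.
# Dictionaries stand in for volumes; this is not a vendor API.
import copy

primary = {"block0": "v1", "block1": "v1"}   # live production volume

# Step 1: snapshot to a second disk -- a point-in-time copy of the volume.
snapshot_disk = copy.deepcopy(primary)

# Step 2: production writes continue while the backup runs; the snapshot
# is unaffected, which is the whole point of backing up from the copy.
primary["block0"] = "v2"

# Step 3: tape is written from the snapshot, not the live volume, so the
# tape image is internally consistent no matter how long the run takes.
tape = list(snapshot_disk.items())

print(tape)  # [('block0', 'v1'), ('block1', 'v1')] -- consistent image
```

This also shortens the restore window Manning worried about: the most recent copy sits on disk, with tape as the second tier.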