There's nothing quite like an over-used buzzword to make people feel like they are in a technology twilight zone. A few years ago, you could not have a discussion about storage networking without somebody describing the amorphous storage infrastructures that they envisioned. All you had to do was walk into that mess of non-interoperable point products, click your heels three times and voila! - an infrastructure appeareth.
So, imagine my surprise when I discovered myself thinking about "storage and data management infrastructures" recently. I was trying to write and think at the same time when I latched onto the brilliant idea that this Internet thing was actually going to amount to something. I honestly don't know where it started - it must have been an NPR or PBS broadcast - but I realized that the Internet was not revolutionary but instead was an evolutionary extension of the existing data sharing machinery that more or less began with Gutenberg's printing press in 1440.
A fairly influential device in its time, the printing press enabled rabble-rouser Martin Luther to publish his Top 95 list of problems with the Catholic Church and gave people a way to enjoy William Shakespeare's plays without having to deal with the crowds at the theater. Reminds one of the Internet.
Storage's role within the infrastructure
Within that historical context, I began to think about the role storage has within the infrastructure. I've been reluctant to raise storage to the level of infrastructure, but there really is this nagging problem of maintaining 24x7 access to all the data that is constantly piling up. The Internet and all other forms of computing are pretty much useless without data, so it behooves us to do our part and ensure that there is data to access and paths to enable that access.
There are big changes afoot concerning the legal climate of data storage. These changes have led to a trendy discussion about data lifecycle management, one of this year's most popular new buzzwords. Governments are now telling us through acts, laws, writs and other declarations that we have a legal responsibility to perform our duties even though they can't really explain what level of performance is required. I suppose recognizing adequate data protection will be something like judging pornography: We'll know it when we see it.
To me, this is not as simple as buying a tool for the job, such as a data lifecycle management product. There is nothing wrong with data lifecycle management - in fact, I think the concept is terrific. I simply doubt that a single packaged product is going to provide the flexibility to deliver the long-range results that organizations need. Data lifecycle management needs to be looked at as a process and not as a product.
The reality of data center management today is that our best practices are going to be under much closer scrutiny than we would like. The process of maintaining, protecting and preserving data while providing data access may not necessarily be more serious than it was in the past, but it will definitely be much more visible. And violators, no matter how well-meaning, will be dealt with harshly.
Taking care of data, every day
We will see the role of IT change from being primarily a system or application support function to being a data management function. The term I like to use to describe this shift is "data stewardship." The term "stewardship" might sound old-fashioned or over the top, but it summarizes the situation well. If we are good stewards of data, we will be doing our part to further our organizations' missions.
Considering the dynamics of the situation, data stewardship requires fairly sophisticated approaches and technologies to get the job done, which brings me back to my ruminations about infrastructure. Three years ago, I didn't take this stuff so seriously, but 9/11 and the financial scandals that have shaken our confidence and trust have changed my perceptions of storage responsibilities.
So if a storage and data management infrastructure is what's needed, what should it look like? Like all infrastructures, it needs to be flexible, stable and cost-efficient. Maintaining both flexibility and stability is a balancing act to be sure, but flexibility is needed for establishing broad leverage at affordable costs, and stability is a prerequisite for data longevity.
Three data stewardship tools
Storage networking infrastructures have to cover the three fundamental functions in a storage network: connecting, storing and filing. While the industry has spent most of its time trying to figure out connecting and storing, there are going to be even larger changes in the area of filing. Filing provides the context that allows us to identify, name and categorize the enormous amounts of data with which we will be working. Storage and data management infrastructures are going to have file-level functions that are global in nature, as opposed to being closely tied to individual systems.
Along with the three fundamental functions in storage networking, there are also three primary concepts that we can use to forge tools for data stewardship. The first is virtualization, the second is redundancy, and the third is management integration.
People who know me know that "virtualization," like "infrastructure," is another word to which I have grown psycho-allergic. If there ever was an overused buzzword or concept, virtualization is it. However, I am also convinced that the ability to manage resources by aggregation, subdivision and substitution is extremely powerful and is probably the best way to deal with current scaling issues.
Before SANs, Ethernet and IP data networks were never really used for data stewardship functions. We assumed networks functioned best with wide-open access characteristics. However, the impulse to protect data is much stronger in storage networks than in data networks.
Network virtualization, with technologies such as VSANs or VLANs, is a much more effective way of restricting unwanted access to data than techniques such as zoning or LUN masking. In addition, network virtualization in the form of trunking should be a core capability in a storage network infrastructure. Why build an infrastructure if it does not perform as needed?
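The access-control difference is easier to see with a sketch. In a VSAN, traffic is confined to a partition of the fabric itself, so devices outside the partition never see each other at all. The following is a minimal toy model of that idea in Python - the class and WWN names are hypothetical, and this is not any vendor's CLI or API:

```python
# Toy model of VSAN-style fabric partitioning (hypothetical names,
# for illustration only -- not any vendor's CLI or API).

class Fabric:
    def __init__(self):
        self._vsan_of = {}  # device WWN -> VSAN id

    def assign(self, wwn, vsan_id):
        self._vsan_of[wwn] = vsan_id

    def can_communicate(self, wwn_a, wwn_b):
        # Traffic is confined to a VSAN: devices in different VSANs
        # (or unassigned devices) never see each other at all.
        a, b = self._vsan_of.get(wwn_a), self._vsan_of.get(wwn_b)
        return a is not None and a == b

fabric = Fabric()
fabric.assign("wwn:host-a", 10)
fabric.assign("wwn:array-1", 10)
fabric.assign("wwn:host-b", 20)

print(fabric.can_communicate("wwn:host-a", "wwn:array-1"))  # True
print(fabric.can_communicate("wwn:host-b", "wwn:array-1"))  # False
```

Contrast this with zoning or LUN masking, which filter access device by device after the fabric is already shared - the partition-first approach fails closed by default.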
Virtualization must scale
Storage virtualization has been discussed at length in the industry. I won't add much to all those previous discussions in this column, except to say that a virtualization element that is part of an infrastructure needs to scale to very large sizes as well as be integrated with a global management system. Virtualization point products may solve specific problems but may not be reusable for the next problem. Storage virtualization products will need to support management standards such as SMI-S if they hope to play in the enterprise data and storage infrastructure.
Virtualization at the file level is slightly more complex. Using the model of block-level virtualization, it is possible to establish a hierarchy of access points, like a pyramid of file systems. It is also possible to apply virtualization to the storage devices and subsystems that a file system uses by creating a lower-level abstraction layer that incorporates a wide variety of storage resources for different types and priorities of data. At some point, the concept of QoS will come to storage, and file-level virtualization will be one of the most effective ways to realize it.
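A sketch may help make the tiering idea concrete. Below is a minimal Python model of a file-level virtualization layer that presents one namespace to callers while routing files to different storage tiers by data priority. Everything here is hypothetical - the class name, tier names and the use of plain dictionaries as stand-ins for storage pools - and a real implementation would sit beneath the file system API:

```python
# Sketch of a file-level virtualization layer that routes files to
# different storage tiers by priority (hypothetical names; dicts
# stand in for pools of real storage resources).

class TieredFileStore:
    def __init__(self, tiers):
        self._tiers = tiers      # priority name -> backing store
        self._location = {}      # file name -> priority tier

    def write(self, name, data, priority="standard"):
        # The caller picks a priority; the layer picks the resources.
        self._tiers[priority][name] = data
        self._location[name] = priority

    def read(self, name):
        # Callers see one namespace; the tier is resolved internally.
        return self._tiers[self._location[name]][name]

store = TieredFileStore({"gold": {}, "standard": {}})
store.write("ledger.db", b"critical", priority="gold")
store.write("cache.tmp", b"scratch")

print(store.read("ledger.db"))  # b'critical'
print(store.read("cache.tmp"))  # b'scratch'
```

The point of the abstraction is that the priority-to-resource mapping can change underneath the namespace without the caller noticing - which is exactly the hook a storage QoS policy would need.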
Redundancy is needed across all functions, including networks and storage subsystems, and it is also needed to support the data management functions provided by filing software. Data availability is closely tied to the availability of the network storage elements, which in turn depends on redundancy techniques.
Obviously, disk mirroring, RAID and hot-sparing technologies have fundamental low-level roles in the infrastructure. Also important are higher level data applications such as point-in-time data copying. Both file-based and block-based point-in-time snapshots are needed to support different recovery and operating requirements.
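The mechanics behind a point-in-time snapshot can be sketched in a few lines. The model below - hypothetical names, with real products doing this inside the array or volume manager - freezes the current block map when a snapshot is taken; later writes replace blocks in the live map only, so the snapshot still resolves to the old data without copying it:

```python
# Minimal sketch of a point-in-time snapshot of a block volume
# (hypothetical names; illustration only). A snapshot copies the
# block *map*, not the data, so it is cheap to take; subsequent
# writes change only the live map.

class Volume:
    def __init__(self, nblocks):
        self._blocks = {i: b"\x00" for i in range(nblocks)}
        self._snapshots = []

    def write(self, block, data):
        self._blocks[block] = data

    def snapshot(self):
        self._snapshots.append(dict(self._blocks))  # copy the map
        return len(self._snapshots) - 1             # snapshot id

    def read_snapshot(self, snap_id, block):
        return self._snapshots[snap_id][block]

vol = Volume(4)
vol.write(0, b"v1")
snap = vol.snapshot()
vol.write(0, b"v2")                 # post-snapshot write
print(vol._blocks[0])               # b'v2' (live volume)
print(vol.read_snapshot(snap, 0))   # b'v1' (frozen view)
```

File-based snapshots work on the same principle, one level up: freeze the namespace-to-data mapping rather than the block map.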
Remote copy technologies should also be included for business continuity purposes. Regardless of what people think about backup, the need for it is not going away. It is, however, likely to change radically as the infrastructure brings more options for storing backup data.
SMI-S is key
The global management of all elements is essential for an effective cost-saving infrastructure. The SMI-S (Storage Management Initiative Specification) standard for storage network management is an enormous step in the right direction. Management systems should all use SMI-S as the primary way to communicate with all entities in the network. Legacy equipment that does not support SMI-S will eventually need to be replaced if the infrastructure is expected to meet its ROI objectives.
Let's see, I think that's 20 references to the word "infrastructure" (21 now) in this column, which is more than I ever wanted. But it's a serious word, and these are serious times for IT professionals. If we are going to be placed under the magnifying glass for the way we manage data, then we ought to be able to get the technology that we need, along with the time to implement designs that will meet our needs.
This time, it's not all for the company, it's for our own well-being too. The problem is, we are the only ones who know how difficult this is, and it may be difficult getting corporate management to spend money based on what we are telling them.
- Marc Farley is president of Building Storage Inc.