What happens when data files get very large

Last time we talked about new, very large data files that are the result of improvements in some important technologies, and I raised a concern that we will have to contend with some new issues when it comes time to manage such large objects in the data center. More on this today.

The case of high definition television (HDTV) is useful here as it may provide us with a worst-case scenario.

HDTV is already starting to arrive, and with it comes a vast storage requirement. For example, graphics departments at firms that provide content for HDTV create files in a high-resolution encoded format with a data rate of 30M bit/sec. Such files require 13.5G bytes of storage space for each hour of video produced.
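A quick back-of-the-envelope check, sketched below in Python, shows where that 13.5G-byte figure comes from. It assumes decimal prefixes (1M bit = 1,000,000 bits, 1G byte = 1,000,000,000 bytes) and a constant 30M bit/sec data rate.

# Back-of-the-envelope storage estimate for one hour of 30M bit/sec video.
# Assumes decimal prefixes: 1M bit = 10**6 bits, 1G byte = 10**9 bytes.

BITRATE_BPS = 30 * 10**6            # encoded data rate, bits per second
SECONDS_PER_HOUR = 3600

bits_per_hour = BITRATE_BPS * SECONDS_PER_HOUR    # 108,000,000,000 bits
gbytes_per_hour = bits_per_hour / 8 / 10**9       # bits -> bytes -> G bytes

print(f"{gbytes_per_hour:.1f}G bytes per hour of video")   # -> 13.5G bytes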

How well suited is the current enterprise IT shop to cope with the management and manipulation of such data? In most cases, the answer is probably "not very." Here's why.

First, such data will require new operating systems not typically found in enterprise IT rooms. Servers running 32-bit operating systems would likely be dragged to their knees by such files - the applications that work with them perform sophisticated math operations that typically demand a 64-bit operating system. If support for such machines (many of which are now moving from 32- to 64-bit operating systems) falls within your responsibilities, the fun has only just begun. And since we all know how easy it is to learn a new operating system, you can hand that job to some junior team members. Then, while your colleagues are getting acquainted with the new operating systems, you might give some thought to another question: What will be required when it comes time to move such large files from one node to another?
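Before getting to that, one way to make the 32-bit problem concrete is address space: a 32-bit process can generally address no more than 4G bytes, yet even a short run of uncompressed high-definition frames exhausts that. The rough Python sketch below uses assumed, illustrative figures (1920 x 1080 frames, 4 bytes per pixel, 30 frames/sec), not measurements from any particular application.

# Rough illustration (assumed figures) of why HD post-production strains a
# 32-bit process: 1920x1080 frames, 4 bytes per pixel, 30 frames per second.

FRAME_BYTES = 1920 * 1080 * 4          # about 8.3M bytes per uncompressed frame
FRAMES_PER_SEC = 30
ADDRESS_SPACE_32BIT = 4 * 2**30        # 4G-byte ceiling on 32-bit addressing

seconds_addressable = ADDRESS_SPACE_32BIT / (FRAME_BYTES * FRAMES_PER_SEC)
print(f"A 32-bit address space holds roughly {seconds_addressable:.0f} seconds "
      f"of uncompressed HD frames")    # roughly 17 seconds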

Moving such massive amounts of data across your present LAN and SAN infrastructure will be no trivial task, and move such data you assuredly will. You may have to move it often, because all this new data makes it even more imperative that we manage with an eye to efficiency. Efficient use of storage assets will mean migrating these large files whenever there is no longer a requirement for rapid access to the data by multiple users. At that point, data must be moved to second-tier systems, freeing up valuable space on the first-tier storage. Expect a fairly continuous movement of data from the "work" machines to lower tiers of storage.
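What might that continuous demotion look like in practice? The Python sketch below is a deliberately simple illustration, not a real tiering product: it assumes hypothetical mount points /tier1/video and /tier2/video and an arbitrary rule that anything unread for 14 days moves down a tier.

# Deliberately simple tiering sketch: demote first-tier files that have not
# been read for 14 days. /tier1/video and /tier2/video are hypothetical mounts.
import os
import shutil
import time

FIRST_TIER = "/tier1/video"
SECOND_TIER = "/tier2/video"
MAX_IDLE_SECONDS = 14 * 24 * 3600      # arbitrary 14-day cutoff

def demote_idle_files() -> None:
    now = time.time()
    for name in os.listdir(FIRST_TIER):
        src = os.path.join(FIRST_TIER, name)
        if not os.path.isfile(src):
            continue
        idle = now - os.path.getatime(src)     # seconds since last access
        if idle > MAX_IDLE_SECONDS:
            shutil.move(src, os.path.join(SECOND_TIER, name))

if __name__ == "__main__":
    demote_idle_files()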

Beyond this, archiving data off the second-tier storage will also be a fairly standard act. But archiving of such data may take quite a different form than what we are used to. This is because the information lifecycle of high-definition content should be expected to be quite different from the lifecycle of most corporate data we have managed in the past.

How is it different? When corporate data ages, it gets passed down the hierarchy to tape, where it resides in the archive until those relatively rare occasions when it is read again. High-definition content, on the other hand - online movies are a good example - must typically be available on-demand (think of video-on-demand at a hotel, or the files at a place like CNN). This change in the latter part of the information lifecycle means that archived high-definition content is more likely to be sent to near-line disk than to tape, and that managers charged with providing for this sort of on-demand requirement for archived material may have to consider some serious changes as they plan their hardware buys. Archiving to disk, essentially unheard of now, may come to prevail in some part of the computer room as low-priced disk technology becomes increasingly available.
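To put the contrast in miniature, the toy Python table below maps each class of data to an invented lifecycle; the tiers and classes are illustrative assumptions, not a product configuration. The only point is that the final hop differs - corporate records end on tape, while high-definition content lands on near-line disk where it can still be served on demand.

# Toy lifecycle table: the tiers and classes are invented for illustration.
# The only point is that the final hop differs by content class.

LIFECYCLES = {
    "corporate_records": ["first-tier disk", "second-tier disk", "tape archive"],
    "hd_video_content":  ["first-tier disk", "second-tier disk", "near-line disk"],
}

def archive_target(content_class: str) -> str:
    """Return the final resting place for a given class of data."""
    return LIFECYCLES[content_class][-1]

print(archive_target("corporate_records"))   # tape archive
print(archive_target("hd_video_content"))    # near-line disk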

Large files - which must be stored, moved, stored again, and frequently accessed perhaps for years afterward - may substantially change the way we manage a significant subset of our data.
