Like overstuffed closets, cluttered enterprise backup operations scream for attention. Fortunately, vendors are coming out with data de-duplication functions -- packed into storage software suites or in stand-alone appliances -- that sort through data destined for the archives and eliminate the redundancies.
Analysts say the technology can provide a 20-to-1 reduction of backup data. In other words, 20TB of original data can be shrunk to 1TB for backup purposes.
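A de-duplication ratio compares the data protected with the capacity it consumes, so 20-to-1 means the back end holds one byte for every 20 backed up. The short Python sketch below converts a ratio into the fraction of capacity saved. The function names are hypothetical and used only for illustration.

```python
def reduction_ratio(logical_bytes: int, stored_bytes: int) -> float:
    """De-duplication ratio: data protected divided by capacity consumed."""
    if stored_bytes <= 0:
        raise ValueError("stored_bytes must be positive")
    return logical_bytes / stored_bytes


def capacity_saved(ratio: float) -> float:
    """Fraction of back-end capacity avoided at a given ratio (20:1 -> 0.95)."""
    return 1 - 1 / ratio


TB = 10**12  # decimal terabyte
ratio = reduction_ratio(20 * TB, 1 * TB)
print(f"{ratio:.0f}:1 ratio, {capacity_saved(ratio):.0%} less capacity")
```

The arithmetic also shows that savings flatten as ratios climb. A 10-to-1 ratio already saves 90% of capacity, so doubling it to 20-to-1 saves only five more percentage points.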
Eliminating duplicate data seems like a no-brainer, but in the past, corporations were leery of losing data on its way to backup repositories. Only now are they getting comfortable with the reliability of de-duplication technology, which has matured thanks to advancements in data transfer techniques and standards. Specifically, the rise of Advanced Technology Attachment (ATA) and Serial ATA (SATA) disk technologies, along with sharp increases in processing power, has fostered better de-duplication functionality.
Suddenly, de-duplication is catching on, attracting big-name vendors such as EMC and Symantec. In November, EMC acquired de-duplication vendor Avamar Technologies, and EMC is now incorporating de-duplication into its Clariion, Centera and NetWorker product lines. Meanwhile, Symantec is reportedly scrambling to build de-duplication capability into its Veritas NetBackup storage management software.
The premise behind de-duplication is straightforward: store each piece of data only once. "Imagine having a Word document that was several megabytes in size. If you e-mailed that to a colleague who then added one word to that document, some [systems] would determine that this was a new document that needed to be backed up again," says Jason Paige, information systems manager at Integral Capital Partners, an investment firm in California.
To make sure that lightly edited files, such as Word documents, aren't stored several times over, Integral Capital Partners uses Avamar's de-duplication technology.
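The Word-document scenario can be illustrated with sub-file (chunk-level) de-duplication. The Python sketch below is a minimal illustration, not Avamar's or any vendor's method. It assumes content-defined chunking with a "gear" rolling hash, so chunk boundaries follow the data's content rather than fixed offsets, and stores each unique chunk once, keyed by its SHA-256 digest. Inserting one word then changes only the chunk or two around the edit instead of the whole file. The names `split_into_chunks` and `ChunkStore`, the mask and the chunk-size limits are all hypothetical choices.

```python
import hashlib
import random

# Gear table for the rolling hash: 256 pseudo-random 32-bit values
# (seeded so every run produces the same chunk boundaries).
_rng = random.Random(2007)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

BOUNDARY_MASK = 0xFFE00000   # 11 high bits: roughly one boundary per 2 KiB
MIN_CHUNK, MAX_CHUNK = 512, 8192


def split_into_chunks(data: bytes):
    """Content-defined chunking: cut where the rolling hash of the most
    recent bytes has all BOUNDARY_MASK bits clear, bounded by min/max sizes."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        # Gear hash: each shift pushes old bytes out, so h reflects ~32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]   # trailing partial chunk


class ChunkStore:
    """Toy backup target that keeps each unique chunk once, keyed by SHA-256."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes

    def backup(self, data: bytes):
        """Store data; return (recipe of digests, bytes newly written)."""
        recipe, written = [], 0
        for chunk in split_into_chunks(data):
            digest = hashlib.sha256(chunk).digest()
            if digest not in self.chunks:   # only unseen chunks cost capacity
                self.chunks[digest] = chunk
                written += len(chunk)
            recipe.append(digest)
        return recipe, written

    def restore(self, recipe):
        """Rebuild the original bytes from the ordered list of digests."""
        return b"".join(self.chunks[d] for d in recipe)


# Stand-in for a document, then the same document with one word inserted.
doc = random.Random(1).randbytes(200_000)
edited = doc[:100_000] + b" word" + doc[100_000:]

store = ChunkStore()
recipe_v1, first_write = store.backup(doc)
recipe_v2, second_write = store.backup(edited)
print(f"first backup wrote {first_write:,} bytes; "
      f"edited copy wrote {second_write:,} bytes")
```

A whole-file approach would hash `doc` and `edited` to different values and store both in full. The chunk store writes only the chunks around the inserted word. Content-defined boundaries are used here because fixed-size chunks would all shift after an insertion and stop matching.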
Corporate IT's comfort level with the technology has increased to the point where some IT executives wonder whether de-duplication could extend from backup operations to disaster recovery and even primary storage. But first there are lingering questions about where best to insert de-duplication functionality in the backup process: at the client, at the disk or at the virtual tape library (VTL).
IT managers will have to ask vendors hard questions, because de-duplication methods vary significantly by vendor. "There is still a lot of confusion in the market about what data de-duplication is and isn't -- and where it is best done. This confusion can delay adoption," says Heidi Biggar, an analyst at Enterprise Strategy Group in Massachusetts, U.S.
But whatever confusion exists, corporate IT shops shouldn't be stumped for too long. "There are pros and cons to each approach, but all have potentially significant benefits for users by allowing them to reduce the amount of [storage] capacity they need on the back end," Biggar says. The benefits extend to other areas, too. For example, de-duplication can reduce the network bandwidth required for long-distance data replication, she says.
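The bandwidth point can be sketched as a two-step exchange. The sending site first transmits only the digests of its chunks; the remote site replies with the digests it lacks; then only those chunks cross the wide-area link. The Python sketch below assumes fixed-size 4 KB chunks and SHA-256 digests for simplicity. `Site`, `replicate` and the chunk size are hypothetical, and no real product's replication protocol is implied.

```python
import hashlib
import os

CHUNK_SIZE = 4096   # fixed-size chunks: a simplifying assumption
DIGEST_SIZE = 32    # SHA-256 digest length in bytes


def chunk_digests(data: bytes):
    """Return (digest, chunk) pairs for fixed-size chunks of data."""
    return [(hashlib.sha256(data[off:off + CHUNK_SIZE]).digest(),
             data[off:off + CHUNK_SIZE])
            for off in range(0, len(data), CHUNK_SIZE)]


class Site:
    """A backup site holding unique chunks keyed by digest."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        """Report which of the offered digests this site does not yet hold."""
        return {d for d in digests if d not in self.chunks}

    def receive(self, chunks):
        self.chunks.update(chunks)


def replicate(data: bytes, local: Site, remote: Site) -> int:
    """Replicate data from local to remote; return bytes sent over the link."""
    pairs = chunk_digests(data)
    local.receive(dict(pairs))
    # Step 1: offer the digest list (DIGEST_SIZE bytes per chunk).
    need = remote.missing(d for d, _ in pairs)
    sent = DIGEST_SIZE * len(pairs)
    # Step 2: ship only the chunks the remote site reported missing.
    payload = {d: c for d, c in pairs if d in need}
    remote.receive(payload)
    return sent + sum(len(c) for c in payload.values())


local, remote = Site(), Site()
monday = os.urandom(1_000_000)
tuesday = monday + os.urandom(50_000)   # Monday's data plus 50 KB of new data
first = replicate(monday, local, remote)
second = replicate(tuesday, local, remote)
print(f"Monday sent {first:,} bytes; Tuesday sent {second:,} bytes "
      f"instead of {len(tuesday):,}")
```

In this example, Tuesday's transfer is roughly 59 KB of digests and new chunks rather than the full 1.05 MB. The first replication costs slightly more than the raw data because of the digest overhead.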