Exponential data growth is emerging as arguably the biggest issue IT managers face. But, handily, data deduplication is one available tool to help tackle the problem. Tim Lohman reports.
You wouldn’t call TechnologyOne your average Australian SME but the software developer is in many ways very typical when it comes to the issue of managing rampant data growth.
The company, with around 700 staff and an information growth rate of 30 per cent in a quiet year, was until recently trapped in a cycle of throwing ever more disc at the problem in the hope of keeping near-exponential data multiplication under control.
The company’s back up window, the period of time when back ups are permitted to run on a system, was also starting to grow by about half a day per year, meaning the organisation was in danger of breaching its own back up timeframe requirements.
However, as IT manager Andrew Bauer explains, the company has joined a growing number of organisations turning to data deduplication technology (in this instance, NetApp ‘powered’ by Commvault) as its preferred method of coming to grips with one of the great IT challenges of the moment.
As Bauer tells it, while the company had been on a consolidation path for some time, adopting virtualisation and moving to blade servers as a way to reduce cooling and power requirements, it had struggled with managing its storage.
“Disc hasn’t gotten smaller physically and its power and cooling haven’t gotten smaller, so there hasn’t been much we could do about consolidating storage,” he says.
“So, when dedupe became available we saw that as a real way to reduce our requirements by avoiding more storage purchases, going forward, and making better use of the storage we have now.”
Data deduplication 101
Gartner defines deduplication (also referred to as ‘dedupe’) as technology which uses identification and comparison algorithms in selecting data so that only unique “chunks” are stored, thus eliminating redundancy and reducing space requirements. Reduction rates, depending on the type of data and the method used, can be anywhere between three and 25 times, and sometimes even more.
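To put those reduction rates in perspective, a ratio can be converted into the share of raw capacity saved. The short Python sketch below is illustrative only; the function name `space_saved_percent` is an assumption made for this example, not a term used by Gartner or any vendor.

```python
def space_saved_percent(reduction_ratio):
    """Convert an N-times reduction ratio into the percentage of space saved."""
    if reduction_ratio < 1:
        raise ValueError("a reduction ratio below 1 would mean the data grew")
    # Storing 1/N of the original data saves (1 - 1/N) of the space.
    return (1 - 1 / reduction_ratio) * 100


low_end = round(space_saved_percent(3), 1)    # three-times reduction
high_end = round(space_saved_percent(25), 1)  # 25-times reduction
print(low_end, high_end)
```

On these figures, a three-times reduction saves roughly two-thirds of the raw capacity, while a 25-times reduction saves 96 per cent.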
Dedupe typically comes in two forms: pre-processing or post-processing. In pre-processing, also known as in-line, the dedupe technology sits between the server and the storage media, deduplicating data as it travels from the server to the storage. In post-processing, data is allowed to go straight from the server to the storage during the day and is deduplicated against the stored data at a later time, typically overnight.
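The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's implementation: the class names `InlineStore` and `PostProcessStore`, the use of SHA-256 fingerprints and the tiny 1KB chunks are all assumptions made for the example.

```python
import hashlib


def fingerprint(chunk):
    """Identify a chunk by the SHA-256 hash of its contents."""
    return hashlib.sha256(chunk).hexdigest()


class InlineStore:
    """Pre-processing: duplicates are dropped before they reach storage."""

    def __init__(self):
        self.chunks = {}

    def write(self, chunk):
        self.chunks.setdefault(fingerprint(chunk), chunk)

    def bytes_used(self):
        return sum(len(c) for c in self.chunks.values())


class PostProcessStore:
    """Post-processing: data lands raw and is deduplicated by a later pass."""

    def __init__(self):
        self.landing = []  # raw data written during the day
        self.chunks = {}   # unique chunks kept after the dedupe pass

    def write(self, chunk):
        self.landing.append(chunk)

    def run_dedupe_pass(self):
        for chunk in self.landing:
            self.chunks.setdefault(fingerprint(chunk), chunk)
        self.landing.clear()

    def bytes_used(self):
        raw = sum(len(c) for c in self.landing)
        return raw + sum(len(c) for c in self.chunks.values())


workload = [b"A" * 1024, b"B" * 1024, b"A" * 1024, b"A" * 1024]
inline, post = InlineStore(), PostProcessStore()
for chunk in workload:
    inline.write(chunk)
    post.write(chunk)

before_pass = post.bytes_used()  # all four chunks are still held raw
post.run_dedupe_pass()
after_pass = post.bytes_used()   # only the two unique chunks remain
print(inline.bytes_used(), before_pass, after_pass)
```

Both stores end up holding the same two unique chunks. The practical trade-off in this sketch is that the post-processing store needs enough capacity to hold the full raw data until its pass runs, while the in-line store does the comparison work on every write.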
Dedupe solutions can also operate at the file, block or bit level. With file level deduplication, only a single instance of a 5MB PowerPoint presentation emailed to 10 people in the same organisation would be saved. In this way, what would have been a 50MB storage requirement is cut to 5MB.
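The PowerPoint example can be reproduced with a minimal Python sketch of file level deduplication. It assumes files are identified by a SHA-256 hash of their contents; the function name `file_level_dedupe` and the mailbox paths are hypothetical and chosen only for illustration.

```python
import hashlib


def file_level_dedupe(files):
    """Store one copy per unique file content.

    `files` is a list of (name, content) pairs. Returns the content store,
    keyed by hash, and an index mapping each file name to its hash.
    """
    store, index = {}, {}
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # keep only the first copy
        index[name] = digest
    return store, index


MB = 1024 * 1024
presentation = b"x" * (5 * MB)  # stand-in for a 5MB PowerPoint file
mailboxes = [(f"user{i}/slides.ppt", presentation) for i in range(10)]

store, index = file_level_dedupe(mailboxes)
logical_mb = sum(len(content) for _, content in mailboxes) / MB
stored_mb = sum(len(content) for content in store.values()) / MB
print(logical_mb, stored_mb)
```

All ten mailbox entries point at the same stored copy, so the 50MB of logical data occupies 5MB on disc.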
Block and bit level deduplication are useful for their ability to ignore file types and examine changes which occur within data at the block or bit level. If the same 5MB PowerPoint file mentioned above goes through 10 different drafts, 10 separate files totalling roughly 50MB won’t be saved. Instead, only the original 5MB file plus the block- or bit-level data relating to the specific changes made to that original file are kept, cutting down on the amount of data stored.
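A rough Python sketch of block level deduplication shows the effect on the ten drafts. It rests on several assumptions made purely for illustration: fixed 4KB blocks, SHA-256 fingerprints, synthetic file contents, and drafts that each overwrite a single block in place. Fixed-size blocks cope poorly with edits that insert or delete data and shift everything after them, which is why variable-size chunking is also widely used.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def make_block(tag):
    """Build a synthetic 4KB block whose content depends on `tag`."""
    return tag.to_bytes(4, "big") * (BLOCK_SIZE // 4)


def dedupe_blocks(files, store):
    """Split each file into blocks, keeping only unseen blocks in `store`.

    Returns a recipe per file: the ordered block hashes needed to rebuild it.
    """
    recipes = {}
    for name, content in files.items():
        recipe = []
        for offset in range(0, len(content), BLOCK_SIZE):
            block = content[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)
            recipe.append(digest)
        recipes[name] = recipe
    return recipes


def restore(recipe, store):
    """Rebuild a file from its recipe."""
    return b"".join(store[digest] for digest in recipe)


# A 5MB original (1280 blocks) plus nine drafts, each changing one block.
original_blocks = [make_block(i) for i in range(1280)]
files = {"draft0.ppt": b"".join(original_blocks)}
for k in range(1, 10):
    blocks = list(original_blocks)
    blocks[k * 100] = make_block(100_000 + k)  # the edit made in draft k
    files[f"draft{k}.ppt"] = b"".join(blocks)

store = {}
recipes = dedupe_blocks(files, store)
logical_bytes = sum(len(content) for content in files.values())
stored_bytes = sum(len(block) for block in store.values())
print(logical_bytes, stored_bytes, len(store))
```

Here the ten drafts add up to 50MB of logical data, but only the 1280 original blocks and the nine changed blocks are stored, a little over 5MB in total, and every draft can still be rebuilt from its recipe.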
Dedupe benefits
On top of postponing the point at which an organisation must invest in more storage, dedupe can increase data availability, cut the costs associated with power and cooling, and free up network bandwidth by reducing the volume of data that has to be moved.
In the case of Curtin University of Technology, it also dramatically improved the organisation’s disaster recovery ability. As CIO Peter Nikoletatos explains, initiatives around digitising lectures and server virtualisation have contributed to exponential growth in storage demand, resulting in issues around incomplete cycles and decreased reliability in its tape-based back ups.
“It was apparent that we were backing up data that was clearly duplicated,” he says. “The introduction of [dedupe] has resulted in considerable deduplication as well as removing the need to use tapes and increasing the ability to recover data from days to minutes.”
According to Nikoletatos, moving to a dedupe solution — in Curtin’s case EMC’s Avamar product — has seen a doubling in the number of virtual machines being backed up and an average deduplication rate of close to 99 per cent.
The University’s back up window has been reduced from 24 hours to under three. As an example of the technology’s power, Nikoletatos says a Microsoft Exchange email server was recently recovered from disc in just 30 minutes. Recovering from tape would have taken between one and two days.