Multiple short outages can add up to major problems

Preparing for major catastrophes is just one piece of IT disaster planning these days

Corporate executives have long created IT plans to cope with major disasters, but now they're increasingly taking steps to prevent the brief shutdowns that can cost companies hundreds of thousands of dollars or more in their own right.

Users and analysts at IDC's Enterprise Data Center Forum here last week listed several options for quickly recovering from or preventing relatively minor incidents -- like user miscues or electricity brownouts -- that can shut down systems for an hour to a half-day or so.

Doug Roberts, manager of system services at Hannaford Bros., became aware of the threat posed by seemingly minor incidents about 10 years ago, when his company had a single data center with a diesel generator for backup.

At the time, the US-based supermarket chain was focused on preparing for major disasters. "We'd do the big four-and-a-half-day disaster recovery event, planning for a hurricane or whatever," Roberts said. "We'd go to the IBM facility, practice the drill."

Then an incident completely out of Hannaford's control temporarily shut down the data center and the backup generator. At a truck yard across the street, Roberts said, an 18-wheeler "did a U-turn and [accidentally] dumped the contents of its fuel tank." The city shut down all power to the area and wouldn't allow Hannaford to use its generator because of the risk of fire.

After that incident, Hannaford installed near-real-time backup systems for its mainframes and key Unix and Windows servers at another data center about seven miles away, as well as at a smaller facility in upstate New York. "It's kind of a poor man's cluster," Roberts said.

In an August 2007 IDC survey of 350 data center professionals, about 37 per cent of the respondents said that their data centers had experienced an outage of some sort. The survey did not ask about the length of outages or when they occurred.

Matthew Eastwood, an IDC analyst, said human error is the most common cause of data center outages. Causes range from mistakenly hitting the emergency power-off button to tripping over a power cord.

The second most common cause of outages is an incident outside the data center's control, such as the fuel spill near Hannaford's facility.

Eastwood said that data centers can also face problems when cooling and power equipment, which are often overseen by the facilities group, are not in sync with IT requirements.

"Both groups should report into the same organization," or at least they should better coordinate their plans, Eastwood said.

Toyota Financial Services found another route to cutting down on short-term data center outages.

Not too long ago, the company had what it considered major incidents -- outages of at least an hour -- three or four times a week, according to Dave Howard, national manager of service management at Toyota's financing arm. The problems included downed networks, enterprisewide application problems, and server or facility outages, he said.
