I was riding in the car of a fellow CTO recently and talking about the flakiness of enterprise software and systems when his cell phone rang. He glanced at the caller ID. As he answered the call, the universal expression of concern washed over his face. One of his staffers was calling to utter the most ominous words in IT: The system is down.
Every company has its own version of "the system" -- often many permutations. The system is usually the most important tool being used at any particular time: e-mail, the network, a CMS (content management system), or an SFA (sales force automation) system. Having been on the receiving end of such calls, I immediately identified with my CTO companion, and we both laughed, albeit nervously. The caller ID on my cell phone usually alerts me that the system is down before I even answer it. If I see the phone number of our San Francisco-based content management guru flash on the screen, I know our CMS is having problems. The telltale nighttime call from San Mateo, Calif., means we're having network issues. These situations are rare, but I've been working in IT long enough to never be surprised when the system goes down. In IT, Murphy's Law can bite you when you least expect it, and entropy is expected.
The system runs smoothly most of the time, invisibly making the lives of employees easier. But when the system is down, our lives and our work grind to a halt. In recent months, the system has been down at my bank, my grocery store, a hotel checkout system, an airline, and yes, at InfoWorld. There's been at least one occasion where the separate system that alerts us that the system is down bit the dust. When that has happened, we've conceived systems that will monitor the systems that monitor the system. There are systems inside those systems that monitor the system that also could be monitored, and often our conversations devolve into circular debates that are the IT equivalent of Abbott and Costello's "Who's on First?"
The possibilities for monitoring are endless, so the IT hypochondriacs keep shoving various medicines into the system, but the system still goes down. When a vendor touts a "five nines" solution to me, I'll admit that I think about the roughly five minutes a year that system will be down, not the more than 525,000 minutes it will be up, because those five minutes are when the help desk phones will be ringing off the hook. Hundreds of thousands of minutes of uptime can quickly drown in a few minutes of downtime.
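For anyone who wants to check the vendor's math, the five-nines arithmetic is easy to reproduce. Here is a minimal sketch in Python, assuming a 365-day year and reading "five nines" as 99.999 percent availability; the function name is purely illustrative and not taken from any product.

```python
# Assumption: a non-leap, 365-day year (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return the minutes per year a system may be down.

    `availability` is a fraction between 0 and 1, e.g. 0.99999
    for "five nines". (Hypothetical helper, for illustration only.)
    """
    return MINUTES_PER_YEAR * (1.0 - availability)


five_nines = 0.99999
down = downtime_minutes_per_year(five_nines)
up = MINUTES_PER_YEAR - down

print(f"{down:.2f} minutes down, {up:,.2f} minutes up")
```

Under those assumptions the downtime budget works out to a little over five minutes a year, against more than 525,000 minutes of uptime.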
During the holiday season, keeping the system up becomes a real challenge. Most IT employees head home, using up all the vacation and comp days they've earned via their hard work throughout the year. Non-IT employees take a break as well, but IT problems inevitably arise for the Scrooges who must get e-mail on Thanksgiving or Christmas.
I remember standing in the backyard of my parents' house in North Carolina one Thanksgiving and listening on my cell phone to an executive complain -- from Hawaii -- that he couldn't get e-mail: "The system is down." Then my cell phone went dead. Apparently, that system went down while I was trying to figure out why and how our system was down.
When the system is down, sometimes it's hard to know which way is up.