Virtually every industry that relies on central datacentre functions has experienced EPO disruptions. While some of these disruptions were caused by faulty wiring, under-floor cable pulls snagging the EPO conduit, water leaks and poor maintenance, the majority of EPO shutdowns were caused by a person pushing an EPO button in error. In many cases, an occupant pushed buttons near the exit believing they were deactivating magnetic security locks.
In at least one recent case, the EPO disruption was done on purpose: a systems administrator shut down a datacentre that controls the Californian electrical grid.
Hundreds of such incidents are reported in US datacentres every year. These are the same facilities where millions of dollars were originally invested to achieve electrical fault-tolerance and continuous availability. Every IT, network and telecommunications component powered in a raised-floor area is at risk.
Still, in the US, the EPO button is required by Sections 645.10 and 645.11 of Article 645 of the National Electrical Code. These rules mandate that computer rooms have an EPO system at each exit to disable power under the raised floor as well as to disable power to the air conditioning that cools the raised floor. By code, the disconnection mechanism may be a single button or two adjacent buttons - one for power, the other for cooling.
But all too often, these EPO buttons are placed next to the many other exit-mounted devices, including fire-suppression release/abort buttons, light switches, security card readers, fire extinguishers, fire alarm panels, telephones, security intercoms and exit buttons.
This confusing cluster of devices next to the exit door can easily lead datacentre occupants to press the EPO when they are simply trying to turn on the lights or call security.
Even a momentary push on the EPO button will shut down the datacentre and require maintenance staff to reset all tripped electrical devices. The electrical reset alone could take up to 30 minutes - this in an environment where losing power for a fraction of a second can cause irreparable damage to hardware, databases and corporate profits.
It is probable that this single point of failure is one of the leading causes of critical power loss in the US. These electrical disruptions occur with the same regularity as utility disruptions, engine-generator failures and nuisance circuit-breaker trips, but they are generally not seen as failures. Because a person physically pushes the button, whether by mistake or not, these events are treated as accidents rather than counted alongside utility disruptions.
But there is a way of making the EPO button less hazardous to your datacentre's health. It is a protocol that has been in use for more than a decade in dozens of datacentres around the country. It could be implemented in your datacentre within a few hours and for a few hundred dollars per exit - truly a small price to pay to eliminate a common source of risk in a modern datacentre.
In the photo above, note that the EPO is clearly marked as the "Emergency Power Off Button." The intent is to distinguish it from the other devices at the doorway of the datacentre. Note also that the cover over the EPO has a keyed lock, but the key is already inserted. Opening the case requires a deliberate act - but in a real emergency, the lock would not be an impediment.