Stories by Peter Baer Galvin

Reliability Without the Cluster

Horror stories abound of system administrators' mistakes, overlooked details, and misconfigurations. Sometimes these factors combine to cause more downtime than any one of them would alone; sometimes, in trying to correct a problem, administrators make further mistakes that cause still more downtime. This month, I'll offer advice on how to keep your Sun Microsystems Inc. server from failing. From the sublime to the ridiculous, all of these rules are gleaned directly from real-life experience. In each case, had the rule been known and followed, there would have been less downtime -- or none at all.