Operate until failure

Shawn McMahon smcmahon at eiv.com
Mon Jan 8 15:11:39 UTC 2001


On Mon, Jan 08, 2001 at 08:49:17AM -0600, Eric Whitehill wrote:
> 
> We've had issues here with power outages and usually the UPS' will hold.
> The one time they didn't, we went and brought all the machines down
> gracefully as we didn't have the auto-shutdown installed on the systems.  

We don't make a management call to shut anything down unless it's going to
fail and break something in the next 15 minutes.

We have a generator, but two amazing coincidences have caused it to fail.
The first time, the generator itself was fine, but the transfer switch
didn't switch.  The person who had been signing off, erroneously, that he
checked that switch monthly lost his job shortly before we stopped using
his company entirely.  We discovered the problem when the batteries ran
down to the point where the load was supposed to cut over to the
generator, and the entire data center went dark.  That was a very, very
bad day.

The second time, an O-ring blew out, and we dumped so much oil on the
ground that we were told if it had been a tiny bit more, we'd have had to
call the EPA.  This one gave us enough warning to shut things down, but
we had to hustle, and a few things were triaged as "let it die, we don't
have time."

In general, however, we start planning for a controlled shutdown the minute
we know there's a problem, and we try to fit that shutdown into our weekly
outage window if possible.  If not, we try to schedule it after peak
processing time for the affected components.
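For anyone who wants to encode that decision order in a runbook tool, here
is a rough sketch in Python.  Only the 15-minute figure comes from this
post; the function name, its inputs (an estimated failure time, the start
of the next weekly window, the end of peak processing), and the "shut down
now if nothing else arrives in time" fallback are all hypothetical
assumptions, not a description of our actual tooling.

```python
from datetime import datetime, timedelta

# From the post: only an imminent failure (about 15 minutes out) justifies
# an immediate shutdown call.
EMERGENCY_THRESHOLD = timedelta(minutes=15)


def choose_shutdown_time(now, expected_failure, next_window_start, peak_end):
    """Hypothetical helper: pick when to take a failing component down.

    now               -- current time
    expected_failure  -- estimated time the component will fail (assumed known)
    next_window_start -- start of the next weekly outage window (assumed input)
    peak_end          -- end of peak processing for the affected components
    """
    if expected_failure - now <= EMERGENCY_THRESHOLD:
        # Failure is imminent: shut down right away rather than let it break.
        return now

    # Never schedule in the past (e.g. peak processing is already over).
    window = max(now, next_window_start)
    if window < expected_failure:
        # First preference: the regular weekly outage window.
        return window

    after_peak = max(now, peak_end)
    if after_peak < expected_failure:
        # Second preference: just after peak processing time.
        return after_peak

    # Assumption: if neither slot arrives before the expected failure,
    # take the controlled shutdown now instead of waiting for the crash.
    return now
```

In practice the failure estimate is the shaky part; the ordering of
preferences is the point of the sketch.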



More information about the NANOG mailing list