[fwd] Rats take down Stanford ...

Paul Ferguson pferguso at cisco.com
Tue Oct 22 13:30:12 UTC 1996


A follow-up thought on redundancy issues.

- paul

[snip]


>Date: Mon, 21 Oct 1996 12:54:05 -0700 (PDT)
>From: risks at csl.sri.com
>Subject: RISKS DIGEST 18.54

[snip]

>
>Date: Fri, 18 Oct 96 11:03 EST
>From: William Hugh Murray <0003158580 at mcimail.com>
>Subject: Re: Rats take down Stanford ... (RISKS-18.53)
>
>PGN's request for redundancy brings to mind the story of the infrastructure
>computer center in Trumbull, Connecticut.  It is an old story but bears
>repeating.
>
>Seems that a squirrel got into a transformer and brought down the external
>power supply.  The UPS kicked in, engine generators came on line, and the
>center operated in this mode for about an hour and a half.  At the end of
>that time the external power was restored.  The external power, the UPS, and
>the engine generators went into a deadly embrace.  The whole thing came down
>and would not come back up.
>
>I take two lessons from this.  First, redundancy adds some complexity and a
>lot of redundancy adds a lot of complexity.  At some point the redundancy
>begins to introduce failure modes and failure events that would not have
>existed in its absence.  There is an upper bound to such redundancy.
>
>Second, test redundant systems through to resumption of normal operations.
>In this case, the operators had tested to ensure that the redundant systems
>would come online in the event of a failure of the primary system.  They had
>not tested to see what would happen when the primary system was restored to
>normal operation.
>
>Who would have even thought about it?  I confess that I would not have.
>
>William Hugh Murray, New Canaan, Connecticut
>

[snip]





