[fwd] Rats take down Stanford ...
Paul Ferguson
pferguso at cisco.com
Tue Oct 22 13:30:12 UTC 1996
A follow-up thought on redundancy issues.
- paul
[snip]
>Date: Mon, 21 Oct 1996 12:54:05 -0700 (PDT)
>From: risks at csl.sri.com
>Subject: RISKS DIGEST 18.54
[snip]
>
>Date: Fri, 18 Oct 96 11:03 EST
>From: William Hugh Murray <0003158580 at mcimail.com>
>Subject: Re: Rats take down Stanford ... (RISKS-18.53)
>
>PGN's request for redundancy brings to mind the story of the infrastructure
>computer center in Trumbull, Connecticut. It is an old story but bears
>repeating.
>
>Seems that a squirrel got into a transformer and brought down the external
>power supply. The UPS kicked in, engine generators came on line, and the
>center operated in this mode for about an hour and a half. At the end of
>that time the external power was restored. The external power, the UPS, and
>the engine generators went into a deadly embrace. The whole thing came down
>and would not come back up.
>
>I take two lessons from this. First, redundancy adds some complexity and a
>lot of redundancy adds a lot of complexity. At some point the redundancy
>begins to introduce failure modes and failure events that would not have
>existed in its absence. There is an upper bound to such redundancy.
>
>Second, test redundant systems through to resumption of normal operations.
>In this case, the operators had tested to ensure that the redundant systems
>would come online in the event of a failure of the primary system. They had
>not tested to see what would happen when the primary system was restored to
>normal operation.
>
>Who would have even thought about it? I confess that I would not have.
>
>William Hugh Murray, New Canaan, Connecticut
>
[snip]