Never push the Big Red Button (New York City subway failure)

Baldur Norddahl baldur.norddahl at gmail.com
Fri Sep 10 20:16:56 UTC 2021


A nearby datacenter once lost power after a delay because someone hit the
switch to transfer from city power to generator power and then failed to
notice. The power went out the day after, when there was no fuel left.

On Fri, Sep 10, 2021 at 9:24 PM Matthew Huff <mhuff at ox.com> wrote:

> Since we are telling power horror stories…
>
> How about the call from the night operator that arrived at 10:00pm asking
> “Is there any reason there is no power in the data center?”
>
>
>
> Turns out someone had plugged a new high-end workgroup laser printer into a
> receptacle on the outside wall of the datacenter. The receptacle was wired
> into the data center’s UPS, and the printer completely smoked the UPS.
> Luckily the static transfer switch worked, but the three mainframes weren’t
> happy…
>
> Or
>
> Our building had a major ground fault issue that took years to find and
> resolve. We got hit with lightning that caused the mainframe to fault and
> recycle…and two minutes in, we got hit by lightning again. When the system
> failed to start, we called IBM support. When we explained what happened
> there was a very long pause…then some mumbling off phone, then the manager
> got on the line and said someone would be flying out and be onsite within
> 12 hours. We were down for 3 days, and got fined $250,000 by the insurance
> regulators since we couldn’t pay claims.
>
>
>
> Matthew Huff | Director of Technical Operations | OTA Management LLC
>
> Office: 914-460-4039
> mhuff at ox.com | www.ox.com
>
>
> *...........................................................................................................................................*
>
>
>
> *From:* Chris Kane <ccie14430 at gmail.com>
> *Sent:* Friday, September 10, 2021 3:16 PM
> *To:* Christopher Morrow <morrowc.lists at gmail.com>
> *Cc:* Matthew Huff <mhuff at ox.com>; nanog at nanog.org
> *Subject:* Re: Never push the Big Red Button (New York City subway
> failure)
>
>
>
> True EPO story; maintenance crew carrying new drywall into the data center
> backed into the EPO that didn't have a cover on it. One of the most
> eerie sounds in networking...a completely silent data center.
>
>
>
> -chris
>
>
>
> On Fri, Sep 10, 2021 at 2:48 PM Christopher Morrow <
> morrowc.lists at gmail.com> wrote:
>
> On Fri, Sep 10, 2021 at 1:49 PM Matthew Huff <mhuff at ox.com> wrote:
>
> Reminds me of something that happened about 25 years ago when an
> elementary school visited the data center of the insurance company where I
> worked. One of our operators strategically positioned himself between the
> kids and the mainframe, leaned back and hit its EPO button.
>
>
>
> Or when your building engineering team cuts themselves a new key for the
> 'main breaker' for the facility... and tests it at 2pm on a Tuesday.
>
> Or when that same team cuts a second key (gotta have 2 keys!) and tests
> that key on the same 'main breaker' ... at 2pm on the following Tuesday.
>
>
>
> <quadruple face palm>
>
>
>
> not fakenews, a real story from a large building full of gov't employees
> and computers and all manner of 'critical infrastructure' for the agency
> occupying said building.
>
>
>
> Matthew Huff | Director of Technical Operations | OTA Management LLC
>
> Office: 914-460-4039
> mhuff at ox.com | www.ox.com
>
> ...........................................................................................................................................
>
> -----Original Message-----
> From: NANOG <nanog-bounces+mhuff=ox.com at nanog.org> On Behalf Of Sean
> Donelan
> Sent: Friday, September 10, 2021 12:38 PM
> To: nanog at nanog.org
> Subject: Never push the Big Red Button (New York City subway failure)
>
> NEW YORK CITY TRANSIT RAIL CONTROL CENTER POWER
> OUTAGE ISSUE ON AUGUST 29, 2021
> Key Findings
> September 8, 2021
>
>
>
> https://www.governor.ny.gov/sites/default/files/2021-09/WSP_Key_Findings_Summary-for_release.pdf
>
> Key Findings
> [...]
>
> 3. Based on the electrical equipment log readings and the manufacturer’s
> official assessment, it was determined that the most likely cause of RCC
> shutdown was the “Emergency Power Off” button being manually activated.
>
> Secondary Findings
>
> 1. The “Emergency Power Off” button did not have a protective cover at the
> time of the shutdown or the following WSP investigation.
>
> [...]
> Mitigation Steps
>
> 1. Set up the electrical equipment Control and Communication systems
> properly to stay active so that personnel can monitor RCC electrical
> system operations.
>
> [...]
>
>
>
>
> --
>
> Chris Kane
>