United Airlines is Down (!) due to network connectivity problems

Wed Jul 8 18:33:21 UTC 2015

On Wed, Jul 08, 2015 at 01:55:43PM -0400, Valdis.Kletnieks at vt.edu wrote:
> On Wed, 08 Jul 2015 17:42:52 -0000, Matthew Huff said:

> > Given that the technical resources at the NYSE are significant and
> > the lengthy duration of the outage, I believe this is more serious
> > than is being reported.
> 
> My personal, totally zero-info suspicion:
> 
> Some chuckleheaded NOC banana-eater made a typo, and discovered an
> entirely new class of wondrous BGP-wedgie style "We know how we got
> here, but how do we get back?" network misbehaviors....

We don't know how long the underlying problem lasted, and how much of
the continued outage time is dealing with the logistics of restarting
trading mid-day.  Completely stopping and then restarting trading
mid-day is likely not a quick process even if the underlying technical
issue is immediately resolved.

> (Such things have happened before - like the med school a few years ago that
> extended their ethernet spanning tree one hop too far, and discovered that
> merely removing the one hop too far wasn't sufficient to let it come back up...)

No, but picking a bridge in the center, giving it priority sufficient
for it to become root, and then configuring timers[1] that would
support a much larger than default diameter, possibly followed by some
reboots, probably would have.  

From what has been publicly stated, they likely took a much longer and
more complicated path to service restoration than was strictly
necessary.  (I have no non-public information on that event.  There may
be good reasons, technical or otherwise, why that wasn't the chosen
solution.)

     -- Brett

[1] You only have to configure them on the root; non-root bridges use
what the root sends out, not what they have configured.
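
As a rough illustration of the timer tuning in [1], here is a minimal
sketch of how max age and forward delay are commonly derived from the
network diameter.  The formulas are the ones usually quoted in vendor
STP timer tuning guides, and the range limits are the 802.1D ones as I
understand them; the function name and the default 2-second hello time
are assumptions of this sketch, not anything from the event discussed
above.

```python
import math


def stp_timers(diameter, hello=2):
    """Return (max_age, forward_delay) in seconds for a bridge diameter.

    Assumed formulas (commonly cited for classic 802.1D STP):
        max_age       = 4 * hello + 2 * diameter - 2
        forward_delay = (4 * hello + 3 * diameter) / 2, rounded up
    """
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)

    # Assumed 802.1D permitted ranges: max age 6..40 s, forward delay 4..30 s.
    if not 6 <= max_age <= 40 or not 4 <= forward_delay <= 30:
        raise ValueError("diameter %d gives timers outside the allowed range"
                         % diameter)
    return max_age, forward_delay


if __name__ == "__main__":
    for dia in (7, 10, 14):
        print(dia, stp_timers(dia))
```

With the default diameter of 7 this gives the familiar 20 second max age
and 15 second forward delay.  The resulting values would only need to be
set on the root bridge, per [1].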