United Airlines is Down (!) due to network connectivity problems

Matthew Huff mhuff at ox.com
Thu Jul 9 00:14:43 UTC 2015


I've been working at a trading firm for the last 18 years. Most of the market traditionally rolls out changes over the weekends, making every Monday an adventure. It's unusual that they would roll out anything during the week, but they could have had something that failed and had to be undone last weekend, and then rolled it out last night because they thought they had it fixed. They may have had a reason why they needed it out in a hurry.

The summer is a big time for changes because it's so much less busy. We usually roll out changes on Thursday nights since Fridays are the least busy. Summer Fridays are completely dead.

This puts the NYSE in a doubly bad light: first the glitch, and second that the market traded close to normal without the NYSE.

On Jul 8, 2015, at 5:49 PM, Dovid Bender <dovid at telecurve.com> wrote:

Well, that's a given. I am talking about organizations like the NYSE or Ma Bell.

On Wed, Jul 8, 2015 at 5:44 PM, Keith Stokes <keiths at neilltech.com> wrote:
Who rolls out software in the middle of the week and not on weekends? People who have more business on the weekends than during the week, such as retail.

On Jul 8, 2015, at 4:40 PM, Dovid Bender <dovid at telecurve.com> wrote:

Other than for an emergency repair, who rolls out a software update in the
middle of the week? We test, test and then test some more and only then
roll out on weekends. Our maintenance window is 00:00 - 01:00 Sunday
mornings for sw updates etc.


On Wed, Jul 8, 2015 at 3:02 PM, Matthew Huff <mhuff at ox.com> wrote:

Traders on the floor are being told that it's a software glitch from new
software that was rolled out Tuesday night. Nothing official has been
said.  The only thing I know for sure is that if the NYSE was hacked, they
wouldn't tell anyone the details for a long time, if ever.

The impact of the NYSE being down is much less significant than it used to
be since most stocks are multiple-listed on other exchanges.

The lack of information through official channels is unusual though. In
previous situations, there has been at least a little hand-holding. So far,
nada. In fact, other than financial service providers' emails, there have
been no emails so far today from the NYSE, including the announcement of
resumption of service. According to the NYSE web page, trading will resume
at 3:05pm EDT today with primary specialist, and 3:10 for everyone.




On Jul 8, 2015, at 2:33 PM, Brett Frankenberger <rbf+nanog at panix.com> wrote:

On Wed, Jul 08, 2015 at 01:55:43PM -0400, Valdis.Kletnieks at vt.edu wrote:
On Wed, 08 Jul 2015 17:42:52 -0000, Matthew Huff said:

Given that the technical resources at the NYSE are significant and
the lengthy duration of the outage, I believe this is more serious
than is being reported.

My personal, totally zero-info suspicion:

Some chuckleheaded NOC banana-eater made a typo, and discovered an
entirely new class of wondrous BGP-wedgie style "We know how we got
here, but how do we get back?" network misbehaviors....

We don't know how long the underlying problem lasted, and how much of
the continued outage time is dealing with the logistics of restarting
trading mid-day.  Completely stopping and then restarting trading
mid-day is likely not a quick process even if the underlying technical
issue is immediately resolved.

(Such things have happened before - like the med school a few years ago
that extended their ethernet spanning tree one hop too far, and discovered
that merely removing the one hop too far wasn't sufficient to let it come
back up...)

No, but picking a bridge in the center, giving it priority sufficient
for it to become root, and then configuring timers[1] that would
support a much larger than default diameter, possibly followed by some
reboots, probably would have.

From what has been publicly stated, they likely took a much longer and
more complicated path to service restoration than was strictly
necessary.  (I have no non-public information on that event.  There may
be good reasons, technical or otherwise, why that wasn't the chosen
solution.)

   -- Brett

[1] You only have to configure them on the root; non-root bridges use
what the root sends out, not what they have configured.
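
As a rough illustration of the timer tuning described above, here is a
minimal sketch. It is not from the thread: it assumes the commonly cited
rule-of-thumb formulas that relate 802.1D timers to network diameter, and
the function name stp_timers is hypothetical. With the default hello time
of 2 seconds and a diameter of 7, it reproduces the default max age (20 s)
and forward delay (15 s).

```python
import math


def stp_timers(diameter: int, hello: int = 2) -> dict:
    """Estimate 802.1D timers (in seconds) for a given bridge diameter.

    Assumption: uses the widely quoted rule-of-thumb formulas, not
    anything stated in the thread itself.
    """
    # max_age grows by 2 s per extra hop, forward_delay by 1.5 s per hop.
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)

    # 802.1D only permits max age 6-40 s and forward delay 4-30 s,
    # so a large enough diameter cannot be supported by timers alone.
    if not 6 <= max_age <= 40 or not 4 <= forward_delay <= 30:
        raise ValueError(
            f"diameter {diameter} needs timers outside the 802.1D range"
        )

    return {"hello": hello, "max_age": max_age, "forward_delay": forward_delay}


if __name__ == "__main__":
    # These values would be configured on the root bridge only.
    for d in (7, 10, 15):
        print(d, stp_timers(d))
```

Under these assumptions, the computed values would be set on the root
bridge only, since non-root bridges adopt whatever the root advertises.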




---

Keith Stokes







