Followup British Telecom outage reason

Kevin Gannon kgannon at lancomms.ie
Sat Nov 24 19:15:26 UTC 2001


Does anyone have the bug IDs?

Regards,
Kevin

-----Original Message-----
From: owner-nanog at merit.edu [mailto:owner-nanog at merit.edu]On Behalf Of
Sean Donelan
Sent: 24 November 2001 19:17
To: Neil J. McRae
Cc: nanog at merit.edu
Subject: Re: Followup British Telecom outage reason




On Sat, 24 Nov 2001, Neil J. McRae wrote:
> I'd be surprised if it was the GSR, and in any case that doesn't
> absolve anyone. If it was a software issue, why wasn't the software
> properly tested? Why was such a critical upgrade rolled out across
> the entire network at the same time? It doesn't add up.

It appears to be yet another CEF bug.  If you want to use a GSR
you are stuck using some version of IOS with a CEF bug.  The
question is which bug do you want.  Each version of IOS has
a slightly different set.  Several US network providers have also
been bitten by CEF bugs.

While trying to fix one set of bugs, BT upgraded their network.
I'm not sure if they were upgrading at 9am, or had upgraded
earlier and the bug finally came out under load at 9am.
When the BT network melted down, Cisco suggested installing a
different version of IOS, which had previously been tested.  At
noon, BT found the new version had an even worse bug, sending packets
out the wrong interface.  It was not until 2200 (13 hours later) that
BT and Cisco found a version of IOS which stabilized the network.
"Stabilized", not fixed.  The running version of IOS still has a bug,
but it isn't as severe.



