[outages] News item: Blackberry services down worldwide, Egypt affected (not N.A.)

Tayeb Meftah tayeb.meftah at gmail.com
Wed Oct 12 15:56:40 UTC 2011


Sent from my iPhone

On Oct 12, 2011, at 17:55, Charles Mills <w3yni1 at gmail.com> wrote:

> +1
> On Oct 12, 2011 11:51 AM, <Valdis.Kletnieks at vt.edu> wrote:
>> On Wed, 12 Oct 2011 09:52:02 CDT, -Hammer- said:
>>> What kills me is what they have told the public. They lost a "core
>>> switch". I don't know if they actually mean network switch or not but
>>> I'm pretty sure any of us that work in an enterprise environment know
>>> how to factor N+1 just for these types of days. And then the backup
>>> solution failed? I'm not buying it either.
>> Yeah, and that extra comma in the one config file that didn't make a
>> difference when you tested the failover in the lab *never* makes a
>> difference when it hits in the production network, right?  Or they changed
>> the config of the primary and it didn't get propagated just right to the
>> backup, or they had mismatched firmware levels on the blades in the primary
>> and backup switches, so traffic that didn't tickle a bug on the primary
>> blades caused the blade to crash on the backup, or...
>> Anybody on this list who's been around long enough probably has enough "We
>> should have had N+2 because the N+1'th device failed too" stories to drain
>> *several* pitchers of beer at a good pub... I've even had one case where my
>> butt got *saved* from an ohnosecond-class whoops because the N+1'th device
>> *was* crashed (stomped a config file, it replicated, was able to salvage a
>> copy from a device that didn't replicate because it was down at the time).
