[outages] News item: Blackberry services down worldwide, Egypt affected (not N.A.)

Valdis.Kletnieks at vt.edu
Wed Oct 12 15:49:56 UTC 2011


On Wed, 12 Oct 2011 09:52:02 CDT, -Hammer- said:
> What kills me is what they have told the public. They lost a "core 
> switch". I don't know if they actually mean network switch or not but 
> I'm pretty sure any of us that work on an enterprise environment know 
> how to factor N+1 just for these types of days. And then the backup 
> solution failed? I'm not buying it either.

Yeah, and that extra comma in the one config file that didn't make a difference
when you tested the failover in the lab *never* makes a difference when it hits
the production network, right?  Or they changed the config of the primary and
it didn't get propagated just right to the backup, or they had mismatched firmware
levels on the blades in the primary and backup switches, so traffic that
didn't tickle a bug on the primary blades caused a blade to crash on the backup,
or...

Anybody on this list who's been around long enough probably has enough "We
should have had N+2 because the N+1'th device failed too" stories to drain
*several* pitchers of beer at a good pub... I've even had one case where my
butt got *saved* from an ohnosecond-class whoops because the N+1'th device *was*
crashed (I stomped a config file and it replicated, but I was able to salvage a
copy from a device that never got the replicated change because it was down at
the time).



More information about the NANOG mailing list