Simulated disaster exercise? Re: PAIX

Stephen J. Wilcox steve at telecomplete.co.uk
Sun Nov 17 01:57:52 UTC 2002



On Sat, 16 Nov 2002, Sean Donelan wrote:

> 
> In the 1990's the MAEs and Gigaswitches would give us an unscheduled
> failure of a major exchange point on a regular basis, which let us
> demonstrate our disaster recovery capabilities.  With the improved
> reliability, i.e. the PAIXes haven't had a catastrophic failure, we
> haven't had as many opportunities to demonstrate how well we can handle
> a disaster at those locations.
> 
> Without creating an actual disaster, what if all the providers turned off
> their BGP sessions with other providers at a PAIX (or Equinix or LINX or
> wherever), both through the shared switch and private point-to-point
> links, for an hour.  More than likely no one would notice, but then
> we would have some hard data.  Individually providers have tested parts of
> their own network, but I haven't heard of any coordinated efforts to test
> recovery across all the service providers in a particular location.
> 

The main problem will be coordination: you need to get all providers to do this
in a tight slot of only one hour. And to make this a good test you need to
ensure that all the major players take part, more so than the smaller ISPs. From
what I've seen it's difficult enough to get ISPs to make config changes within a
window of a couple of weeks, so you're gonna have a problem pulling this
together!

Also, from what I've seen, I think you'll find things have changed: reduced
budgets have forced compromises on redundancy, and shutting down an exchange will
have a noticeable impact on users in the region... you could argue this is all
the more reason to conduct these exercises!

Steve




