Out-of-band paging

Steve Gibbard scg at gibbard.org
Wed Jul 28 16:54:29 UTC 2010


On Wed, 28 Jul 2010, Joel M Snyder wrote:

> But... you can take this sort of 'single point of failure' argument almost as 
> far as you want.  In the security business (where I spend most of my time), I 
> see people do this a lot--they get deep into the ultra-ultra-ultra marginal 
> risk, which then takes an enormous amount of money to mitigate.  It's an easy 
> rat hole to explore, and often fun.

I think people are getting lost in the weeds here, and confusing 
technologies with paths.

My current employer has been upgrading its transit circuits, and spent 
time in the last few months worrying about diversity of the transit paths. 
But we didn't insist that one provider come in via metro ethernet, one via 
SONET, and one via a GRE tunnel.  What we did was have them bring in 
network maps, and make them sell us circuits that weren't running down the 
same streets as our other providers.

The same goes for your paging network.  If it's running over IP, that's 
not a huge problem.  If anything, if you're an IP engineer, it probably 
makes it easier for you to audit the setup.  Where you do have a problem 
is if it's running over YOUR IP network, but that's just a more acute 
version of the problem you'd have if your paging company were using fiber 
along the same path as somebody you were buying fiber from.

So, for paging, or out of band management, or redundant capacity, the 
rules seem pretty simple.  Buy from somebody who's not your customer. 
Audit whatever information you can get about their network paths to verify 
that they're not sharing segments with you.  And, for good measure, have 
some backup plans in case the notifications don't work.
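
As a rough illustration of the audit step, here is a minimal sketch of 
checking two sets of paths for shared segments.  It assumes you've already 
reduced the providers' network maps to ordered lists of waypoints (streets, 
conduits, buildings); the function names and the waypoint labels are made 
up for the example and don't come from any real tool or map.

```python
def segments(path):
    """Break an ordered list of waypoints into undirected hops."""
    # frozenset makes A->B and B->A compare as the same segment.
    return {frozenset(hop) for hop in zip(path, path[1:])}


def shared_segments(our_paths, vendor_paths):
    """Return every hop that appears in both our paths and the vendor's."""
    ours = set().union(*(segments(p) for p in our_paths))
    theirs = set().union(*(segments(p) for p in vendor_paths))
    return ours & theirs


# Hypothetical data: waypoint names are placeholders, not real locations.
our_paths = [["our-pop", "1st-and-main", "carrier-hotel"]]
pager_paths = [["pager-hq", "carrier-hotel", "1st-and-main", "tower-7"]]

for hop in shared_segments(our_paths, pager_paths):
    print("shared segment:", " <-> ".join(sorted(hop)))
```

An empty result only means the maps you were given don't overlap, which is 
as good as the maps are.  That's one more reason to keep the backup plans.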

You probably are better off if you have humans in a NOC, rather than a 
purely automated alerting system.  Those people can notice if you're not 
responding, and be creative.  Maybe they can figure out how to fix 
problems themselves.  If all else fails, they may be able to dispatch 
somebody to your house.  Remember, organizations have been tracking down 
critical personnel for far longer than there have been telephones.

Or are people here worried about a scenario in which the entire world is 
run off of one big interconnected IP network, and that when it fails it's 
not only not possible to make a phone call, but also not possible to get 
across town to alert the people who could fix it?  It seems to me that if 
things really got that bad, it might be pretty hard for even the most 
oblivious on-call person to miss.

-Steve




More information about the NANOG mailing list