Revisiting the Aviation Safety vs. Networking discussion

Bill Woodcock woody at pch.net
Mon Dec 28 11:19:29 CST 2009


The connection may not be immediately apparent, but I think Philip 
Greenspun's article critiquing Malcolm Gladwell's musings on cranial 
metrics etc. has some bearing:

   http://philip.greenspun.com/flying/foreign-airline-safety

...or is at least an interesting read.  In observing network operations 
screw-ups, I've seen a lot that were either caused by, or prolonged by, a 
culture-of-emergency.  Young guys drinking way too much coffee, working a 
service window at two in the morning, believing they've seen something 
that needs to be fixed, and winging it.  In building networks, I've tried 
very hard to engineer things such that the operating procedure for dealing 
with an "emergency" is to note its existence and place it in a work queue 
to be dealt with by people who are on a day shift, have just come in 
from a full night's sleep, and are working in a team with senior people 
who can assist with anything tricky, and make sure that junior folks are 
following procedures that have been worked out in advance by people who 
had plenty of time in a lab, and plenty of time to choose the best of many 
alternative procedures.

In my experience, reducing the frequency of emergencies is the most 
effective way to reduce the frequency of outages.  :-)

                                -Bill





More information about the NANOG mailing list