Revisiting the Aviation Safety vs. Networking discussion
Bill Woodcock
woody at pch.net
Mon Dec 28 17:19:29 UTC 2009
The connection may not be immediately apparent, but I think Philip
Greenspun's article critiquing Malcolm Gladwell's musings on cranial
metrics etc. has some bearing:
http://philip.greenspun.com/flying/foreign-airline-safety
...or is at least an interesting read. In observing network operations
screw-ups, I've seen a lot that were either caused by, or prolonged by, a
culture-of-emergency. Young guys drinking way too much coffee, working a
service window at two in the morning, believing they've seen something
that needs to be fixed, and winging it. In building networks, I've tried
very hard to engineer things such that the operating procedure for dealing
with an "emergency" is to note its existence and place it in a work queue
to be dealt with by people who are on a day shift, have just come in
from a full night's sleep, and are working in a team with senior people
who can assist with anything tricky, and make sure that junior folks are
following procedures that have been worked out in advance by people who
had plenty of time in a lab, and plenty of time to choose the best of many
alternative procedures.
In my experience, reducing the frequency of emergencies is most beneficial
in reducing the frequency of outages. :-)
-Bill