Revisiting the Aviation Safety vs. Networking discussion

George Bonser gbonser at
Fri Dec 25 13:53:14 CST 2009

> What I'm getting at is that after following this thread for a while,
> I'm not convinced any amount of process-borrowing is going to solve
> problems better, faster, or even avoid them in the first place. At
> best, our craft is 1/3rd as "old" (if that's somehow a measure of
> maturity) as flight and nobody is being sued to settle 200+ accidental
> deaths because of our mistakes.
> -Tk

Not now, that is true. But look at things that are "on the drawing
board": systems designed to manage automobile traffic flows, networks
that are used to fly UAVs, and networks that keep track of "friendly"
units in combat, where the technology might someday migrate to civilian
law enforcement and/or emergency services (keeping track of where
firefighters are in a building or at a wildfire, for example). I can see
situations in the future where people's lives could depend on networks
working properly, or at least be endangered if a network fails.

But my original intent was to point out that there are two kinds of
process for two different kinds of circumstances and the sort of process
surrounding routine changes might not be the best process for handling
emergency changes. I have seen examples of places that want to handle
emergency changes with the same sort of process they use for routine
changes and those places can be frustrating to work with when stuff is
broken. My goal was to give managers of networks who might read this the
idea that when the fan is in an unsavory condition, more can get done by
shifting from a mode of questioning, analyzing and second-guessing
everything the engineer is doing to a mode where the organization is
responding to immediate needs, clearing obstacles out of the way, and
documenting as best they can what is done and when, to make the
debriefing afterwards easier. AFTER the incident is the time to go over
what was done, think about how it was dealt with, consider any changes
in emergency process that might have shortened the duration, etc.

In fact, the "What could we have done differently that would have
shortened the duration of the outage?" question is pretty important. The
answer might be "nothing", and that is OK, too, but the question should
be asked.

More information about the NANOG mailing list