Human Factors and Accident reduction/mitigation

Anton Kapela tkapela at gmail.com
Sun Nov 8 16:26:40 UTC 2009


Owen,

> We could learn a lot about this from Aviation.  Nowhere in human history has
> more research, care, training, and discipline been applied to accident
> prevention,
> mitigation, and analysis as in aviation.  A few examples:

Others later in this thread duly noted the costs associated with these
methods, which are clearly "worth it" given this particular
application [snipped]. However, I assert this is warranted because of
the specific public trust that commercial aviation must be given.
Additionally, this form of professional or industry "standard" isn't
unique in the world; you can find (albeit smaller) parallels in most
states' PE certification tracks and the like.

In the case of the big-I internet, I assert we can't (yet)
successfully argue that it's deserving of similar public trust. In
short, I'm arguing that big-I internet deserves special-pleading
status in these sorts of "instrument -> record -> improve" strawmen
and that we shouldn't apply similar concepts or regulation.

(Robert B. then responded):

> All,
> The real problem is same human factors we have in aviation which cause most
> accidents. Look at the list below and replace the word Pilot with Network
> Engineer or Support Tech or Programmer or whatever... and think about all
> the problems where something didn't work out right. It's because someone
> circumvented the rules, processes, and cross checks put in place to prevent
> the problem in the first place. Nothing can be made idiot proof because
> idiots are so creative.

I'd like to suggest we also swap "bug" for "software defect" or
"hardware defect" - perhaps if operators started talking about
problems the way engineers do, we'd get more global buy-in for a
process-based solution.

I certainly like the idea of improving the state of affairs where
possible - especially in the operator->device direction (e.g.,
fat-fingering an ACL, prefix list, community list, etc.). When people
make mistakes, it seems very wise to accurately record the entrance
criteria, the results of their actions, and ways to avoid a repeat -
and then share that with all operators (like at NANOG meetings!). The
part I don't like is being ultimately responsible for, or having to
"design around", a class of systemic problems that is entirely outside
an operator's sphere of control.
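
As a purely illustrative sketch of that record-and-review idea on the
operator->device path (my own example, not anything proposed in this
thread - the policy limits, names, and log format are assumptions), a
small Python pre-push check could validate a candidate prefix list and
log what was accepted or rejected before anything touches a router:

# prefix_check.py - hypothetical pre-push sanity check for a prefix list.
# Reads candidate prefixes on stdin, applies an assumed local policy,
# and emits a JSON record of what happened for later review.
import ipaddress
import json
import sys
import time

MAX_V4_PREFIXLEN = 24  # assumed local policy: nothing more specific than /24
FORBIDDEN = {ipaddress.ip_network("0.0.0.0/0")}  # never accept a default route

def check_prefix_list(lines):
    """Return (accepted, problems) for candidate prefix strings."""
    accepted, problems = [], []
    for raw in lines:
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue  # skip blanks and comments
        try:
            net = ipaddress.ip_network(raw, strict=True)
        except ValueError as exc:
            problems.append("%s: not a valid prefix (%s)" % (raw, exc))
            continue
        if net in FORBIDDEN:
            problems.append("%s: forbidden by local policy" % raw)
        elif net.version == 4 and net.prefixlen > MAX_V4_PREFIXLEN:
            problems.append("%s: more specific than /%d" % (raw, MAX_V4_PREFIXLEN))
        else:
            accepted.append(str(net))
    return accepted, problems

if __name__ == "__main__":
    candidate = sys.stdin.read().splitlines()
    ok, bad = check_prefix_list(candidate)
    # Record the attempt so mistakes (and near-misses) can be reviewed later.
    print(json.dumps({"time": time.time(), "submitted": len(candidate),
                      "accepted": len(ok), "problems": bad}, indent=2))
    sys.exit(1 if bad else 0)

Nothing clever there - the point is only that the check and the record
happen before the change reaches the device, and that the record is in
a form other operators could compare notes against.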

What curve must we shift to get routers with hardware and software
that are a) fast, b) reliable, and c) cheap -- in the hope that the
only problems left to solve are indeed human ones?

-Tk



