Mitigating human error in the SP

Chadwick Sorrell mirotrem at gmail.com
Tue Feb 2 02:21:52 UTC 2010


Hello NANOG,

Long time listener, first time caller.

A recent organizational change at my company has put someone in charge
who is determined to make things perfect.  We are a service provider,
not an enterprise company, and our business is doing provisioning work
during the day.  We recently experienced an outage when an engineer,
troubleshooting a failed turn-up, changed the ethertype on the wrong
port, losing both management and customer data on said device.  This
isn't a common occurrence, and the engineer in question has a pristine
track record.

This outage, of a high-profile customer, triggered upper management to
react by calling a meeting just days after.  Put bluntly, we've been
told "Human errors are unacceptable, and they will be completely
eliminated.  One is too many."

I am asking the respected NANOG engineers....

What measures have you taken to mitigate human mistakes?

Have they been successful?

Any other comments on the subject would be appreciated; we would like
to come to our next meeting armed and dangerous.

Thanks!
Chad



