Mitigating human error in the SP

Mark Smith nanog at 85d5b20a518b8f6864949bd940457dc124746ddc.nosense.org
Tue Feb 2 13:16:29 UTC 2010


On Mon, 1 Feb 2010 21:21:52 -0500
Chadwick Sorrell <mirotrem at gmail.com> wrote:

> Hello NANOG,
> 
> Long time listener, first time caller.
> 
> A recent organizational change at my company has put someone in charge
> who is determined to make things perfect.  We are a service provider,
> not an enterprise company, and our business is doing provisioning work
> during the day.  We recently experienced an outage when an engineer,
> troubleshooting a failed turn-up, changed the ethertype on the wrong
> port losing both management and customer data on said device.  This
> isn't a common occurrence, and the engineer in question has a pristine
> track record.
> 

Why didn't the customer have a backup link if their service was so
important to them and, indirectly, to your upper management? If your
upper management are taking this problem that seriously, then your
*sales people* didn't do their job properly - they should be ensuring
that customers with high availability requirements have a backup link,
or aren't led to believe that the single-point-of-failure service will
be highly available.


> This outage, of a high profile customer, triggered upper management to
> react by calling a meeting just days after.  Put bluntly, we've been
> told "Human errors are unacceptable, and they will be completely
> eliminated.  One is too many."
> 

If upper management don't understand that human error is a risk factor
that can't be completely eliminated, then I suggest "self-eliminating"
and finding yourself a job somewhere else. The only way you'll avoid
human error having any impact on production services is to not change
anything - which pretty much means not having a job anyway ...


> I am asking the respectable NANOG engineers....
> 
> What measures have you taken to mitigate human mistakes?
> 
> Have they been successful?
> 
> Any other comments on the subject would be appreciated, we would like
> to come to our next meeting armed and dangerous.
> 
> Thanks!
> Chad
> 



