Mitigating human error in the SP

Larry Sheldon LarrySheldon at
Tue Feb 2 15:44:08 UTC 2010

On 2/2/2010 6:26 AM, gb10hkzo-nanog at wrote:
>>> Otherwise, as Suresh notes, the only way to eliminate human error
>>> completely is to eliminate the presence of humans in the
>>> activity.
> and, hence by reference.....
>>> Automated config deployment / provisioning.
> That's the funniest thing I've read all day... ;-)
> A little pessimistic rant.... ;-)
> Who writes the scripts that you use, who writes the software that you
> use ?    There will always be at least one human somewhere, and where
> there's a human writing software tools, there's scope for bugs and
> unexpected issues.  Whether inadvertent or not, they will always be
> there.
> If the excrement is going to hit the proverbial fan, try as you might
> to stop it, it will happen.  Nothing in the IT / ISP / Telco world is
> ever going to be perfect, far too complex with many dependencies.
> Yes you might play in your perfect little labs until the cows come
> home ..... but there always has been and always will be an element of
> risk when you start making changes in production.
> Face it, unless you follow the rigorous change control and
> development practices that they use for avionics or other high-risk
> environments, you are always going to be left with some element of
> risk.
> How much risk your company is prepared to take is something for the
> men in black (suits) to decide because it correlates directly with
> how much $$$ they are prepared to throw your way to help you mitigate
> the risk .....;-)
> That's my 2<insert_currency>  over ...... thanks for listening (or
> not !).... ;-)

Add to that the stuff that always sounds like a cop-out, even to the 
victims--the "human error" made by people not on your payroll, the 
vendors that are responsible for the misleading (or absent) 
documentation, for the CLI stuff that doesn't work just the way a 
reasonable person would expect it to, for the hardware that fails 
dirty, and on and on--a very long list.  Exacerbated by management that 
cheaps out on equipment, software, documentation, training, and staff.

Even with a lab with a rich fabric of equipment, most of the other 
things will still be there to contend with.

A reasonable and competent management will not only provide what is 
needed for a reasonable error rate (which indeed can approach one over 5 
nines) but will also provide the means of recovery when the inevitable 
happens.  That might involve "needless" expense like additional staff, 
redundant equipment, alternate paths, ...

But it won't involve whippings until the morale improves or reductions 
in staff and funding until the errors go away.

"Government big enough to supply everything you need is big enough to
take everything you have."

Remember:  The Ark was built by amateurs, the Titanic by professionals.

Requiescas in pace o email
Ex turpi causa non oritur actio
Eppure si rinfresca

ICBM Targeting Information:
