Mitigating human error in the SP
mirotrem at gmail.com
Tue Feb 2 19:28:44 CST 2010
Thanks for all the comments!
On Tue, Feb 2, 2010 at 1:01 PM, JC Dill <jcdill.lists at gmail.com> wrote:
> Chadwick Sorrell wrote:
>> This outage, of a high profile customer, triggered upper management to
>> react by calling a meeting just days after. Put bluntly, we've been
>> told "Human errors are unacceptable, and they will be completely
>> eliminated. One is too many."
> Good, Fast, Cheap - pick any two. No, you can't have all three.
> Here, Good is defined by your pointy-haired bosses as an
> impossible-to-achieve zero error rate. Attempting to achieve this is
> either going to cost $$$, or your operations speed (how long it takes people
> to do things) is going to drop like a rock. Your first action should be to
> make sure upper management understands this so they can set the appropriate
> priorities on Good, Fast, and Cheap, and make the appropriate budget
> It's going to cost $$$ to hire enough people to have the staff necessary to
> double-check things in a timely manner, OR things are going to slow way down
> as the existing staff is burdened by necessary double-checking of everything
> and triple-checking of some things required to try to achieve a zero error
> rate. They will also need to spend $$$ on software (to automate as much as
> possible) and testing equipment. They will also never actually achieve a
> zero error rate as this is an impossible task that no organization has ever
> achieved, no matter how much emphasis or money they pour into it (e.g.
> Windows vulnerabilities) or how important (see Challenger, Columbia, and the
> Mars Climate Orbiter incidents).
> When you put a $$$ cost on trying to achieve a zero error rate,
> pointy-haired bosses are usually willing to accept a normal error rate. Of
> course, they want you to try to avoid errors, and there are a lot of simple
> steps you can take in that effort (basic checklists, automation, testing)
> which have been mentioned elsewhere in this thread that will cost some money
> but not the $$$ that is required to try to achieve a zero error rate. Make
> sure they understand that the budget they allocate for these changes will be
> strongly correlated with how Good (zero error rate) and Fast (quick
> operational responses to turn-ups and problems) the outcome of this
>  http://www.godlessgeeks.com/LINKS/DilbertQuotes.htm
> 2. "What I need is a list of specific unknown problems we will encounter."
> (Lykes Lines Shipping)
> 6. "Doing it right is no excuse for not meeting the schedule." (R&D
> Supervisor, Minnesota Mining & Manufacturing/3M Corp.)