Mitigating human error in the SP
wavetossed at googlemail.com
Tue Feb 2 19:36:59 CST 2010
> The actual error happened when someone was troubleshooting a turn-up,
> where in the past the customer in question has had their ethertype set
> wrong. It wasn't a provisioning problem as much as someone
> troubleshooting why it didn't come up with the customer. Ironically,
> the NOC was on the phone when it happened, and the switch was rebooted
> almost immediately and the outage lasted 5 minutes.
This is why large operators have a "ready for service" (RFS) protocol. The
customer is never billed until the circuit is officially RFS, and making it
RFS requires more than an operational network: the customer must also agree
in writing that they have a fully functional connection.
This is another way of hiding human error, because now the up-down-up is
just part of the provisioning process. There is a record of the RFS date-time,
so if the customer complains about an outage BEFORE that point, they can
be politely reminded when RFS happened and that charging does not
start until AFTER that point.
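
As a rough illustration only, the RFS rule described above could be modelled
like this. The Circuit class, its field names, and both method names are
hypothetical and not taken from any real provisioning system; this is a sketch
of the logic, assuming RFS needs both a working network and written customer
sign-off, and that only outages at or after the recorded RFS date-time count.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Circuit:
    # Hypothetical record; all field names are illustrative assumptions.
    circuit_id: str
    network_operational: bool = False
    customer_signed_off: bool = False  # written acceptance from the customer
    rfs_at: Optional[datetime] = None  # recorded RFS date-time, if any

    def declare_rfs(self, when: datetime) -> None:
        # RFS requires more than an operational network: the customer
        # must also have agreed in writing that the connection works.
        if not (self.network_operational and self.customer_signed_off):
            raise ValueError("circuit is not ready for service")
        self.rfs_at = when

    def is_billable_outage(self, outage_start: datetime) -> bool:
        # Anything before the recorded RFS date-time is treated as part
        # of provisioning, so it does not count as a billable outage.
        return self.rfs_at is not None and outage_start >= self.rfs_at
```

With a record like this, a 5-minute reboot during turn-up falls before rfs_at
and is_billable_outage() returns False for it.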