outages, quality monitoring, trouble tickets, etc

Sean Donelan SEAN at SDG.DRA.COM
Tue Nov 28 23:43:42 UTC 1995

>From: kwe at 6SigmaNets.COM (Kent W. England)
>I'm skeptical of any end-to-end availability figures over 97%. I don't
>think they reflect the reality of leased line circuits today, or else they
>don't include the leaf node circuits and only report backbone availability.
>For a highly redundant backbone, almost any definition of availability
>should result in a number like 99.mumble%. Remember 99.9% availability
>means less than 9 hours outage per year. Routing hiccups take that much.
>One or two leased line outages is all you get for 9 hours. The real world
>is a lot less available than that.
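
The "9 hours" figure above is plain arithmetic on a 365-day year. A minimal
sketch of that conversion follows; the function name and the choice to ignore
leap years are assumptions made here for illustration, not anything from the
original thread.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(availability):
    """Return the hours of outage per year implied by a fractional availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Compare the 97% figure with the 99.9% figure discussed above.
for a in (0.97, 0.99, 0.999):
    hours = downtime_hours_per_year(a)
    print(f"{a:.1%} available -> {hours:.1f} hours down per year")
```

At 99.9% the budget is about 8.76 hours a year, while 97% allows roughly
263 hours, or about 11 days.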

Thank you!  I thought I was living in a twilight zone with people
reporting 99.9% network availability.  This is the rathole of end-to-end
network usability.  The customer is interested in end-to-end usability,
while the network operator can only easily measure intra-network modules.

I can't tell you the answer, but there is definitely something happening
with customer perceptions of Internet usability.  Looking at the
numbers, I would agree a single leased circuit should be less reliable
(single point of failure) than a highly redundant backbone.  But by
our customers' perceptions, that isn't the case.  Either we have better
than "normal" leased circuits, or the highly redundant backbones aren't,
or our customers' needs are based on something we aren't directly measuring.

Highly redundant backbones remain extremely vulnerable to the "glitch."
Human glitches, software glitches, "impossible" data glitches.  Redundant
backbones do protect against the backhoe "glitch."

>But since half the web servers I try to talk to refuse me half the time,
>I'm not sure that network availability per se (HWB's complaints duly
>acknowledged) is the tallest pole in the tent.

Part of this problem is the growing number of interdependencies (complexity,
chaos?).  Even if each individual module is working 99.9% of the time,
the probabilities start looking pretty bad when all need to be working
at the same time.  To make a web connection, you have a string of name
servers, a string of networks to the name servers, a string of routers
on those networks, another string of networks to the web server, another
string of routers, more strings of networks and routers and servers on
the return path.
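
To put rough numbers on that chain, here is a minimal sketch. It assumes
every module fails independently and that all of them must be up at once
(a pure series chain), which is a simplification: real outages are often
correlated. The function names and module counts are hypothetical.

```python
import math


def series_availability(per_module, n_modules):
    """Availability of a chain where all n independent modules must be up."""
    return per_module ** n_modules


def modules_until(target, per_module):
    """Smallest chain length whose series availability is at or below target."""
    return math.ceil(math.log(target) / math.log(per_module))


# End-to-end availability for chains of modules that are each 99.9% available.
for n in (10, 50, 100):
    print(f"{n:3d} modules at 99.9% -> {series_availability(0.999, n):.1%} end to end")

# How long a chain has to be before it works only half the time.
print("99.9% modules:", modules_until(0.5, 0.999), "in series to reach 50%")
print("97% modules:  ", modules_until(0.5, 0.97), "in series to reach 50%")
```

Under these assumptions it takes several hundred 99.9% modules in series
to drop to 50%, but only a couple of dozen 97% modules, so the leaf
circuits and servers at the edges matter far more than the backbone.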

I'm amazed it even works 50% of the time.  Unfortunately our customers
aren't always as understanding.

Since error reporting sucks in most network applications, it becomes
the fault of whatever help desk happens to take the customer's phone call.

Sean Donelan, Data Research Associates, Inc, St. Louis, MO
  Affiliation given for identification not representation
