Ahoy, SLA boffins!

Wed Jul 29 12:54:49 UTC 2009

Bill,
To be brief, but hopefully not too fleeting, most of the standards
orgs (ITU, MEF) use packet loss to derive availability.
Loss% = the % of packets which were transmitted but not received by the
destination host.  For availability, loss is measured across some
time period.  If during that period at least X% of the transmitted
packets were NOT lost, then the network is said to be available for
that period.  Typically a 20% figure is used, e.g. if at least 20% of
the packets transmitted during a 5-minute period were received, then
the network is said to be 100% available for that 5-minute period.
Some carriers have taken this to the extreme and say that if at least
1 packet was successfully delivered then the network was 100%
available for the time period.
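
Since the question was how to turn this into code, here is a rough
Python sketch of that per-interval rule.  It assumes you already have
(sent, received) packet counts for each measurement interval; the
function names, the sample numbers, and the choice to count an interval
with no traffic as available are all assumptions made up for
illustration, not taken from any standard or SLA.

```python
from typing import Iterable, Tuple


def interval_available(sent: int, received: int, threshold: float = 0.20) -> bool:
    """True if at least `threshold` of the packets sent in one interval arrived."""
    if sent == 0:
        # Nothing was transmitted, so the rule cannot judge this interval.
        # Counting it as available is an assumption of this sketch.
        return True
    return received / sent >= threshold


def availability_pct(intervals: Iterable[Tuple[int, int]],
                     threshold: float = 0.20) -> float:
    """Percentage of intervals judged available; each item is (sent, received)."""
    results = [interval_available(s, r, threshold) for s, r in intervals]
    if not results:
        raise ValueError("no intervals to evaluate")
    return 100.0 * sum(results) / len(results)


# Four hypothetical 5-minute intervals of (sent, received) counts.
samples = [(1000, 1000), (1000, 200), (1000, 199), (1000, 0)]
print(availability_pct(samples))  # 50.0
```

The second sample interval loses 80% of its packets and still counts
as fully available, which is the whole point of the threshold rule.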

Loss is a measure of the network's usability; Availability is .......??
(Meaningless??)  What utility does a network have that is "Available"
yet sustaining a loss rate which renders it inoperable? 
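
To put numbers on that, here is a small Python sketch that computes
both figures from the same counters.  It assumes a 20% per-interval
threshold, 5-minute intervals, and a made-up month in which every
interval carries traffic and delivers 21% of it; the function name and
all the numbers are hypothetical.

```python
def monthly_report(intervals, threshold=0.20):
    """Return (availability %, loss %) for a list of (sent, received) pairs.

    Assumes every interval carried traffic (sent > 0).
    """
    total_sent = sum(s for s, _ in intervals)
    total_received = sum(r for _, r in intervals)
    # An interval is "available" if at least `threshold` of its packets arrived.
    up = sum(1 for s, r in intervals if r / s >= threshold)
    availability = 100.0 * up / len(intervals)
    loss = 100.0 * (total_sent - total_received) / total_sent
    return availability, loss


# A 30-day month of 5-minute intervals, each delivering 210 of 1000 packets.
month = [(1000, 210)] * (30 * 24 * 12)
availability, loss = monthly_report(month)
print(f"availability {availability:.1f}%, loss {loss:.1f}%")
# availability 100.0%, loss 79.0%
```

Under those assumptions the availability figure is perfect for the
whole month while four out of five packets never arrive.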

Rich

-----Original Message-----
From: Bill Woodcock [mailto:woody at pch.net] 
Sent: Wednesday, July 29, 2009 12:34 AM
To: nanog
Subject: Ahoy, SLA boffins!

So I've embarked on the no-doubt-futile task of trying to interpret  
SLAs as empirically-verifiable technical specifications, rather than  
as marketing blather.  And there's something that I'm finding  
particularly puzzling:

In most SLAs, there seem to be two separate guarantees proffered: one  
concerning "network availability" and one concerning "packet loss."   
Now, if I were to put my engineer hat on, and try to _imagine_ what  
the difference might be, I might imagine "network availability" to  
have something to do with layer-2 link status being presented as "up,"  
while packet loss would be the percentage of packets dropped.  But  
when I actually read SLAs, "network availability" is generally defined  
as the portion of the month that the path from the customer's local  
loop to the transit or peering routers was "available" to transmit  
packets.  Packet loss, on the other hand, is generally defined as the  
portion of packets which are lost while crossing that exact same piece  
of network.

Now, what am I missing here?  Is this one of those Heisenberg things,  
where "network availability" is the time the network _could have_  
delivered a packet _when you weren't actually doing so_, while "packet  
loss" is the time the network _couldn't_ deliver a packet when you  
_were_ actually doing so?

Is "network availability" inherently unmeasurable on a network that's  
less than 100% utilized?

Am I over-thinking this?

Seriously, though, I know there are people who don't consider SLAs to  
be fantasy-fiction, and some of them must not be innumerate, and some  
subset of those must be on NANOG, and the intersection set might be  
equal to or greater than one, right?  Can anybody explain this to me  
in a way I can translate into code, while still taking myself seriously?

                                 -Bill