Ahoy, SLA boffins!

Holmes,David A dholmes at mwdh2o.com
Wed Jul 29 18:05:05 UTC 2009


We use the BRIX active measurement system (BRIX is now owned by EXFO),
which gathers round-trip time, packet loss, and jitter randomly every
minute, around the clock, for our major backbone links, and we use those
measurements to calculate SLAs. "Network Availability" can be measured
empirically from the packet loss values BRIX calculates, and expressed
as a number of nines, which BRIX will also calculate over any time
period for which it retains historical data. That historical data is
kept in an embedded Oracle database. BRIX usually runs on a Solaris SMP
server.
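
For anyone wanting to turn that into code, here is a rough Python sketch
of one way to derive availability and a number of nines from per-minute
packet loss samples. This is not BRIX's actual method, which I'm not
describing here; the function names and the outage_threshold cutoff
(a minute counts as "unavailable" only when loss reaches the threshold)
are assumptions made up for illustration.

```python
import math


def availability(loss_samples, outage_threshold=1.0):
    """Fraction of sampling intervals counted as available.

    loss_samples: per-interval packet-loss fractions, each in [0, 1].
    outage_threshold: hypothetical cutoff; an interval whose loss is at
    or above it is counted as unavailable.
    """
    if not loss_samples:
        raise ValueError("need at least one sample")
    up = sum(1 for loss in loss_samples if loss < outage_threshold)
    return up / len(loss_samples)


def mean_packet_loss(loss_samples):
    """Average loss fraction over all intervals (unweighted by traffic)."""
    if not loss_samples:
        raise ValueError("need at least one sample")
    return sum(loss_samples) / len(loss_samples)


def nines(avail):
    """Express an availability fraction as a number of nines."""
    if avail >= 1.0:
        return math.inf  # no unavailable intervals observed
    return -math.log10(1.0 - avail)


# A 30-day month of one-minute samples: 43 minutes of total loss,
# 10 minutes of 50% loss, and the rest clean.
samples = [1.0] * 43 + [0.5] * 10 + [0.0] * (43200 - 53)
print(availability(samples), nines(availability(samples)))
print(mean_packet_loss(samples))
```

With the default threshold, the ten half-lossy minutes count against
packet loss but not against availability, which is one way the two
numbers can be kept distinct.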

-----Original Message-----
From: Bill Woodcock [mailto:woody at pch.net] 
Sent: Tuesday, July 28, 2009 9:34 PM
To: nanog
Subject: Ahoy, SLA boffins!


So I've embarked on the no-doubt-futile task of trying to interpret SLAs
as empirically-verifiable technical specifications, rather than as
marketing blather.  And there's something that I'm finding particularly
puzzling:

In most SLAs, there seem to be two separate guarantees proffered: one  
concerning "network availability" and one concerning "packet loss."   
Now, if I were to put my engineer hat on, and try to _imagine_ what the
difference might be, I might imagine "network availability" to have
something to do with layer-2 link status being presented as "up,"  
while packet loss would be the percentage of packets dropped.  But when
I actually read SLAs, "network availability" is generally defined as the
portion of the month that the path from the customer's local loop to the
transit or peering routers was "available" to transmit packets.  Packet
loss, on the other hand, is generally defined as the portion of packets
which are lost while crossing that exact same piece of network.

Now, what am I missing here?  Is this one of those Heisenberg things,
where "network availability" is the time the network _could have_
delivered a packet _when you weren't actually doing so_, while "packet
loss" is the time the network _couldn't_ deliver a packet when you
_were_ actually doing so?

Is "network availability" inherently unmeasurable on a network that's
less than 100% utilized?

Am I over-thinking this?

Seriously, though, I know there are people who don't consider SLAs to be
fantasy-fiction, and some of them must not be innumerate, and some
subset of those must be on NANOG, and the intersection set might be
equal to or greater than one, right?  Can anybody explain this to me in
a way I can translate into code, while still taking myself seriously?

                                 -Bill