95th Percentile again (was RE: C&W Peering Problem?)

Greg A. Woods woods at weird.com
Sun Jun 3 18:19:23 UTC 2001


[ On Monday, June 4, 2001 at 00:21:31 (+1000), Geoff Huston wrote: ]
> Subject: Re: 95th Percentile again (was RE: C&W Peering Problem?)
>
> No it's not obvious. The SNMP byte counters are odometers - as long as you 
> get two clean samples per counter wrap you can accurately count bytes. The 
> trick is to ensure that you get a minimum of two clean samples of the 
> odometer reading per counter wrap - for high speed interfaces that 
> typically implies reading the MIB2 64 bit interface counters, or triggering 
> an SNMP poll at relatively tight time intervals.

The worst problem with using SNMP counters is not the wrap-around
(properly implemented that happens "rarely" even on high-speed links
since the `standard' does `mandate' use of 64-bit counters for truly
high-speed links), but rather accidental resets caused by improper agent
implementations, or reboots (or both).  You have to detect not only
counter roll-over, but also resets, and you can only do the latter if
the agent's uptime value is also reset when the counters are reset.
Otherwise you have to do what MRTG and recent versions of cricket do and
simply ignore all roll-over and reset events (and thus take the loss on
the counter deltas for those intervals).
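
To make the roll-over versus reset distinction concrete, here is a minimal sketch of the per-poll delta logic, assuming the agent's uptime really is reset along with its counters. The function name and arguments are hypothetical and are not taken from MRTG, cricket, or any SNMP library. It also ignores the wrap of the uptime value itself.

```python
from typing import Optional


def counter_delta(prev: int, curr: int,
                  prev_uptime: int, curr_uptime: int,
                  bits: int = 64) -> Optional[int]:
    """Return octets counted between two polls, or None if the
    interval has to be thrown away.

    prev/curr are successive readings of one octet counter;
    prev_uptime/curr_uptime are the agent's uptime at each poll.
    """
    if curr_uptime < prev_uptime:
        # Uptime went backwards: the agent restarted, so the counters
        # were reset and this interval's delta is lost.
        return None
    if curr >= prev:
        # Normal case: the odometer simply advanced.
        return curr - prev
    # Counter went backwards while uptime kept increasing: assume
    # exactly one roll-over. (More than one wrap per interval cannot
    # be detected from two samples.)
    return curr + (1 << bits) - prev
```

If the agent resets its counters without resetting its uptime, the last branch silently reports a bogus roll-over, which is exactly the case where discarding every backwards step is the only safe option.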

Which is why taking measurements of even MIB-2 64-bit counters very
frequently (e.g. even as often as every five minutes) is "wise" to do
even if you're simply billing on bulk throughput per period.
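
For a sense of scale on the wrap-around side, the time for an octet counter to wrap at a sustained line rate is simple arithmetic. The sketch below assumes the link runs flat out at the stated rate; the function name is hypothetical.

```python
def seconds_to_wrap(counter_bits: int, line_rate_bps: float) -> float:
    """Seconds for an octet (byte) counter of the given width to wrap
    when the link carries line_rate_bps bits per second continuously."""
    return (2 ** counter_bits) * 8 / line_rate_bps


# A few illustrative widths and rates (rates in bits per second).
for bits, rate in [(32, 100e6), (32, 1e9), (64, 10e9)]:
    wrap = seconds_to_wrap(bits, rate)
    print(f"{bits}-bit counter at {rate / 1e6:.0f} Mb/s wraps after {wrap:.0f} s")
```

A 32-bit counter wraps in roughly 5.7 minutes at 100 Mb/s and in about 34 seconds at 1 Gb/s, so getting two clean samples per wrap means polling more often than half those figures. A 64-bit counter at 10 Gb/s takes centuries to wrap, which leaves resets as the practical problem.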

It's not very hard to scale a collection engine that can run in parallel
(on parallel hardware if necessary) to do this, and indeed the data
volume should not be an issue even at a one-minute collection interval!

Another problem I've seen is with SNMP agents that can't scale to handle
a full complement of ports on their host routers/switches.  This is an
important consideration to keep in mind when choosing a hardware vendor.
I think this is still an area that needs covering by an independent test
lab too....

> (My previous comments a month or so back about the inaccuracies inherent in 
> 95% systems still apply - given a particular (extreme case) traffic load 
> pattern it is possible for two measurement systems that are not phase 
> locked, using precisely the same sampling technique and computation to 
> deliver outcome values for the 95% point where one is up to twice the value 
> of the other. )

Well, IIRC, your example was one of true extremes in the "coarse"
variety, and one in which any ISP (or customer, if it's the other way
around) who's paying attention will spot and nix immediately (because
they're well aware of the wicked ways of the world and will clearly have
anticipated them in their contracts).  I.e. you can't play games with
the system because you can't be a customer if you do!  ;-)

(unless maybe all your customers play the same game and you mandate that
they play "in sync" with each other thus guaranteeing your own
utilisation is flat....  :-)
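
The factor-of-two effect in the quoted text can be reproduced with a synthetic pattern. What follows is only a sketch under stated assumptions, not necessarily the example from the earlier thread: five-minute averages, one day of data, bursts exactly one interval long that occupy about 10% of the day, and a 95th percentile defined as the highest sample left after discarding the top 5%. All names and constants are made up for illustration.

```python
import math

INTERVAL = 300      # assumed 5-minute polling interval, in seconds
DAY = 86400         # one day of per-second data
BURST_RATE = 100    # arbitrary units, e.g. Mb/s

# Per-second rate: one INTERVAL-long burst every ten intervals, else idle.
traffic = [BURST_RATE if (t % (10 * INTERVAL)) < INTERVAL else 0
           for t in range(DAY)]


def interval_averages(rates, phase):
    """Average rate over each full INTERVAL-long window, with the first
    window starting `phase` seconds into the data."""
    return [sum(rates[start:start + INTERVAL]) / INTERVAL
            for start in range(phase, len(rates) - INTERVAL + 1, INTERVAL)]


def percentile_95(samples):
    """Highest sample remaining after the top 5% are discarded."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


aligned = percentile_95(interval_averages(traffic, phase=0))
offset = percentile_95(interval_averages(traffic, phase=INTERVAL // 2))
print(aligned, offset)   # prints: 100.0 50.0
```

The aligned poller sees each burst whole in one sample, while the poller half an interval out of phase sees every burst split across two samples at half the rate, so the two 95% points differ by a factor of two on identical traffic.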

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods at acm.org>     <woods at robohack.ca>
Planix, Inc. <woods at planix.com>;   Secrets of the Weird <woods at weird.com>


