How do you (not how do I) calculate 95th percentile?
drussell at thrupoint.net
Wed Feb 22 22:46:01 UTC 2006
I think that we have two (partially) unrelated issues in this thread: 1) how often you should sample and 2) what you do with the results.
I personally think that 5-minute sampling is so last century; it is better suited to batch loads that do not change very quickly than to interactive web applications. If a particular link is affecting your users' web performance, they are going to notice it in the 10-second range. Congestion events lasting 1-3 minutes can be a problem. After five minutes they have forgotten what they were doing :)
How often you check the counter should be driven by how granularly you want to measure the network. Pick a counter wide enough that it does not wrap on you during your sampling interval.
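To put numbers on the counter-wrap point, here is a quick sketch of how long a free-running SNMP octet counter lasts at a given line rate (the rates are illustrative assumptions):

```python
# Sketch: seconds until an octet counter wraps at a given line rate.

def seconds_to_wrap(counter_bits: int, rate_bps: float) -> float:
    """Seconds until a free-running octet counter rolls over at rate_bps."""
    max_octets = 2 ** counter_bits      # counter rolls over at this count
    octets_per_second = rate_bps / 8    # ifInOctets counts octets, not bits
    return max_octets / octets_per_second

# A 32-bit counter on a saturated gigabit link wraps in roughly 34 seconds,
# so a 5-minute poll can miss multiple wraps between reads.
print(round(seconds_to_wrap(32, 1e9)))

# A 64-bit high-capacity counter at 10 Gbps takes centuries to wrap.
print(seconds_to_wrap(64, 10e9) / (3600 * 24 * 365))
```

This is why the 64-bit ifHC counters matter once you poll fast links at any interval.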
The initial downside is that you have 10-30 times as much data. Network traffic has chaotic (a.k.a. self-similar) characteristics that make simple statistics such as max, min, or average somewhat useless.
My understanding of the reason to calculate a 95th percentile is to reduce the dataset size and to make some sense out of the noisy performance data. For example, I could take some range of data, figure out the 95% threshold, and save that as a data point (e.g., 95% of the samples are less than X Mbps).
Read the counter value, compute the rate for each interval, then compute the 95th-percentile threshold across 20+ samples and save that as the value for the longer period.
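A minimal sketch of that procedure in Python, assuming you already have successive octet-counter reads at a fixed interval (the sample data below is made up for illustration):

```python
# Sketch: counter reads -> per-interval rates -> 95th-percentile value.
import random

def rates_from_counters(reads, interval_s):
    """Turn successive octet-counter reads into Mbps per interval."""
    return [(b - a) * 8 / interval_s / 1e6 for a, b in zip(reads, reads[1:])]

def percentile(samples, pct):
    """Value below which pct percent of the samples fall, using the
    simple sort-and-index method many billing systems use."""
    ordered = sorted(samples)
    idx = int(len(ordered) * pct / 100) - 1   # samples above idx are ignored
    return ordered[max(idx, 0)]

random.seed(1)
rates = [random.uniform(10, 120) for _ in range(288)]  # e.g. a day of 5-min rates
print(f"95th percentile: {percentile(rates, 95):.1f} Mbps")
```

With 288 five-minute samples, the top 14 samples (about 70 minutes) fall above the saved threshold.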
The basic assumption is that you can ignore, or not bill for, the 5% of the time that you had higher values. That's 30 minutes during a 10-hour business window, or 72 minutes over a 24-hour period. One could argue that 95 should be 98 or 92, or that it matters whether the 5% is continuous. But it's a reasonable starting point for deciding whether link utilization is too high.
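A quick check of that "how much time does the top 5% represent" arithmetic, for a couple of measurement windows:

```python
# Sketch: minutes of a window that fall in the ignored top (100 - pct)%.

def discarded_minutes(window_hours: float, pct_kept: float = 95.0) -> float:
    """Minutes of the window covered by the samples above the percentile."""
    return window_hours * 60 * (100 - pct_kept) / 100

print(discarded_minutes(10))   # 30.0 minutes of a 10-hour business day
print(discarded_minutes(24))   # 72.0 minutes of a full day
```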
From: owner-nanog at merit.edu on behalf of Jo Rhett
Sent: Wed 2/22/2006 1:12 PM
To: nanog at merit.edu
Subject: How do you (not how do I) calculate 95th percentile?
I am wondering what other people are doing for 95th percentile calculations
these days. Not how you gather the data, but how often you check the
counter? Do you use averages or maximums over time periods to create the
buckets used for the 95th percentile calculation?
A lot of smaller folks check the counter every 5 min and use that same
value for the 95th percentile. Most of us larger folks need to check more
often to prevent 32-bit counters from rolling over too often. Are you larger
folks averaging the retrieved values over a larger period? Using the
maximum within a larger period? Or just using your saved values?
This is curiosity only. A few years ago we compared the same data and the
answers varied wildly. It would appear from my latest check that things
are standardizing on 5-minute averages, so I'm asking here on NANOG
as a reality check.
Note: I have the AboveNet, Savvis, Verio, etc. calculations. I'm wondering
if there are any other odd combinations out there.
Reply to me offlist. If there is interest I'll summarize the results
without identifying the source.
SVcolo : Silicon Valley Colocation