95th Percentile again!

Greg A. Woods woods at weird.com
Sun Jun 3 17:55:36 UTC 2001


[ On Saturday, June 2, 2001 at 23:59:17 (-0700), David Schwartz wrote: ]
> Subject: RE: 95th Percentile again!
>
> 	I don't agree that this is so for 95th percentile. Exactly which five
> minute interval a packet is counted in will affect the results. There is no
> way to totally agree on which such interval a packet belongs in. Similarly,
> where the five-minute intervals begin and end is arbitrary and affects the
> final numbers.

Perhaps you should sit down with a table of numbers and compare the
results by hand.  I think you'll find that you are gravely mistaken.

(I can provide you with some raw numbers that are guaranteed to have
been sampled out-of-sync at the ends of the same pipe if you'd like.)

The only time there can ever be a discrepancy is at the "edge".  I.e. if
during the last sample time in the billing period the ISP sees a huge
count of bytes, but the customer (because his last full sample was five
minutes less one second before the end of the period) sees zero bytes,
*AND* iff this one large sample throws the Nth percentile calculation
for the entire billing period up over the next billing increment, then
the lack of synchronisation will cause a "problem" (for the customer in
this case :-).  However, the chances of this kind of error happening in
real life are so tiny as to be almost impossible (at least if the
billing period is orders of magnitude larger than the sample period,
which of course is what we're supposing here).  I count over three
orders of magnitude difference for a 30-day billing period and a 5-min
sample period.
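To put some concrete numbers on that, here's a toy Python sketch (my own
illustration with invented traffic figures, not anything from a real
billing system) showing how many samples a 30-day period contains and
how little one extra out-of-sync "edge" sample moves the 95th
percentile:

    import random

    # 5-minute samples in a 30-day billing period: over three orders of
    # magnitude more samples than the single one at the "edge".
    SAMPLES = 30 * 24 * 60 // 5          # 8640

    def billable_p95(samples):
        """Sort descending, discard the top 5% of samples, bill on the next."""
        ordered = sorted(samples, reverse=True)
        return ordered[int(len(ordered) * 0.05)]

    # Toy per-sample byte counts -- invented numbers, purely illustrative.
    random.seed(1)
    period = [random.randint(0, 10_000_000) for _ in range(SAMPLES)]

    # The "edge" case: the ISP's poller catches one extra large sample
    # right at the end of the period that the out-of-sync customer
    # poller misses.
    isp_view      = period + [50_000_000]
    customer_view = period

    print(billable_p95(isp_view), billable_p95(customer_view))
    # The two figures are at most one rank position apart in the sorted
    # list, which is why the difference almost never shows on the bill.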

For the customer it's easy to avoid too -- just unplug your network
(scheduled down time) during the 10-minute period between billing cycle
roll-overs.  :-)

> 	The interface byte counters won't tell you where the packets went.

Clearly if the ISP is at one end of the pipe and the customer's at the
other then the out/in (and in/out at the other end) counters are an
extremely accurate count of where the packets went!

Obviously such a scheme "limits" the viable alternatives for connecting
customers in some ways, and it certainly forces you to do your data
collection at specific points.
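For what it's worth, the arithmetic on those counters is dead simple.
Here's a rough Python sketch with made-up readings (and the usual
allowance for a 32-bit octet counter rolling over between polls):

    COUNTER32_MAX = 2 ** 32   # SNMP Counter32 objects roll over at 2^32

    def octet_delta(previous, current, max_count=COUNTER32_MAX):
        """Bytes moved between two polls of a monotonically increasing
        interface octet counter, allowing for one roll-over in between."""
        if current >= previous:
            return current - previous
        return (max_count - previous) + current

    # Hypothetical readings taken five minutes apart on the customer port:
    prev_out, cur_out = 4_294_000_000, 1_200_000      # counter wrapped
    print(octet_delta(prev_out, cur_out))             # 2_167_296 bytes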

> So any
> such billing scheme would be based ultimately upon statistical sampling.

Please try to talk sense, man!  Regardless of what you're buying or
selling there's absolutely NOTHING "statistical" about byte counting!

It's pure accounting, plain and simple.  It's 100% auditable and
100% verifiable too!

> The
> provider would determine that typically some of your packets are local and
> cost very little and some are remote and may cost much more. Rather than
> counting each packet and figuring out its cost, the provider relies upon
> prior statistical sampling to come up with some 'average' cost which he
> bills you on the basis of.

The only way to do that is to count flows instead of bytes and the only
way I know of doing that is indeed based only on statistical sampling.

Any customer who'd be willing to suffer under such a scheme is either
not very clueful or getting one heck of a deal on their pricing....

> 	Sometimes what happens in this case is the customer or the provider realize
> that this particular traffic pattern does not match the statistical sample
> on which the billing was based. Richard Steenbergen told me a story about a
> company that colocated all their servers at POPs of the same provider and
> paid twice for traffic between their machines. Needless to say, they had to
> negotiate new pricing. Why? Because their traffic pattern made the
> statistical sampling upon which their billing was based inappropriate.

You're talking apples and oranges -- please stop mis-directing the topic
in an apparent attempt to "call the kettle black".

> 	If a billing scheme were not based upon statistical sampling, it would
> require the provider to somehow accurately determine how much each packet
> cost him to get to you or handoff from you and bill you based upon that on
> something like a cost plus basis.

Iff.  But that's not what we're talking about here.

> 	I agree, but all of the alternatives are ultimately based upon statistical
> sampling. NetFlow, for example, loses a certain percentage of the packets
> because it's UDP based. The provider compensates for this by raising his
> rates. If he expects 3% of his accounting records to be lost, he raises his
> rates to 103% hoping that he'll get a fair statistical sample. If this
> assumption is violated, for example if packets are more likely to drop at
> peak times and a particular customer passes most of their traffic at peak
> times, then the statistical assumptions upon which the billing is based will
> be violated, and the ISP will get taken advantage of.

Duh.  But this isn't what we're talking about.

> 	If he counts bytes out an Ethernet port, he'll be billing you for some
> broadcast traffic that costs him nothing. He'll be billing you for some
> local traffic that costs him nothing. He'll be billing you for some
> short-range traffic that costs him very little. But he uses statistical
> sampling to come up with some 'per byte' cost. If, for example, most of a
> particular customer's traffic is from another customer in the same POP,
> again the statistical assumptions upon which the billing is based will be
> violated, and the customer will likely have to negotiate some other billing
> mechanism.

I don't see the problem.  It's a very simple matter to adjust the
pricing to fit.  You can do some "statistical sampling" to set the
price, just like anyone might do in any form of cost estimation, but
what's on the invoice in the end is a pure accounting of the actual
traffic.  You can do the same for packet loss too.  It's only the
price/unit that's based on statistical sampling and cost estimates.  Why
is this so difficult for some people to understand?
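A minimal sketch of that split, with invented figures: the sampling and
the cost estimates only ever touch the per-unit price, while the invoice
itself is nothing but the counted bytes times that price:

    # Rate-setting (done once, up front): traffic-mix sampling, expected
    # broadcast/local overhead, expected collection loss, etc. all fold
    # into one per-gigabyte price.  All of these numbers are invented.
    estimated_cost_per_gb = 0.40      # provider's blended cost estimate
    loss_fudge            = 1.03      # cover e.g. 3% lost accounting records
    margin                = 1.50
    price_per_gb = estimated_cost_per_gb * loss_fudge * margin

    # Billing (done every period): pure accounting of the actual counters.
    measured_bytes = 2_834_117_296_553        # sum of the counter deltas
    invoice = measured_bytes / 10**9 * price_per_gb
    print(f"${invoice:,.2f}")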

> 	Every billing scheme I have ever seen has been based upon statistical
> sampling. The closest to an exception I've seen is Level3's distance-based
> scheme.

You've obviously never looked beyond the silly schemes you're apparently
stuck on talking about.  I know of many billing systems that are based
on pure bulk-throughput accounting and several that are based on true
Nth percentile usage.  None of them, not a single one, is based on
statistical samples of anything -- *ALL* are pure 100% byte-counting and
all of them count each and every byte.
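Both flavours reduce to the same raw counter deltas.  A minimal sketch,
under the same assumptions as the samples above:

    def bulk_throughput(deltas):
        """Bulk-transfer billing: the plain sum of every counted byte."""
        return sum(deltas)

    def nth_percentile_rate(deltas, n=95, interval_seconds=300):
        """Nth-percentile billing: per-sample rates in bits/sec, with the
        top (100 - n) percent of samples discarded before billing."""
        rates = sorted((d * 8 / interval_seconds for d in deltas), reverse=True)
        return rates[int(len(rates) * (100 - n) / 100)]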

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods at acm.org>     <woods at robohack.ca>
Planix, Inc. <woods at planix.com>;   Secrets of the Weird <woods at weird.com>


