help needed - state of california needs a benchmark - beware bufferbloat

Jim Gettys jg at freedesktop.org
Mon Jan 31 14:58:53 UTC 2011


On 01/29/2011 01:00 PM, Mike wrote:
> Hello,
>
> My company is a small CLEC / broadband provider serving rural communities
> in northern California, and we are the recipient of a small grant from
> the state through our public utilities commission. We went out to the
> 'middle of nowhere' and deployed ADSL2+ (chalk one up for the good
> guys!), and now that we're done, our state PUC wants to gather
> performance data to evaluate the result of our project and ensure we
> delivered what we said we were going to. Bigger picture, our state is
> actively attempting to map broadband availability and service levels,
> and this data will factor into that overall picture, to be used for
> future grant/loan programs and other support mechanisms, so this really
> is going to touch every provider who serves end users in the state.
>
> The rub is that they want to legislate that the web-based 'speedtest.com'
> is the ONLY and MOST AUTHORITATIVE metric, one that trumps all other
> considerations, and that the provider is 100% at fault and responsible
> for making fraudulent claims if speedtest.com doesn't agree. No
> discussion is allowed or permitted about sync rates, packet loss,
> internet congestion, provider route diversity, end user computer
> performance problems, far end congestion issues, far end server issues
> or CPU loading, latency/RTT, or the like. They are going to decide that
> the quality of any provider's service rests solely and exclusively
> on the numbers returned from 'speedtest.com' alone, period.
>
> All of you in this audience, I think, probably immediately understand
> the various problems with such an assertion. It's one of those situations
> where - to the uninitiated - it SEEMS LIKE this is the right way to do
> this, and it SEEMS LIKE there's some validity to what's going on - but in
> practice, we engineering types know it's a far different animal and
> should not be used for real live benchmarking of any kind where there is
> a demand for statistical validity.
>
> My feeling is that, if there is a need for the state to do
> benchmarking, then it ought to be using statistically significant
> methodologies, along the same lines as any other benchmark or
> test done by other government agencies and national standards bodies,
> so that results are reproducible and dependable. The question is, as a
> hot-button issue, how do we go about getting 'the message' across, how
> do we go about engineering something that could be considered
> statistically relevant, and most importantly, how do we get this
> accepted by non-technical legislators and regulators?

Mike,

For general tests of most things an ISP does, ICSI's netalyzr tests 
can't be beat.

http://netalyzr.icsi.berkeley.edu/

There are also tests at m-lab that may be useful: 
http://www.measurementlab.net/

As with all software, these may have bugs; until recently netalyzr was 
under-detecting bufferbloat on high-bandwidth links; this should be 
fixed now, I hope.

And SamKnows is doing the FCC broadband tests.

The speedtest.net tests (and pingtest.net) are good as far as they go 
(and you can host them someplace yourself; as others have noted, having 
an endpoint at someplace you control is wise); but they don't tell the 
whole story: they miss a vital issue that has been hidden.

Here's the rub:

Most tests have focussed on bandwidth (now misnamed "speed" by 
marketing, which it isn't).

Some tests have tested latency.

But there have been precious few that test latency under load, which is 
how we've gotten into a world of hurt on broadband over the last decade: 
we now have a situation where a large fraction of broadband links show 
latencies under load measured in *seconds*. (See: 
http://gettys.wordpress.com/ and bufferbloat.net.)  This makes for 
fuming retail customers, as well as lots of service calls (I know, I 
generated quite a few myself over the years).  It is a killer for lots 
of applications: VOIP, teleconferencing, gaming, remote desktop hosting, 
etc.
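To make "latency under load" concrete, here is a minimal sketch of how the metric is computed: ping the path while it is idle, ping it again while a bulk transfer saturates the link, and report the induced queueing delay. The RTT samples below are invented for illustration; none of the tests named above works exactly this way.

```python
import statistics

# Sketch: the "latency under load" metric from two sets of RTT samples.
# A real test would gather these with ping while an upload or download
# saturates the bottleneck link; the numbers here are invented.

def induced_delay_ms(idle_rtts, loaded_rtts):
    """Median loaded RTT minus median idle RTT, in milliseconds."""
    return statistics.median(loaded_rtts) - statistics.median(idle_rtts)

# Idle path: ~20 ms RTTs.  Under a saturating transfer, an overbuffered
# modem can queue seconds of data, so RTTs balloon.
idle   = [19.8, 20.1, 20.3, 19.9, 20.0]
loaded = [1180.0, 1250.0, 1210.0, 1195.0, 1230.0]

extra = induced_delay_ms(idle, loaded)
print(f"queueing delay induced by load: {extra:.0f} ms")  # 1190 ms
```

A bandwidth-only test run on this same path would report full throughput and declare it healthy; only the comparison of loaded to idle RTT exposes the problem.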

Netalyzr tries to test for excessive buffering, as does at least one of 
the mlabs tests.

Dave Clark and I have been talking to SamKnows and Ookla to try to get 
latency under load tests added to the mix.  I think we've been having 
some traction at getting such tests added, but it's slightly too soon to 
tell.

We also need tests to identify ISPs failing to run queue management 
internal to their networks, as both research and anecdotal data show 
that this, too, is much more common than it should be. Some 
ISPs do a wonderful job, and others don't; Van Jacobson believes this 
is because the classic RED algorithm he and Sally Floyd published is 
buggy, and tuning it has scared many operators off; I believe his 
explanation.
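For readers who haven't met it, classic RED keeps an exponentially weighted average of the queue length and drops (or marks) arriving packets with a probability that ramps up between two thresholds. A minimal sketch follows; the parameter values are invented for illustration, and tuning them per-link is exactly the operational pain Van describes.

```python
# Sketch of the classic RED (Random Early Detection) drop decision.
# Parameter values are illustrative assumptions, not recommendations.

def red_drop_prob(avg_q, min_th=5.0, max_th=15.0, max_p=0.1):
    """Drop probability as a function of the averaged queue length (packets)."""
    if avg_q < min_th:
        return 0.0          # queue comfortably short: never drop
    if avg_q >= max_th:
        return 1.0          # queue persistently long: drop everything
    # Between the thresholds, probability ramps linearly up to max_p.
    return max_p * (avg_q - min_th) / (max_th - min_th)

def ewma(avg_q, instant_q, weight=0.002):
    """RED's low-pass filter over the instantaneous queue length."""
    return (1.0 - weight) * avg_q + weight * instant_q
```

The difficulty in practice is that good values for min_th, max_th, max_p, and the averaging weight depend on link speed and traffic mix, which is why getting it wrong (or leaving it off) has been so common.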

So far, so bad.

Then there is the home router/host disaster:

As soon as you move the bottleneck link from the broadband hop to the 
802.11 link usually beyond it these days (either because broadband 
bandwidth has risen, or because something like the several chimneys in 
my house drops the wireless bandwidth), you run into the fact that home 
routers and even our operating systems sometimes have even worse 
buffering than the broadband gear, sometimes measured in hundreds or 
even thousands of *packets*.
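The arithmetic behind those packet counts is simple and worth doing once: a full FIFO takes (packets x bytes x 8) / link_rate seconds to drain, and every new packet waits behind all of it. The figures below (a 256-packet buffer of 1500-byte frames on a 1 Mb/s uplink) are illustrative assumptions, not measurements from the thread.

```python
# How a buffer sized in packets turns into seconds of latency.
# All figures here are illustrative assumptions, not measurements.

def queue_delay_s(packets, bytes_per_packet, link_bps):
    """Worst-case drain time of a full FIFO queue, in seconds."""
    return packets * bytes_per_packet * 8 / link_bps

# 256 full-size (1500-byte) packets draining over a 1 Mb/s uplink:
delay = queue_delay_s(256, 1500, 1_000_000)
print(f"{delay:.2f} s of queueing delay")  # 3.07 s
```

The same 256-packet buffer on a 100 Mb/s link drains in about 31 ms, which is why a buffer size that is harmless at one speed is a disaster at another: sizing buffers in packets, independent of link rate, is the root of the problem.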

We're going to need to fix the home routers and users' operating 
systems.  For the 802.11 case, this is hard; Van says RED won't hack it, 
and we need better algorithms, whether Van's unpublished nRED algorithm 
or Doug Leith's recent work.

So you need to ensure the regulators understand that testing 
carefully enough to know what you are looking at is hard.  Tests not 
run directly at the broadband gear may conflate this problem with the 
broadband connection itself.

This is not to say tests should not be done: we're not going to get this 
swamp drained without the full light of day on the issue; just that the 
current speedtest.net tests miss this entire issue right now (though they 
may detect it in the future), and that the tests (today) aren't 
something you "just run" and get a simple answer from, since the problem 
can be anywhere in a path.

Maybe there will be tests that "do the right thing" for regulators in a 
year or two; but not now: the tests today don't identify which link is 
at fault, and the problem can easily be entirely inside the 
customer's house - if the test checks for bufferbloat at all.

I think it is very important that we get tests together that not only 
detect bufferbloat (which is very easy to detect, once you know how), 
but also point to where in the network the problem is occurring, to 
reduce the rate of complaints to something manageable, so that no one 
has to field calls for problems they aren't responsible for (and can't fix).

You can look at a talk about bufferbloat I gave recently at:
http://mirrors.bufferbloat.net/Talks/BellLabs01192011/

Let me know if I can be of help. People who want to help with the 
bufferbloat problem, please also note that we recently opened the 
bufferbloat.net web site to aid collaboration on this problem.

			Best regards,
				Jim Gettys
				Bell Labs


On 01/06/2011 01:50 PM, Van Jacobson wrote:
 > Jim,
 >
 > Here's the Doug Leith paper I mentioned. As I said on the phone I
 > think there's an easier, more robust way to accomplish the same
 > thing but they have running code and I don't. You can get their
 > mad-wifi implementation at
 > http://www.hamilton.ie/tianji_li/buffersizing.html
 >
 >   - van





More information about the NANOG mailing list