Shady areas of TCP window autotuning?

Marian Ďurkovič md at bts.sk
Tue Mar 17 08:47:39 UTC 2009


On Mon, Mar 16, 2009 at 09:09:35AM -0500, Leo Bicknell wrote:
> Many edge devices have queues that are way too large.
> 
> What appears to happen is vendors don't auto-size queues.  Something
> like a cable or DSL modem may be designed for a maximum speed of
> 10Mbps, and the vendor sizes the queue appropriately.  The service
> provider then deploys the device at 2.5Mbps, which means roughly
> (as it can be more complex) the queue should be 1/4th the size.
> However the software doesn't auto-size the buffer to the link speed,
> and the operator doesn't adjust the buffer size in their config.
> 
> The result is that if the vendor targeted 100ms of buffer you now
> have 400ms of buffer, and really bad lag.

This is a very good point. Let me add that the same thing happens on every
autosensing 10/100/1000Base-T Ethernet port, which typically does not
auto-reduce its buffers when the negotiated speed is lower than 1 Gbps.
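
For illustration, a quick back-of-the-envelope in Python showing how a
buffer dimensioned for the quoted 100 msec at 10 Mbps turns into 400 msec
once the link runs at 2.5 Mbps (the buffer size here is just the one
implied by those numbers, not any particular device):

MBPS = 1000 * 1000

def queue_delay_ms(buffer_bytes, link_bps):
    # Time to drain a completely full buffer at the given link speed, in msec.
    return buffer_bytes * 8.0 / link_bps * 1000

# Buffer sized for 100 msec at the 10 Mbps design speed: 125000 bytes.
buffer_bytes = 10 * MBPS / 8 * 0.100

print(queue_delay_ms(buffer_bytes, 10 * MBPS))    # -> 100.0 msec at 10 Mbps
print(queue_delay_ms(buffer_bytes, 2.5 * MBPS))   # -> 400.0 msec at 2.5 Mbps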

> As network operators we have to get out of the mind set that "packet
> drops are bad".  While that may be true in planning the backbone
> to have sufficient bandwidth, it's the exact opposite of true when
> managing congestion at the edge.  Reducing the buffer to be ~50ms
> of bandwidth makes the users a lot happier, and allows TCP to work.
> TCP needs drops to manage to the right speed.
> 
> My wish is for the vendors to step up.  I would love to be able to
> configure my router/cable modem/dsl box with "queue-size 50ms" and
> have it compute, for the current link speed, 50ms of buffer.
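
Such a knob would simply be computing link rate times target delay. A minimal
sketch (hypothetical helper name, round numbers) of what 50 msec of buffer
means at a few common edge speeds:

MBPS = 1000 * 1000

def buffer_bytes_for(link_bps, target_delay_s=0.050):
    # Bytes of queue that drain in target_delay_s at link_bps.
    return int(link_bps * target_delay_s / 8)

for mbps in (2.5, 10, 100, 1000):
    print("%6.1f Mbps -> %8d bytes" % (mbps, buffer_bytes_for(mbps * MBPS)))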

Reducing buffers to 50 msec clearly avoids excessive queueing delays,
but let's look at this from a wider perspective:

1) Initially we had a system where hosts used fixed 64 kB buffers. This
was unable to achieve good performance over high-BDP paths.

2) OS maintainers fixed this by means of buffer autotuning, so the host
buffer size is no longer the problem.

3) The above fix introduces unacceptable delays into networks, and users
are complaining, especially where the autotuning from #2 is in use.

4) Network operators will fix the problem by reducing buffers to e.g. 50 msec.

So at the end of the day we'll again have a system which is unable to
achieve good performance over high-BDP paths, since with reduced buffers
we'll have an underbuffered bottleneck in the path which prevents full
link utilization whenever RTT > 50 msec. Thus all of the above exercises
end up producing almost the same situation as before (of course YMMV).
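
To put a rough number on that, here is the textbook single-flow
approximation: one long-lived Reno flow through a drop-tail bottleneck
whose buffer is fixed at 50 msec worth of the link rate. Real traffic
mixes will do better, but the trend is the point; the RTT values below
are hypothetical.

BUFFER_MS = 50.0

def min_utilization(rtt_ms, buffer_ms=BUFFER_MS):
    # One Reno flow fills buffer + BDP, then halves cwnd on loss.  The link
    # stays busy only while cwnd >= BDP, so the sawtooth bottoms out at
    # (BDP + buffer) / (2 * BDP) whenever the buffer is smaller than the BDP.
    if buffer_ms >= rtt_ms:
        return 1.0
    return (rtt_ms + buffer_ms) / (2.0 * rtt_ms)

for rtt in (20, 50, 100, 200, 300):
    print("RTT %3d msec -> worst-case utilization %.1f%%"
          % (rtt, 100 * min_utilization(rtt)))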

Something is seriously wrong, isn't it?

And yes, I raised this topic last week on the Linux netdev mailing list and
tried hard to persuade those people that some less aggressive approach is
probably necessary to strike a good balance between the requirements for the
fastest possible throughput and for fairness in the network. But the
maintainers simply didn't want to listen :-(

        M.



