buffer bloat and packet pacing

Brett Frankenberger rbf+nanog at panix.com
Thu Sep 3 14:36:38 UTC 2015


On Thu, Sep 03, 2015 at 01:04:34PM +0100, Nick Hilliard wrote:
> On 03/09/2015 11:56, Saku Ytti wrote:
> > 40GE server will flood the window as fast as it can, instead of
> > limiting itself to 10Gbps, optimally it'll send at linerate.
> 
> optimally, but tcp slow start will generally stop this from happening on
> well behaved sending-side stacks so you end up ramping up quickly to path
> rate rather than egress line rate from the sender side.  Also, regardless
> of an individual flow's buffering requirements, the intermediate path will
> be catering with large numbers of flows, so while it's interesting to talk
> about 375mb of intermediate path buffers, this is shared buffer space and
> any attempt on the part of an individual sender to (ab)use the entire path
> buffer will end up causing RED/WRED for everyone else.
> 
> Otherwise, this would be a fascinating talk if people had real world data.

The original analysis is flawed because it assumes latency is constant.
Any analysis has to include the fact that buffering changes latency.

If you start with a 300ms path (by propagation delay, switching latency,
etc.), and 375MB of buffers on a 10G port, then, when the buffers
fill, you end up with a 600ms path[1].  And a 375MB window is no longer
sufficient to keep the pipe full.

Instead, you need a 750MB buffer.

But now the latency is 900ms.

And so on.  This doesn't converge.  Every byte of filled buffer is
another byte you need in the window if you're going to fill the pipe.
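
To make the arithmetic above concrete, here is a rough sketch in Python.
The function names are mine and purely illustrative. It assumes, as the
original analysis does, that the buffer is sized to match the window and
that it fills completely, and it uses the same 300ms / 10G numbers as
footnote [1].

```python
def path_latency_s(base_latency_s, buffered_bytes, link_bps):
    """Latency seen once `buffered_bytes` are queued on the link (seconds)."""
    return base_latency_s + buffered_bytes * 8 / link_bps


def window_to_fill_pipe(latency_s, link_bps):
    """Bytes that must be in flight to keep the link busy for `latency_s`."""
    return latency_s * link_bps / 8


def iterate(base_latency_s=0.300, link_bps=10e9, rounds=4):
    """Return (window_bytes, latency_s) for each round of resizing.

    Assumption: the buffer is sized equal to the window and fills fully,
    so each round's window becomes the next round's queued bytes.
    """
    rows = []
    latency = base_latency_s
    for _ in range(rounds):
        window = window_to_fill_pipe(latency, link_bps)
        latency = path_latency_s(base_latency_s, window, link_bps)
        rows.append((window, latency))
    return rows


if __name__ == "__main__":
    for window, latency in iterate():
        print(f"window/buffer {window / 1e6:.0f}MB -> latency {latency * 1e3:.0f}ms")
```

Under those assumptions each round adds another 375MB of window and
another 300ms of latency, so the sequence grows without bound instead of
settling.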

Not accounting for this is part of the reason the original analysis is
flawed.  The end result is that you always run out of window or run out
of buffer (causing packet loss).

Here's a paper that shows you don't need buffers equal to
bandwidth*delay to get near capacity:
http://www.cs.bu.edu/~matta/Papers/hstcp-globecom04.pdf
(I'm not endorsing it.  Just pointing it out as a data point.)

     -- Brett

[1] 0.300s + (375E6 bytes * 8 bits/byte) / 10E9 bits/s = 0.600s = 600ms


