fixing TCP buffers (Re: packet reordering at exchange points)

E.B. Dreger eddy+public+spam at noc.everquick.net
Wed Apr 10 00:57:19 UTC 2002


> Date: Tue, 9 Apr 2002 20:39:34 -0400
> From: Richard A Steenbergen <ras at e-gerbil.net>


> My suggestion was to cut out all that nonsense by simply removing the 
> receive window limits altogether. Actually you could accomplish this 
> goal by just advertising the maximum possible window size and rely on 
> packet drops to shrink the congestion window on the sending side as 
> necessary, but this would be slightly less efficient in the case of a 
> sender overrunning the receiver.
> 
> But alas we're both forgetting the sender side, which controls how quickly 
> data moves from userland into the kernel. This part must be set by looking 
> at the sending congestion window. And I thought of another problem as 

Actually, I was thinking more in terms of sending than receiving.
Yes, your approach sounds quite slick for the RECV side, and I
see your point.  But WND info will be negotiated for sending
anyway... so why not base the send buffer on "splitting the total
pie" among the active connections instead of on an "arbitrary
maximum"?
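
One way to picture "splitting the total pie" is a max-min fair split
of a fixed send-buffer budget. This is only a sketch under stated
assumptions: the function name split_pie, the idea of a per-stream
"demand" in bytes (for example, derived from the congestion window),
and the max-min policy itself are all illustrative and do not come
from any real stack.

```python
def split_pie(total_bytes, demands):
    """Max-min fair split of total_bytes among streams.

    demands maps a stream id to the bytes it could usefully buffer
    (hypothetical input). Streams asking for less than an equal
    share get what they ask for; the leftover is re-split among
    the rest.
    """
    alloc = {}
    remaining = total_bytes
    pending = dict(demands)
    while pending:
        share = remaining // len(pending)
        # Streams whose demand fits inside an equal share.
        small = {k: d for k, d in pending.items() if d <= share}
        if not small:
            # Everyone left wants more than an equal share: cap them.
            for k in pending:
                alloc[k] = share
            break
        for k, d in small.items():
            alloc[k] = d
            remaining -= d
            del pending[k]
    return alloc


shares = split_pie(1000, {"a": 100, "b": 400, "c": 900})
```

With a budget of 1000 bytes, "a" and "b" get their full demand and
"c" gets the remaining 500 rather than some arbitrary fixed maximum.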


> well. If you had a receiver which made a connection, requested as much 
> data as possible, and then never did a read() on the socket buffer, all 
> the data would pile up in the kernel and consume the total buffer space 
> for the entire system.

Unless, again, there's some sort of limit.  Say 32 MB total and
512 connections: each socket gets 64 kB (32 MB / 512) until it
proves its worth.  Sockets don't get to play the RED-ish game
until they _prove_ that they're serious about sucking down data.

Once a socket proves its intentions (and periodically after
that), it gets to use a BIG buffer, so we find out just how fast
the connection can go.
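
The arithmetic and the probation idea can be sketched as follows.
The 32 MB, 512 and 64 kB figures are from the text above; everything
else is an assumption made for illustration: the class name
SocketQuota, the 1 MB "big" quota, and the rule that a socket has
"proved its worth" once the application has read one full
probationary buffer.

```python
TOTAL_BUFFER = 32 * 1024 * 1024   # 32 MB shared by the whole stack
MAX_CONNECTIONS = 512
# 32 MB / 512 connections = 65536 bytes = 64 kB per socket
PROBATION_QUOTA = TOTAL_BUFFER // MAX_CONNECTIONS
BIG_QUOTA = 1024 * 1024           # hypothetical "BIG buffer" size


class SocketQuota:
    """Per-socket buffer limit that starts small and can be promoted."""

    def __init__(self):
        self.quota = PROBATION_QUOTA
        self.bytes_read = 0

    def on_read(self, nbytes):
        # Called when the application actually drains data.
        self.bytes_read += nbytes
        # Assumed rule: one full probationary buffer read => promoted.
        if self.bytes_read >= PROBATION_QUOTA:
            self.quota = BIG_QUOTA

    def reprove(self):
        # Periodic re-check: back to probation, start counting again.
        self.quota = PROBATION_QUOTA
        self.bytes_read = 0


idle = SocketQuota()              # connects, never calls read()
busy = SocketQuota()
busy.on_read(PROBATION_QUOTA)     # drains a full probationary buffer
```

Under these assumptions, a receiver that never reads stays capped at
64 kB, so it cannot pile up data across the whole 32 MB.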


> You're missing the point: you don't allocate ANYTHING until you have a
> packet to fill that buffer, and then when you're done buffering it, it is
> freed. The limits are just there to prevent you from running away with a 
> socket buffer.

No, I understand your point perfectly, and that's how it's
currently done.

But why even bother with constant malloc(9)/free(9) when the
overall buffer size remains reasonably constant?  That is, the
kernel's allocation to the IP stack changes slowly, if at all;
only the IP stack's allocation to individual streams changes
regularly.
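
A rough user-space illustration of that split: allocate one arena
up front, then hand fixed-size chunks to streams and take them back,
with no per-packet allocation. This is a sketch, not how any kernel
does it; BufferPool, its method names, and the 2 kB chunk size are
all hypothetical.

```python
class BufferPool:
    """One arena allocated once; streams borrow and return chunks."""

    def __init__(self, total_bytes, chunk_bytes):
        self.chunk_bytes = chunk_bytes
        self._arena = bytearray(total_bytes)    # the only allocation
        self._free = list(range(total_bytes // chunk_bytes))
        self._held = {}                         # stream id -> chunk indices

    def acquire(self, stream_id):
        """Lend one chunk to a stream, or return None if the pool is empty."""
        if not self._free:
            return None
        idx = self._free.pop()
        self._held.setdefault(stream_id, []).append(idx)
        start = idx * self.chunk_bytes
        # A view into the arena, not a copy.
        return memoryview(self._arena)[start:start + self.chunk_bytes]

    def release_all(self, stream_id):
        """Return every chunk held by a stream; report how many."""
        chunks = self._held.pop(stream_id, [])
        self._free.extend(chunks)
        return len(chunks)

    def free_chunks(self):
        return len(self._free)


pool = BufferPool(total_bytes=8 * 2048, chunk_bytes=2048)
views = [pool.acquire("stream-a") for _ in range(3)]
```

The arena size stays fixed for the life of the pool; only the
bookkeeping of which stream holds which chunk changes.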


--
Eddy

Brotsman & Dreger, Inc. - EverQuick Internet Division
Phone: +1 (316) 794-8922 Wichita/(Inter)national
Phone: +1 (785) 865-5885 Lawrence
