923Mbits/s across the ocean

Richard A Steenbergen ras at e-gerbil.net
Tue Mar 11 00:35:36 UTC 2003


On Tue, Mar 11, 2003 at 12:41:15AM +0100, Iljitsch van Beijnum wrote:
> > On the receive side, the socket buffers must be large enough to
> > accommodate all the data received between application read()'s,
> 
> That's not true. It's perfectly acceptable for TCP to stall when the
> receiving application fails to read the data fast enough. (TCP then
> simply announces a window of 0 to the other side so the communication
> effectively stops until the application reads some data and a >0 window
> is announced.) If not, the kernel would be required to buffer unlimited
> amounts of data in the event an application fails to read it from the
> buffer for some time (which is a very common situation).

Ok, I think I was unclear. You don't NEED to have buffers large enough to
accommodate all that data received between application read()'s, unless
you are trying to achieve maximum performance. I thought that was the
general framework we were all working under. :)
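
As a rough illustration of the sizing involved: to keep a long, fast path
full, the receive buffer has to hold about one bandwidth-delay product of
data. The sketch below computes that figure and asks the kernel for a buffer
of that size. The 923 Mbit/s rate comes from the subject line; the 170 ms
round-trip time is an assumed value for illustration only, and the helper
names bdp_bytes and request_rcvbuf are hypothetical.

```python
import socket


def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes: data in flight needed to fill the pipe."""
    return round(bandwidth_bits_per_s * rtt_s / 8)


def request_rcvbuf(sock: socket.socket, size: int) -> int:
    """Ask for a receive buffer of `size` bytes and return what the kernel reports.

    The kernel may clamp, adjust, or refuse the request, so the reported value
    is what matters, not the requested one.
    """
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    except OSError:
        # Some systems reject oversized requests instead of clamping them.
        pass
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


RATE_BITS_PER_S = 923e6   # from the subject line
RTT_S = 0.170             # ASSUMED round-trip time, not taken from this thread

target = bdp_bytes(RATE_BITS_PER_S, RTT_S)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
granted = request_rcvbuf(sock, target)
sock.close()

print(f"wanted {target} bytes, kernel reports {granted} bytes")
```

If the reported size is far below the target, the system-wide socket buffer
limits are probably the constraint. The buffer should generally be requested
before connect() or listen(), since the TCP window scale factor is negotiated
during the handshake.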

> > locally. Jumbo frames help too, but their real benefit is not the
> > simplistic "hey look there's 1/3rd the number of frames/sec" view that many
> > people see. The good stuff comes from techniques like page flipping, where
> > the NIC DMA's data into a memory page which can be flipped through the
> > system straight to the application, without copying it throughout. Some
> > day TCP may just be implemented on the NIC itself, with ALL work
> > offloaded, and the system doing nothing but receiving nice page-sized
> > chunks of data at high rates of speed.
> 
> Hm, I don't see this happening to a usable degree as TCP has no concept
> of records. You really want to use fixed size chunks of information here
> rather than pretending everything's a stream.

We're talking optimizations for high performance transfers... It can't 
always be a stream.

> > IMHO the 1500 byte MTU of ethernet
> > will still continue to prevent good end to end performance like this for a
> > long time to come. But alas, I digress...
> 
> Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND
> to support a per-neighbor MTU? This should make backward-compatible
> adoption of jumboframes a possibility. (Maybe retrofit ND into v4 while
> we're at it.)

Not necessarily sure that's the right thing to do, but SOMETHING has got to
be better than what passes for path MTU discovery now. :)

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)


