TCP and WAN issue
simon at limmat.switch.ch
Wed Mar 28 13:23:06 UTC 2007
Andre Oppermann gave the best advice so far IMHO.
I'll add a few points.
> To quickly sum up the facts and to dispel some misinformation:
> - TCP is limited by the delay-bandwidth product and the socket buffer
Hm... what about: The TCP socket buffer size limits the achievable
throughput-RTT product? :-)
> - for a T3 with 70ms your socket buffer on both ends should be
Right. (Victor Reijs' "goodput calculator" says 378kB.)
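The required buffer follows directly from the bandwidth-delay product. A quick sketch (the 378kB figure presumably starts from the usable DS3 payload rate rather than the nominal 45 Mb/s assumed here):

```python
# Bandwidth-delay product: the socket buffer needed to keep a path full.
def bdp_bytes(bandwidth_bps, rtt_seconds):
    return bandwidth_bps * rtt_seconds / 8

# Nominal DS3 rate (45 Mb/s) at 70 ms RTT:
print(round(bdp_bytes(45e6, 0.070)))  # -> 393750, i.e. ~384 kB
```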
> - TCP is also limited by the round trip time (RTT).
This was stated before, wasn't it?
> - if your application is working in a request/reply model no amount
> of bandwidth will make a difference. The performance is then
> entirely dominated by the RTT. The only solution would be to run
> multiple sessions in parallel to fill the available bandwidth.
Very good point. Also, some applications have internal window
limitations. Notably SSH, which has become quite popular as a bulk
data transfer method. See http://kb.pert.geant2.net/PERTKB/SecureShell
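Both limits are simple arithmetic. A sketch (the 4kB message size is invented for illustration, and the 64kB channel window is an assumption typical of older OpenSSH releases - check your version):

```python
# Request/reply: at most one exchange per RTT, so goodput is capped at
# message_size / RTT per session, regardless of link bandwidth.
def reqrep_goodput_bps(message_bytes, rtt_s, sessions=1):
    return sessions * message_bytes * 8 / rtt_s

# Fixed application-level window (e.g. an SSH channel): ceiling is window / RTT.
def window_limited_bps(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s

print(reqrep_goodput_bps(4096, 0.070))       # one session: ~0.47 Mb/s
print(reqrep_goodput_bps(4096, 0.070, 10))   # ten sessions: ~4.7 Mb/s
print(window_limited_bps(64 * 1024, 0.070))  # ~7.5 Mb/s
```

Note that a 64kB window over 70ms caps out around 7.5 Mb/s - suspiciously close to the 5-7 Mb/s Philip reports below.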
> - Jumbo Frames have definitely zero impact on your case as they
> don't change any of the limiting parameters and don't make TCP go faster.
Right. Jumbo frames have these potential benefits for bulk transfer:
(1) They reduce the forwarding/interrupt overhead in routers and hosts
by reducing the number of packets. But in your situation it is quite
unlikely that the packet rate is a bottleneck. Modern routers
typically forward even small packets at line rate, and modern
hosts/OSes/Ethernet adapters have mechanisms such as "interrupt
coalescence" and "large send offload" that make the packet size
largely irrelevant. But even without these mechanisms and with
1500-byte packets, 45 Mb/s shouldn't be a problem for hosts built in
the last ten years, provided they aren't (very) busy with other tasks.
(2) As Perry Lorier pointed out, jumbo frames accelerate the "additive
increase" phases of TCP, so you reach full speed faster both at
startup and when recovering from congestion. This may be noticeable
when there is competition on the path, or when you have many smaller
transfers such that ramp-up time is an issue.
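Point (2) can be quantified: in congestion avoidance TCP adds roughly one MSS per RTT, so ramp-up time scales inversely with frame size. A rough sketch (slow start and SACK recovery details ignored):

```python
# RTTs needed to grow the congestion window by `delta` bytes in
# additive increase, assuming ~1 MSS of growth per RTT.
def rtts_to_grow(delta_bytes, mss):
    return delta_bytes / mss

target = 45e6 * 0.070 / 8          # ~394 kB window to fill a DS3 at 70 ms
for mss in (1460, 8960):           # payload of 1500-byte vs 9000-byte frames
    rtts = rtts_to_grow(target, mss)
    print(mss, round(rtts), round(rtts * 0.070, 1))  # MSS, RTTs, seconds
```

With a 1460-byte MSS the full ramp takes some 270 RTTs (~19 s at 70 ms); with 8960 bytes it takes ~44 RTTs (~3 s).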
(3) Large frames reduce header overhead somewhat. But the improvement
going from 1500-byte to 9000-byte packets is only 2-3%, from ~97%
efficiency to ~99.5%. No orders of magnitude here.
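The arithmetic behind point (3), counting only the 40 bytes of basic TCP/IP headers (link-layer framing would shave off a little more):

```python
# Fraction of each IP packet that is TCP payload, ignoring link-layer
# framing and TCP options.
def efficiency(mtu, header_bytes=40):
    return (mtu - header_bytes) / mtu

print(efficiency(1500))  # ~0.973
print(efficiency(9000))  # ~0.996
```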
> There are certain very high-speed and LAN (<5ms) cases where it
> may make a difference but not here.
Cases where jumbo frames might make a difference: When the network
path or the hosts are pps-limited (in the >Gb/s range with modern
hosts); when you compete with other traffic. I don't see a relation
with RTTs - why do you think this is more important on <5ms LANs?
> - Your problem is not machine or network speed, only tuning.
Probably yes, but it's not clear what is actually happening. As often
happens, the problem is described with very little detail, so experts
(and "experts" :-) have a lot of room to speculate.
This was the original problem description from Philip Lavine:
I have an east coast and west coast data center connected with a
DS3. I am running into issues with streaming data via TCP
In the meantime, Philip gave more information, about the throughput he
is seeing (no mention how this is measured, whether it is total load
on the DS3, throughput for an application/transaction or whatever):
This is the exact issue. I can only get between 5-7 Mbps.
And about the protocols he is using:
I have 2 data transmission scenarios:
1. Microsoft MSMQ data using TCP
2. "Streaming" market data stock quotes transmitted via a TCP
It seems quite likely that these applications have their own
performance limits in high-RTT situations.
Philip, you could try a memory-to-memory-test first, to check whether
TCP is really the limiting factor. You could use the TCP tests of
iperf, ttcp or netperf, or simply FTP a large-but-not-too-large file
to /dev/null multiple times (so that it is cached and you don't
measure the speed of your disks).
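If none of those tools is handy, even a short script will do a memory-to-memory TCP test. A minimal sketch over loopback (in real use, run the receiving half on the remote host and point the sender at it; the 512000-byte buffer request mirrors the TcpWindowSize tuning quoted below):

```python
import socket
import threading
import time

def drain(srv, nbytes):
    # Accept one connection and read nbytes into memory, discarding them.
    conn, _ = srv.accept()
    got = 0
    while got < nbytes:
        chunk = conn.recv(65536)
        if not chunk:
            break
        got += len(chunk)
    conn.close()

total = 8 * 1024 * 1024           # 8 MB memory-to-memory
srv = socket.socket()
srv.bind(("127.0.0.1", 0))        # any free port on loopback
srv.listen(1)
port = srv.getsockname()[1]
receiver = threading.Thread(target=drain, args=(srv, total))
receiver.start()

cli = socket.socket()
# Ask for a large send buffer, analogous to tuning TcpWindowSize.
cli.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 512000)
cli.connect(("127.0.0.1", port))
payload = b"x" * 65536
sent = 0
start = time.time()
while sent < total:
    cli.sendall(payload)
    sent += len(payload)
cli.close()
receiver.join()
elapsed = max(time.time() - start, 1e-9)
print(sent, round(sent * 8 / elapsed / 1e6, 1))  # bytes sent, Mb/s
```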
If you find that this, too, gives you only 5-7 Mb/s, then you should
look at tuning TCP according to Andre's excellent suggestions quoted
below, and check for duplex mismatches and other sources of packet loss.
If you find that the TCP memory-to-memory-test gives you close to DS3
throughput (modulo overhead), then maybe your applications limit
throughput over long-RTT paths, and you have to look for tuning
opportunities on that level.
> Change these settings on both ends and reboot once to get better throughput:
> "SackOpts"=dword:0x1 (enable SACK)
> "TcpWindowSize"=dword:0x7D000 (512000 Bytes)
> "Tcp1323Opts"=dword:0x3 (enable window scaling and timestamps)
> "GlobalMaxTcpWindowSize"=dword:0x7D000 (512000 Bytes)