Jumbo frame Question

Matthew Petach mpetach at netflight.com
Fri Nov 26 01:06:21 UTC 2010


On Thu, Nov 25, 2010 at 4:26 PM, Kevin Oberman <oberman at es.net> wrote:
>> From: Harris Hui <harris.hui at hk1.ibm.com>
>> Date: Fri, 26 Nov 2010 08:13:57 +0800
>>
>> Hi
>>
>> Does anyone have experience designing / implementing a jumbo-frame-enabled
>> network?
>>
>> I am working on a project to better utilize a fiber link between the east
>> coast and west coast using Juniper devices.
>>
>> Based on the default TCP window sizes in Linux / Windows, the latency between
>> the east coast and west coast (~80ms), and the default MTU of 1500, the
>> maximum throughput of a single TCP session is around ~3 Mbps, which is too
>> slow for backing up the huge amount of data between the 2 sites.
>>
>> The following is the topology that we are using right now.
>>
>> Host A NIC (MTU 9000) <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216)
>> <---GigLAN ---> (MTU 9018) J-6350 cluster A (MTU 9018) <--- fiber link
>> across site ---> (MTU 9018) J-6350 cluster B (MTU 9018) <--- GigLAN --->
>> (MTU 9216) Juniper EX4200 (MTU 9216) <---GigLAN ---> (MTU 9000) NIC - Host
>> B
>>
>> I was trying to test connectivity from Host A to the J-6350 cluster A by
>> sending ICMP pings with size 8000 and the DF bit set, but the pings failed.
>>
>> Does anyone have experience with this? Please advise.
>>
>> Thanks :-)
>
> MTU is only one issue. System tuning and a clean path are also
> critical. Getting good data streams between two systems that far apart
> is not easy, but with reasonable effort you can get 300 to 400 Mbps.
>
> If an 8000-byte ping fails, that says that SOMETHING is not jumbo
> enabled, but it's hard to tell what. This assumes that no firewall or
> other device is blocking ICMP, but I assume that 1400-byte pings
> work. Try hop-by-hop tests.
>
> I should also mention that some DWDM gear needs to be configured to
> handle jumbos. We've been bitten by that. You tend to assume that layer
> 1 gear won't care about layer 2 issues, but the input is an Ethernet
> interface.
>
> Finally, host tuning is critical. You talk about "default" window size,
> but modern stacks auto-tune window size. For lots of information on
> tuning and congestion management, see http://fasterdata.es.net. We move
> terabytes of data between CERN and the US and have to make sure that the
> 10GE links run at close to capacity and streams of more than a Gbps will
> work. (It's not easy.)
> --
> R. Kevin Oberman, Network Engineer
> Energy Sciences Network (ESnet)
> Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
> E-mail: oberman at es.net                  Phone: +1 510 486-8634
> Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751
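
To put rough numbers on the window math quoted above (back-of-the-envelope
only, not measurements from this particular setup): TCP can keep at most
one window of data in flight per round trip, so a single stream tops out
around window / RTT.  At 80ms, a ~32KB effective window works out to
roughly 3 Mbps, and filling a GigE path at that latency needs a window on
the order of 10MB.  A quick Python sketch of the arithmetic:

# Bandwidth-delay-product arithmetic for a long-haul TCP path.
# The 80 ms RTT is from the original post; the window sizes are examples.
RTT = 0.080  # seconds, coast to coast

def max_throughput_mbps(window_bytes, rtt_s):
    # At most one window of data can be in flight per round trip.
    return window_bytes * 8 / rtt_s / 1e6

def window_needed_mb(target_bps, rtt_s):
    # Bandwidth-delay product: bytes in flight needed to fill the pipe.
    return target_bps * rtt_s / 8 / 1e6

for window_kb in (32, 64, 1024):
    print("%5d KB window -> %7.1f Mbps max" %
          (window_kb, max_throughput_mbps(window_kb * 1024, RTT)))
print("1 Gbps at 80 ms needs ~%.0f MB in flight" %
      window_needed_mb(1e9, RTT))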

We move hundreds of TB around from one side of the planet to the
other on a regular basis.  Kevin's link has some really good resources
listed on it.  I can't stress enough the requirement for doing BOTH
OS-level kernel tuning (make sure that RFC1323 extensions are
enabled, make sure you have big enough maximum send and receive
buffers; if your OS does auto-tuning, make sure the maximums are set
big enough to support all the data you'll want to have in flight at
any one time) AND application level adjustments.  One of the biggest
stumbling blocks we run across is people who have done their OS tuning,
but then try to use stock SSH/SCP for moving files around.  It doesn't
matter how much tuning you do in the OS; if your application only has
a 1MB or 64KB buffer for data handling, you just won't get the throughput
you're looking for.
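
To make the application-side point concrete, here's a minimal Python
sketch (placeholder host and port, and a 16MB figure picked for roughly
1 Gbps at 80ms; it's an illustration, not any particular transfer tool):

import socket

WANTED = 16 * 1024 * 1024  # bytes; sized for ~1 Gbps at 80 ms RTT

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask the kernel for large send/receive buffers.  These get clamped to the
# system-wide maximums (net.core.wmem_max / rmem_max on Linux), which is
# why the OS-level tuning has to be done as well.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WANTED)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WANTED)
print("send buffer granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
# s.connect(("remote.example.net", 5001))  # placeholder peer
# ... stream the data ...
s.close()

(One caveat: on Linux, explicitly setting SO_RCVBUF disables receive-buffer
autotuning for that socket, so on a modern autotuning stack it's usually
better to just raise the system maximums and let the kernel size the window.
The point is simply that some layer of the application has to allow a
window big enough for the path.)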

But with proper OS and application layer tuning, you can move a lot of
data even over stock 1500-byte frames; don't be distracted by jumbo frames,
they're a red herring when it comes to actually moving large volumes of data
around.  (Yes, yes, it's not completely irrelevant, for the pedants in the
audience, but it's not required by any means.)

Matt



