Thoughts on increasing MTUs on the internet

Douglas Otis dotis at mail-abuse.org
Sun Apr 15 03:46:32 UTC 2007


On Apr 14, 2007, at 1:10 PM, Iljitsch van Beijnum wrote:
> On 14-apr-2007, at 19:22, Douglas Otis wrote:
>>>
>>> 1500 byte MTUs in fact work. I'm all for 9K MTUs, and would  
>>> recommend them. I don't see the point of 65K MTUs.
>>
>> Keep in mind that a 9KB MTU still reduces the Ethernet CRC  
>> effectiveness by a fair amount.
>
> I can't find bit error rate specs for various types of ethernet  
> real quick, but if you assume 10^-9 that means that ~ 1 in 10000  
> 11454 byte packets has one bit error, so around 1 in 10^12 has four  
> bit errors and has a _chance_ to defeat the CRC32.  The naive  
> assumption that only 1 in 2^32 of those packets with 3 flipped bits  
> will have a valid CRC32 is probably incorrect, but the CRC should  
> still catch most of those packets for a fairly large value of "most".

http://www.ietf.org/rfc/rfc3385.txt
http://citeseer.ist.psu.edu/koopman02bit.html


> For 1500 byte packets the fraction of packets with three bits  
> flipped would be around 1 : 10^15, correcting for the larger number  
> of packets per given amount of data, that's a difference of about  
> 1 : 100.
>
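
The arithmetic in the quoted estimate can be checked with a short
sketch. It assumes the same bit error rate of 10^-9 and independent
bit errors (a binomial model); the function name is illustrative and
not from the thread. The exact binomial term includes a 1/3! factor
that the cube-of-the-single-error-rate shortcut omits, so the results
agree with the quoted figures only to within about an order of
magnitude.

```python
import math

BER = 1e-9  # assumed bit error rate, as in the quoted estimate


def p_exactly_k_errors(packet_bytes, k, ber=BER):
    """Binomial probability of exactly k flipped bits in one packet,
    assuming independent bit errors."""
    n = packet_bytes * 8
    return math.comb(n, k) * ber**k * (1 - ber) ** (n - k)


for size in (11454, 1500):
    p1 = p_exactly_k_errors(size, 1)
    p3 = p_exactly_k_errors(size, 3)
    print(f"{size} bytes: one flipped bit in about 1 of {1 / p1:,.0f} packets, "
          f"three flipped bits in about 1 of {1 / p3:.1e} packets")

# Normalise per byte carried, since large packets move more data each.
jumbo_per_byte = p_exactly_k_errors(11454, 3) / 11454
small_per_byte = p_exactly_k_errors(1500, 3) / 1500
print(f"per-byte ratio, large vs. small packets: "
      f"{jumbo_per_byte / small_per_byte:.0f}")
```

Under this model the per-byte ratio scales roughly with the square of
the packet-size ratio, so it comes out somewhat below the rounded
"1 : 100" figure.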

Quoting from "When The CRC and TCP Checksum Disagree" by Jonathan  
Stone and Craig Partridge:

http://citeseer.ist.psu.edu/cache/papers/cs/21401/http:zSzzSzsigcomm.it.uu.sezSzconfzSzpaperzSzsigcomm2000-9-1.pdf/stone00when.pdf

"Traces of Internet packets from the past two years show that between  
1 packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even  
on links where link-level CRCs should catch all but 1 in 4 billion  
errors.  For certain situations, the rate of checksum failures can be  
even higher: in one hour-long test we observed a checksum failure of  
1 packet in 400.  We investigate why so many errors are observed,  
when link-level CRCs should catch nearly all of them.

We have collected nearly 500,000 packets which failed the TCP or UDP  
or IP checksum. This dataset shows the Internet has a wide variety of  
error sources which can not be detected by link-level checks.  We  
describe analysis tools that have identified nearly 100 different  
error patterns. Categorizing packet errors, we can infer likely  
causes which explain roughly half the observed errors. The causes  
span the entire spectrum of a network stack, from memory errors to  
bugs in TCP.

After an analysis we conclude that the checksum will fail to detect  
errors for roughly 1 in 16 million to 10 billion packets. From our  
analysis of the cause of errors, we propose simple changes to several  
protocols which will decrease the rate of undetected error. Even so,  
the highly non-random distribution of errors strongly suggests some  
applications should employ application-level checksums or equivalents."

Hardware weaknesses within DSLAMs or various memory arrays, such as a  
weak driver on some internal interface, can generate high levels of  
multi-bit errors that TCP checksums do not detect.  When such a fault  
keeps affecting the same bit position within an interface, more than  
1 out of 100 such errors may go undetected.
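
One way such errors can slip past the TCP checksum can be sketched as
follows. The example assumes a fault that flips the same bit position
in two 16-bit words with opposite polarity (one 0 -> 1, one 1 -> 0);
that failure pattern is an assumption chosen for illustration. The
checksum follows the ones' complement sum described in RFC 1071, and
the helper names are made up for this sketch.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, byte_index: int, mask: int) -> bytes:
    """Return a copy of data with the masked bit(s) of one byte inverted."""
    out = bytearray(data)
    out[byte_index] ^= mask
    return bytes(out)


original = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC])

# A single flipped bit changes the sum and is detected.
one_flip = flip_bit(original, 0, 0x04)

# The same bit position flipped in a second word, with opposite
# polarity, cancels out: word 0 gains 0x0400 and word 1 loses 0x0400.
two_flips = flip_bit(one_flip, 2, 0x04)

print(internet_checksum(one_flip) != internet_checksum(original))
print(internet_checksum(two_flips) == internet_checksum(original))
```

Both comparisons hold for this input: the second corruption restores
the original sum even though two bytes of the payload are now wrong.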


> That seems like a lot, but getting better quality fiber easily  
> compensates for this. Expressed differently, the average amount of  
> data transmitted where you see one packet with three flipped bits  
> is around 10 petabytes for 11454 byte packets and some 1.3 exabytes  
> for 1500 byte packets. For the large packets that would be one  
> packet in three years at 1 Gbps, for the small ones one packet in  
> 380 years.
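
The data volumes and intervals quoted above can be reproduced with a
short calculation. It takes the quoted rates of one affected packet in
10^12 (11454 byte packets) and one in 10^15 (1500 byte packets) as
given, and assumes a fully loaded 1 Gbps link; the function name is
illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
LINK_BPS = 1e9  # 1 Gbps, assumed to be fully utilised


def volume_and_years(packet_bytes, packets_per_event, link_bps=LINK_BPS):
    """Bytes sent per affected packet, and years needed to send them."""
    total_bytes = packet_bytes * packets_per_event
    years = total_bytes * 8 / link_bps / SECONDS_PER_YEAR
    return total_bytes, years


jumbo_bytes, jumbo_years = volume_and_years(11454, 1e12)
small_bytes, small_years = volume_and_years(1500, 1e15)

# Binary prefixes (2**50, 2**60) match the quoted "10 petabytes" and
# "1.3 exabytes" most closely.
print(f"11454 byte packets: {jumbo_bytes / 2**50:.1f} PiB, "
      f"{jumbo_years:.1f} years")
print(f"1500 byte packets: {small_bytes / 2**60:.2f} EiB, "
      f"{small_years:.0f} years")
```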

Consider that the CRC is not always carried with the packet between  
interfaces.
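
A simplified model of this point: a forwarding device verifies and
strips the CRC on ingress, holds the bare payload internally, and
computes a fresh CRC on egress. Any corruption that happens in between
is then covered by a valid CRC. The sketch uses zlib.crc32 as a
stand-in for the Ethernet FCS and ignores real frame layout; all
function names are hypothetical.

```python
import zlib


def add_fcs(payload: bytes) -> bytes:
    """Append a CRC-32 over the payload (stand-in for the Ethernet FCS)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check_and_strip_fcs(frame: bytes) -> bytes:
    """Verify the trailing CRC-32 and return the payload without it."""
    payload, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != fcs:
        raise ValueError("FCS mismatch")
    return payload


def forward(frame: bytes, corrupt_in_memory: bool = False) -> bytes:
    """Model one hop: check the CRC, hold the payload, regenerate the CRC."""
    payload = bytearray(check_and_strip_fcs(frame))
    if corrupt_in_memory:
        payload[0] ^= 0x01  # simulated fault while no CRC protects the data
    return add_fcs(bytes(payload))


sent = b"example payload"
received = check_and_strip_fcs(forward(add_fcs(sent), corrupt_in_memory=True))

# The next hop's CRC check passed, yet the payload differs.
print(received != sent)
```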

-Doug



