Thoughts on increasing MTUs on the internet

Perry Lorier perry at coders.net
Thu Apr 12 23:41:30 UTC 2007


Iljitsch van Beijnum wrote:
> 
> Dear NANOGers,
> 
> It irks me that today, the effective MTU of the internet is 1500 bytes, 
> while more and more equipment can handle bigger packets.
> 
> What do you guys think about a mechanism that allows hosts and routers 
> on a subnet to automatically discover the MTU they can use towards other 
> systems on the same subnet, so that:
> 
> 1. It's no longer necessary to limit the subnet MTU to that of the least 
> capable system
> 
> 2. It's no longer necessary to manage 1500 byte+ MTUs manually
> 
> Any additional issues that such a mechanism would have to address?

I have a half-completed, prototype "mtud" that runs under Linux.  It 
sets the interface MTU to 9k, but sets the route for the subnet down to 
1500.  It then watches the arp table for new arp entries.  As a new MAC 
is added, it sends a 9k UDP datagram to that host and listens for an 
ICMP port unreachable reply (like traceroute does).  If the error 
arrives, it assumes that host can receive packets that large, and adds a 
host route with the larger MTU to that host.  It steps the MTU up from 
1500 to 16k, trying to raise it rapidly without having to wait for 
annoying timeouts.  If anything goes wrong somewhere along the way (a 
host is firewalled, or whatever) then it won't receive the ICMP reply, 
and won't raise the MTU.
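A minimal sketch of that probe step in Python, relying on the Linux behaviour that a connected UDP socket surfaces the ICMP port unreachable as ECONNREFUSED; the probe port, timeout, and step sizes here are illustrative, not the actual mtud values:

```python
import socket

# Candidate MTUs to step through, from the 1500-byte baseline upward.
# The 16k ceiling and intermediate steps are illustrative.
PROBE_SIZES = [1500, 4000, 8000, 9000, 16000]
PROBE_PORT = 33434      # traceroute-style high port, assumed unused
PROBE_TIMEOUT = 2.0     # seconds to wait for the ICMP error to come back

def probe_mtu(host):
    """Return the largest probe size for which the host sent back an
    ICMP port unreachable, i.e. it received the whole datagram."""
    best = None
    for size in PROBE_SIZES:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(PROBE_TIMEOUT)
        # Connecting the socket makes Linux deliver the ICMP port
        # unreachable as ECONNREFUSED on a later send/recv.
        s.connect((host, PROBE_PORT))
        try:
            s.send(b"\x00" * (size - 28))  # leave room for IP+UDP headers
            s.recv(1)                      # no data expected; we want the error
        except ConnectionRefusedError:
            best = size                    # ICMP error arrived: host got the packet
        except OSError:
            break                          # timeout / send failure: stop raising the MTU
        finally:
            s.close()
    return best
```

The ICMP error arriving is the *good* case here: it proves the datagram made it through the receiver's interface intact, which is exactly the trick traceroute-style probing exploits.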

The idea is that you can run this on routers/servers on a network that 
has a 9k MTU, but where not all the hosts are assured to be 9k capable, 
and it will correctly detect the available MTU between servers or 
routers, while still being able to correctly talk to machines that are 
stuck with 1500 byte MTUs etc.
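The per-host route side of that setup maps onto ordinary iproute2 commands; a sketch in Python (the interface name, address, and helper names are placeholders I've made up, and actually installing routes needs root / CAP_NET_ADMIN):

```python
import subprocess

# One-time setup mtud assumes, shown as the equivalent shell commands:
#   ip link set eth0 mtu 9000                     # interface at 9k
#   ip route replace 192.0.2.0/24 dev eth0 mtu 1500   # subnet capped at 1500

def mtu_route_cmd(host, dev, mtu):
    # Build the iproute2 invocation for what mtud installs once a probe
    # succeeds: a /32 host route pinning a larger per-destination MTU.
    return ["ip", "route", "replace", f"{host}/32", "dev", dev, "mtu", str(mtu)]

def apply_mtu_route(host, dev, mtu):
    # Needs root; raises CalledProcessError if iproute2 rejects it.
    subprocess.run(mtu_route_cmd(host, dev, mtu), check=True)
```

Because the larger MTU lives in a host route rather than on the interface, a host that never answers the probe simply keeps using the 1500-byte subnet route.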

As another interesting data point in this field, a while ago we had 
reason to do some throughput tests under Linux, varying the MTU on 
e1000s, and ended up with this pretty graph:

http://wand.net.nz/~perry/mtu.png

We never had the time to investigate exactly what was going on, but 
interestingly at 8k MTUs (which is presumably what NFS would use), 
performance is exceptionally poor compared to 9k and 1500 byte MTUs. 
Our (untested) hypothesis is that the Linux kernel driver isn't smart 
about how it allocates its buffers.
