IP Fragmentation

Wed Aug 20 17:19:31 UTC 2008

Glen Kent wrote:
> Do transit routers in the wild actually get to do IP fragmentation
> these days? I was wondering if routers actually do it or not, because
> the source usually discovers the path MTU and sends its data with the
> least supported MTU. Is this true?

I believe that is only true for TCP over IPv4. UDP over IPv4 per se
doesn't involve any path MTU discovery. Some UDP applications may in
fact attempt path MTU discovery and self-limit the size of their
packets, but that's not part of the UDP protocol.

A hypothetical specific "real world" example of where very large UDP
packets might occur is SNMP. An SNMP "get" or "set" operation generally
has to fit inside a single UDP datagram, and UDP allows datagrams of up
to 64k bytes. If an SNMP object value is a really long string (say 2000
bytes long), then the datagram will exceed the 1500-byte MTU most
Ethernet interfaces expect, so I believe fragmentation will occur at
the originating system. On the other hand, some systems support
Ethernet jumbo frames, so I believe it is possible that the default
gateway router would be the first network element forced to fragment
the datagram.
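
To put rough numbers on that, here is a small Python sketch of how a
datagram gets cut up for a given MTU. It is only an illustration: it
assumes a 20-byte IPv4 header with no options, and it treats the
2000-byte string as the entire UDP payload (a real SNMP message would
carry extra encoding overhead). The function name fragment_sizes is my
own, not from any library.

```python
IPV4_HEADER = 20  # bytes; assumes no IP options
UDP_HEADER = 8    # bytes


def fragment_sizes(ip_payload_len, mtu, ip_header=IPV4_HEADER):
    """Return a list of (offset_bytes, data_len, more_fragments) tuples."""
    # Every fragment except the last must carry a multiple of 8 data
    # bytes, because the IPv4 fragment offset field counts 8-byte units.
    max_data = (mtu - ip_header) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < ip_payload_len:
        data_len = min(max_data, ip_payload_len - offset)
        more = offset + data_len < ip_payload_len
        fragments.append((offset, data_len, more))
        offset += data_len
    return fragments


# Hypothetical 2000-byte value sent as one UDP datagram.
ip_payload = 2000 + UDP_HEADER
for mtu in (1500, 9000):  # standard Ethernet vs. an assumed jumbo MTU
    print(mtu, fragment_sizes(ip_payload, mtu))
```

With a 1500-byte MTU the datagram needs two fragments; with a 9000-byte
jumbo MTU on the sender's link it leaves the host in one piece, which
is the case where a downstream router might have to do the work.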

IPv6 is a different (and more complex) story of course - fragmentation
is only supposed to occur at the source endpoint, never at routers -
even for UDP.

Quick experiment you can try if you have a Unix-like system handy: use
ping (and/or ping6 or an IPv6-aware ping) and supply it with a "-s" data
size parameter of, say, 2000. That makes a larger-than-normal packet
that can't fit into a standard Ethernet frame. Use Wireshark (formerly
Ethereal) to see what happens. If your Ethernet cards support jumbo
frames, use the mtu parameter of ifconfig to set the MTU larger than
1500. Repeat the experiment with the large pings against both local and
remote systems.

> Even if this is, then this would break for multicast IP. The source
> cannot determine which receivers would get interested in the traffic
> and what capacities the links connecting them would support. So, a
> source would send IP packets with some size, and there's a chance that
> one of the routers *may* have to fragment those IP packets before
> passing it on to the next router.
> 
> I would wager that the vendors and operators would want to avoid IP
> fragmentation since that's usually done in SW (unless you've got a very
> powerful ASIC or your box is NP based).

I'm not sure how to address the above points since there appear to be
some incorrect assumptions at play. It all depends on whether the Don't
Fragment (DF) bit is set in the IPv4 header and how the source
application responds to any resulting ICMP error responses (if the DF
bit is set and one of the links along the path requires fragmentation).
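
As a rough illustration of the two cases, here is a toy Python model of
a packet crossing a series of links. It is a simulation only, with no
real sockets involved; the function names and the example link MTUs are
made up for this sketch. link_mtus[0] stands for the sender's own link,
so "hop 0" means the originating host.

```python
def send_along_path(packet_len, link_mtus, df):
    """Walk a packet across the links; return (outcome, detail)."""
    for hop, mtu in enumerate(link_mtus):
        if packet_len <= mtu:
            continue  # fits on this link, keep forwarding
        if df:
            # DF set: the packet is dropped and an ICMP "fragmentation
            # needed" error reports the MTU of the link it did not fit.
            return ("icmp_frag_needed", mtu)
        # DF clear: this hop fragments. (Simplification: we stop at the
        # first fragmentation point and ignore later, smaller links.)
        return ("fragmented_at_hop", hop)
    return ("delivered", packet_len)


def discover_path_mtu(initial_len, link_mtus):
    """Shrink the packet on each ICMP error until it gets through."""
    size = initial_len
    while True:
        outcome, detail = send_along_path(size, link_mtus, df=True)
        if outcome == "delivered":
            return size
        size = detail  # retry at the MTU reported in the ICMP error


# Assumed path: jumbo-frame LAN, standard Ethernet, then a tunnel.
path = [9000, 1500, 1400]
print(send_along_path(2028, path, df=False))
print(send_along_path(2028, path, df=True))
print(discover_path_mtu(2028, path))
```

A source that ignores the ICMP errors (or never receives them) simply
loses its DF-marked packets, which is why the application's behaviour
matters as much as the bit itself.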