MTU to CDN's

Fri Jan 19 13:48:07 UTC 2018

Other than people improperly blocking ICMP, when does PMTUD not work? Honest question, not troll. 

----- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

----- Original Message -----

From: "Mikael Abrahamsson" <swmike at swm.pp.se> 
To: "Michael Crapse" <michael at wi-fiber.io> 
Cc: "NANOG list" <nanog at nanog.org> 
Sent: Friday, January 19, 2018 1:22:02 AM 
Subject: Re: MTU to CDN's 

On Thu, 18 Jan 2018, Michael Crapse wrote: 

> I don't mind letting the client premises routers break down 9000 byte 
> packets. My ISP controls end to end connectivity. 80% of people even let 
> our techs change settings on their computer, this would allow me to give 
> ~5% increase in speeds, and less network congestion for end users for a one 
> time $60 service many people would want. It's also where the internet 
> should be heading... Not to beat a dead horse(re:ipv6 ) but why hasn't the 
> entire internet just moved to 9000(or 9600 L2) byte MTU? It was created for 
> the jump to gigabit... That's 4 orders of magnitude ago. The internet 
> backbone shouldn't be shuffling around 1500byte packets at 1tbps. That 
> means if you want to layer 3 that data, you need a router capable of more 
> than half a billion packets/s forwarding capacity. On the other hand, with 
> even just a 9000 byte MTU, TCP/IP overhead is reduced 6 fold, and 
> forwarding capacity needs just 100 or so mpps capacity. Routers that 
> forward at that rate are found for less than $2k. 
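
A rough check of the packet-rate arithmetic in the quoted text. This is
only a sketch: it assumes every packet is full-size and ignores Ethernet
framing overhead, and the function name is made up for illustration.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a link with packets of one size."""
    return link_bps / (packet_bytes * 8)

LINK_BPS = 1e12  # 1 Tbps, the link speed used in the quoted text

pps_1500 = packets_per_second(LINK_BPS, 1500)
pps_9000 = packets_per_second(LINK_BPS, 9000)

print(f"1500-byte packets: {pps_1500 / 1e6:.1f} Mpps")
print(f"9000-byte packets: {pps_9000 / 1e6:.1f} Mpps")
print(f"ratio: {pps_1500 / pps_9000:.1f}x")
```

Under those assumptions this gives roughly 83 Mpps at 1500 bytes and
roughly 14 Mpps at 9000 bytes, i.e. the six-fold reduction mentioned.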

As usual, there are 5-10 (or more) factors playing into this. Some, in 
random order: 

1. IEEE hasn't standardised > 1500 byte ethernet packets 
2. DSL/Wi-Fi chips typically don't support > ~2300 because reasons. 
3. Because of 2, most SoC ethernet chips don't either 
4. There is no standardised way to understand/probe the L2 MTU to your 
next hop (ARP/ND and probing if the value actually works) 
5. PMTUD doesn't always work. 
6. PLPMTUD generally hasn't been implemented in either protocols or 
hosts. 
7. Some implementations have been optimized to work on packets < 2000 
bytes and actually perform worse if they have to support larger packets 
(they allocate 2k of buffer memory per packet); 9k is ill-fitting across 
2^X values 
8. Because of all the above reasons, a mixed-MTU LAN doesn't work, and it's 
going to be mixed-MTU unless you control all devices (which is typically 
not the case outside of the datacenter). 
9. The PPS problem in hosts and routers was solved by hardware offloading 
to NICs and by forwarding NPUs/ASICs with very high lookup speeds, so PPS 
is no longer a big problem. 
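
To illustrate point 7, a small sketch of how packet sizes fit into
power-of-two buffers. It assumes an allocator that rounds each packet
buffer up to the next power of two, which is a simplification and not a
description of any particular driver; the helper name is hypothetical.

```python
def pow2_buffer(packet_bytes: int) -> int:
    """Smallest power-of-two buffer size that holds packet_bytes."""
    return 1 << (packet_bytes - 1).bit_length()

for mtu in (1500, 9000, 9180):
    buf = pow2_buffer(mtu)
    waste = buf - mtu
    print(f"MTU {mtu}: buffer {buf} bytes, {waste} bytes unused")
```

With that assumption a 1500-byte packet sits in a 2048-byte buffer, while
9000 and 9180 both need 16384 bytes and leave a much larger share unused.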

On the value to choose for "large MTU", 9000 for edge and 9180 for core is 
what I advocate, after a non-trivial amount of looking into this. All major 
core routing platforms work with 9180 (with JunOS only supporting this 
after 2015 or something). So if we'd want to standardise on an MTU that all 
devices should support, then it's 9180, but we'd typically use 9000 in RA 
to send to devices. 

If we want a higher MTU to be deployable across the Internet, we need to 
make it incrementally deployable. Some key things to achieve that: 

1. Get something like 
https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-05 implemented. 
2. Go to the IETF and get a document published that advises all protocols 
to support PLPMTUD (RFC 4821) 

1 is to enable mixed-MTU LANs. 
2 is to enable large-MTU hosts to actually be able to communicate when 
PMTUD doesn't work. 
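
The idea behind PLPMTUD is that the sender probes with progressively
sized packets and relies on whether they are acknowledged, rather than on
ICMP messages coming back. A minimal simulation of that search follows.
The probe function is a stand-in for real probe packets, the binary
search is just one possible strategy chosen for illustration, and all
names and values are assumptions.

```python
def probe_ok(size: int, path_mtu: int) -> bool:
    """Stand-in for sending a probe of `size` bytes and seeing it acked."""
    return size <= path_mtu

def search_path_mtu(low: int, high: int, path_mtu: int) -> int:
    """Find the largest probe size that gets through.

    low  -- a size already known to work (must not exceed the path MTU)
    high -- the local interface MTU, the largest size worth trying
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe_ok(mid, path_mtu):
            low = mid          # probe got through, search higher
        else:
            high = mid - 1     # probe lost, search lower
    return low

print(search_path_mtu(1280, 9000, path_mtu=1500))
print(search_path_mtu(1280, 9000, path_mtu=9000))
```

A 9000-byte host talking across a 1500-byte path settles on 1500 without
any ICMP ever arriving, and keeps 9000 when the whole path supports it.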

With this in place (wait ~10 years), a larger MTU becomes incrementally 
deployable, which means it'll be deployable on the Internet, and IEEE might 
actually agree to standardise > 1500 byte packets for ethernet. 

-- 
Mikael Abrahamsson email: swmike at swm.pp.se