MTU to CDN's
ruairi.carroll at gmail.com
Fri Jan 19 14:13:04 UTC 2018
On 19 January 2018 at 13:48, Mike Hammett <nanog at ics-il.net> wrote:
> Other than people improperly blocking ICMP, when does PMTUD not work?
> Honest question, not troll.
It can break under _certain_ scenarios with Anycast.
It can break under _certain_ scenarios in v6 with ECMP.
It can break across an LB in L4 mode, when a real behind the LB has an
None of these scenarios is the norm, obviously; however, PMTUD does have
> Mike Hammett
> Intelligent Computing Solutions
> ----- Original Message -----
> From: "Mikael Abrahamsson" <swmike at swm.pp.se>
> To: "Michael Crapse" <michael at wi-fiber.io>
> Cc: "NANOG list" <nanog at nanog.org>
> Sent: Friday, January 19, 2018 1:22:02 AM
> Subject: Re: MTU to CDN's
> On Thu, 18 Jan 2018, Michael Crapse wrote:
> > I don't mind letting the client premises routers break down 9000 byte
> > packets. My ISP controls end to end connectivity. 80% of people even let
> > our techs change settings on their computer, this would allow me to give
> > ~5% increase in speeds, and less network congestion for end users for a
> > one time $60 service many people would want. It's also where the internet
> > should be heading... Not to beat a dead horse (re: IPv6) but why hasn't the
> > entire internet just moved to 9000 (or 9600 L2) byte MTU? It was created for
> > the jump to gigabit... That's 4 orders of magnitude ago. The internet
> > backbone shouldn't be shuffling around 1500byte packets at 1tbps. That
> > means if you want to layer 3 that data, you need a router capable of more
> > than half a billion packets/s forwarding capacity. On the other hand, with
> > even just a 9000 byte MTU, TCP/IP overhead is reduced 6 fold, and
> > forwarding capacity needs just 100 or so mpps capacity. Routers that
> > forward at that rate are found for less than $2k.
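
The packet-rate and overhead arithmetic in the paragraph above can be checked with a rough sketch. It assumes every packet is full-size, ignores Ethernet framing overhead (preamble, inter-frame gap, FCS), and takes 40 bytes as the IPv4 + TCP header size without options. The function names are illustrative only, and the absolute packet rate depends heavily on the packet size one assumes.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a link with packets of one size."""
    return link_bps / (packet_bytes * 8)


def header_overhead(mtu_bytes: int, header_bytes: int = 40) -> float:
    """Fraction of each full-size packet spent on IPv4 + TCP headers."""
    return header_bytes / mtu_bytes


LINK_BPS = 1e12  # 1 Tbps, the backbone rate mentioned in the post

for mtu in (1500, 9000):
    pps = packets_per_second(LINK_BPS, mtu)
    print(f"MTU {mtu}: {pps / 1e6:.1f} Mpps, "
          f"header overhead {header_overhead(mtu):.2%}")
```

Under these assumptions the 6-fold ratio holds for both the packet rate and the header overhead, whatever the link speed.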
> As usual, there are 5-10 (or more) factors playing into this. Some, in
> random order:
> 1. IEEE hasn't standardised > 1500 byte ethernet packets
> 2. DSL/WIFI chips typically don't support > ~2300 because reasons.
> 3. Because 2, most SoC ethernet chips don't either
> 4. There is no standardised way to understand/probe the L2 MTU to your
> next hop (ARP/ND and probing if the value actually works)
> 5. PMTUD doesn't always work.
> 6. PLPMTUD has been implemented in neither protocols nor hosts
> 7. Some implementations have been optimized to work on packets < 2000
> bytes and actually perform worse if they have to support larger
> packets (they will allocate 2k buffer memory per packet); 9k is
> ill-fitting across 2^X values
> 8. Because of all above reasons, mixed-MTU LAN doesn't work, and it's
> going to be mixed-MTU unless you control all devices (which is typically
> not the case outside of the datacenter).
> 9. The PPS problem in hosts and routers was solved by hardware offloading
> to NICs and forwarding NPUs/ASICs with very high lookup speeds where PPS
> no longer was a big problem.
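
Point 7 in the list above (9k fitting poorly into 2^X buffers) can be illustrated with a small sketch. It assumes an allocator that gives each packet one buffer rounded up to the next power of two; that is an assumption for illustration, since real drivers may instead chain several smaller buffers. The helper names are hypothetical.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def buffer_waste(packet_bytes: int):
    """Return (buffer size, wasted fraction) for one packet per 2^X buffer."""
    buf = next_power_of_two(packet_bytes)
    return buf, (buf - packet_bytes) / buf


for size in (1500, 9000, 9180):
    buf, waste = buffer_waste(size)
    print(f"{size}-byte packet -> {buf}-byte buffer, {waste:.1%} unused")
```

With this allocation model a 1500-byte packet sits in a 2k buffer, while a 9000-byte packet needs a 16k buffer and leaves a noticeably larger share of it unused.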
> On the value to choose for "large MTU", 9000 for edge and 9180 for core is
> what I advocate, after a non-trivial amount of looking into this. All major
> core routing platforms work with 9180 (with JunOS only supporting this
> after 2015 or something). So if we'd want to standardise on MTU that all
> devices should support, then it's 9180, but we'd typically use 9000 in RA
> to send to devices.
> If we want a higher MTU to be deployable across the Internet, we need to
> make it incrementally deployable. Some key things to achieve that:
> 1. Get something like
> https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-05 implemented.
> 2. Go to the IETF and get a document published that advises all protocols
> to support PLPMTUD (RFC 4821)
> 1 is to enable mixed-MTU LANs.
> 2 is to enable large-MTU hosts to actually be able to communicate when PMTUD
> doesn't work.
> With this in place (wait ~10 years), larger MTU is now incrementally
> deployable which means it'll be deployable on the Internet, and IEEE might
> actually accept to standardise > 1500 byte packets for ethernet.
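
The idea behind PLPMTUD (RFC 4821) is that the sender probes with progressively larger packets and treats delivery of a probe, rather than an ICMP message, as the signal. The following is a much simplified sketch of that idea and not the RFC's algorithm: it uses a plain binary search, the `probe` callable is hypothetical, and the path is simulated. It also assumes probes are only lost because of size, which a real implementation cannot assume.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool],
                    low: int = 1280, high: int = 9000) -> int:
    """Largest size in [low, high] for which probe(size) succeeds.

    Assumes `low` is already known to work (1280 is the IPv6 minimum MTU).
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid        # probe delivered: path carries at least `mid`
        else:
            high = mid - 1   # probe lost: treat `mid` as too large
    return low


def make_simulated_path(true_mtu: int) -> Callable[[int], bool]:
    """Stand-in for a real network path: delivers packets up to true_mtu."""
    return lambda size: size <= true_mtu


print(search_path_mtu(make_simulated_path(1500)))
```

Because the search relies only on whether probes get through, it still works on paths where ICMP is filtered, which is the case point 2 above is meant to cover.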
> Mikael Abrahamsson email: swmike at swm.pp.se