RINA - scott whaps at the nanog hornets nest :-)
Richard A Steenbergen
ras at e-gerbil.net
Sat Nov 6 20:36:16 UTC 2010
On Sat, Nov 06, 2010 at 12:32:55PM -0700, George Bonser wrote:
> > I doubt that 1500 is (still) widely used in our Internet... Might be,
> > though, that most of us don't go all the way to 9k.
>
> Last week I asked the operator of fairly major public peering points
> if they supported anything larger than 1500 MTU. The answer was "no".
It would be absolutely trivial for them to enable jumbo frames; there is
just no demand for them to do so, as supporting Internet-wide jumbo
frames (particularly over exchange points) is highly non-scalable in
practice.
It's perfectly safe to have the L2 networks in the middle support the
largest MTU values possible (other than maybe triggering an obscure
Force10 bug or something :P), so they could roll that out today and you
probably wouldn't notice. The real issue is with the L3 networks on
either end of the exchange, since if the L3 routers that are trying to
talk to each other don't agree about their MTU values precisely, packets
are blackholed. There are no real standards for jumbo frames out there;
every vendor (and in many cases particular type/revision of hardware
made by that vendor) supports a slightly different size. There is also
no negotiation protocol of any kind, so the only way to make these two
numbers match precisely is to have the humans on both sides talk to each
other and come up with a commonly supported value.
There are two things that make this practically impossible to support at
scale, even ignoring all of the grief that comes from trying to find a
clueful human to talk to on the other end of your connection to a third
party (which is a huge problem in and of itself):
#1. There is currently no mechanism on any major router to set multiple
MTU values PER NEXTHOP on a multi-point exchange, so to do jumbo frames
over an exchange you would have to pick a single common value that
EVERYONE can support. This also means you can't mix and match jumbo and
non-jumbo participants over the same exchange; you essentially have to
set up an entirely new exchange point (or vlan within the same exchange)
dedicated to the jumbo frame support, and you still have to get a common
value that everyone can support. Ironically, many routers (many kinds of
Cisco and Juniper routers at any rate) actually DO support per-nexthop
MTUs in hardware; there is just no mechanism exposed to the end user to
configure those values, let alone auto-negotiate them.
#2. The major vendors can't even agree on how they represent MTU sizes,
so entering the same # into routers from two different vendors can
easily result in incompatible MTUs. For example, on Juniper when you
type "mtu 9192", this is INCLUSIVE of the L2 header, but on Cisco the
opposite is true. So to make a Cisco talk to a Juniper that is
configured 9192, you would have to configure mtu 9178 (9192 minus the
14-byte Ethernet header). Except it's not even that simple, because once
you start adding vlan tagging the L2 header grows by 4 bytes per tag. If
you configure vlan tagging on the interface, you've got to make the
Cisco side 9174 to match the Juniper's 9192. And if you configure
flexible-vlan-tagging so you can support q-in-q, you've now got to
configure the Cisco side for 9170.
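The arithmetic in #2 can be sketched in a few lines of Python. This is
only an illustration of the numbers quoted above, assuming a 14-byte
untagged Ethernet header and 4 bytes per 802.1Q tag. The function name
and constants are made up for the example, and other platforms or
software versions may count the L2 overhead differently.

```python
# Hypothetical helper: not a vendor tool, just the arithmetic from this post.
ETHERNET_HEADER_BYTES = 14  # dst MAC + src MAC + EtherType, no FCS (assumption)
VLAN_TAG_BYTES = 4          # one 802.1Q tag


def cisco_mtu_for_juniper(juniper_mtu: int, vlan_tags: int = 0) -> int:
    """Return the MTU to type on the Cisco side (L2 header excluded) so it
    matches a Juniper interface MTU (L2 header included)."""
    if vlan_tags < 0:
        raise ValueError("vlan_tags must be >= 0")
    l2_overhead = ETHERNET_HEADER_BYTES + VLAN_TAG_BYTES * vlan_tags
    return juniper_mtu - l2_overhead


if __name__ == "__main__":
    cases = [(0, "untagged"), (1, "vlan-tagging"), (2, "q-in-q")]
    for tags, label in cases:
        print(f"{label}: Juniper 9192 -> Cisco {cisco_mtu_for_juniper(9192, tags)}")
```

With a Juniper value of 9192 this gives 9178, 9174 and 9170 for zero,
one and two tags, which are the three values mentioned above.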
As an operator who DOES fully support 9k+ jumbos on every internal link
in my network, and as many external links as I can find clueful people
to talk to on the other end to negotiate the correct values, let me just
tell you this is a GIANT PAIN IN THE ASS. And we're not even talking
about making sure things actually work right for the end user. Your IGP
may not come up at all if the MTUs are misconfigured, but EBGP certainly
will, even if the two sides are actually off by a few bytes. The maximum
size of a BGP message is 4096 octets, and there is no mechanism to pad a
message and try to detect MTU incompatibility, so what will actually
happen in real life is the end user will try to send a big jumbo frame
through and find that some of their packets are randomly and silently
blackholed. This would be an utter nightmare to support and diagnose.
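To make that failure mode concrete, here is a rough Python sketch of a
link whose two ends disagree by a few bytes. The MTU values, the 40
bytes of IPv4 + TCP header overhead, and the function name are all
assumptions for illustration. It models nothing except the size check
on each end of the link.

```python
# Simplified model (assumption): the sender transmits anything up to its own
# MTU, and the receiver silently drops anything larger than its MTU.
BGP_MAX_MESSAGE = 4096  # maximum BGP message size in octets
IP_TCP_HEADERS = 40     # assumed: 20-byte IPv4 + 20-byte TCP, no options


def fate(packet_size: int, sender_mtu: int, receiver_mtu: int) -> str:
    """Classify what happens to one packet crossing the mismatched link."""
    if packet_size > sender_mtu:
        # The sender knows the packet does not fit and can react to it.
        return "too big for sender"
    if packet_size > receiver_mtu:
        # The sender believes the packet fits; the receiver discards it.
        return "blackholed"
    return "delivered"


if __name__ == "__main__":
    sender_mtu, receiver_mtu = 9178, 9174  # off by one vlan tag
    biggest_bgp_packet = BGP_MAX_MESSAGE + IP_TCP_HEADERS
    for size in (biggest_bgp_packet, 9174, 9176, 9178):
        print(size, fate(size, sender_mtu, receiver_mtu))
```

Under these assumptions the largest BGP packet (4136 bytes) is always
delivered, so the session stays up, while only packets in the narrow
band between the two MTUs (9175 to 9178 here) disappear.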
Realistically I don't think you'll ever see even a serious attempt at
jumbo frame support implemented in any kind of scale until there is a
negotiation protocol and some real standards for the mtu size that must
be supported, which is something that no standards body (IEEE, IETF,
etc) has seemed inclined to deal with so far. Of course all of this is
based on the assumption that path mtu discovery will work correctly once
the MTU values ARE correctly configured on the L3 routers, which is a
pretty huge assumption, given all the people who stupidly filter ICMP.
Oh and even if you solved all of those problems, I could trivially DoS
your router with some packets that would overload your ability to
generate ICMP Unreach Needfrag messages for PMTUD, and then all your
jumbo frame end users going through that router would be blackholed as
well.
Great idea in theory, epic disaster in practice, at least given the
mechanisms currently at our disposal. :)
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)