RINA - scott whaps at the nanog hornets nest :-)

Sat Nov 6 23:37:30 UTC 2010

On Sat, Nov 06, 2010 at 03:49:19PM -0700, George Bonser wrote:
> 
> When the TCP/IP connection is opened between the routers for a routing 
> session, they should each send the other an MSS value that says how 
> large a packet they can accept.  You already have that information 
> available. TCP provides that negotiation for directly connected 
> machines.

You're proposing that routers should dynamically alter the interface MTU 
based on the TCP MSS value they receive from an EBGP neighbor? I barely 
know where to begin, but first off, MSS is not MTU; it is only loosely 
related to MTU. MSS is affected by TCP options (window scale, SACK, MD5 
authentication, etc), and MSS between routers can be set to any value a 
user chooses. There is absolutely no guarantee that MSS is going to lead 
to a correct guess at the MTU. Also, many routers still default to 
having PMTUD turned off. Would you suggest that they should set the 
physical interface MTU to 576 based on that? :) And alas, it's one hell 
of a layer violation too.
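
To make the MSS-is-not-MTU point concrete, here is a rough sketch of 
what a naive "MTU = MSS + headers" guess would produce. The three 
advertised values are hypothetical, and the 536 case assumes a router 
with PMTUD off falls back to the classic default MSS, which is what the 
576 figure above implies. All three routers are assumed to sit on the 
same 9000-byte interface.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, fixed TCP header only

def naive_mtu_from_mss(mss: int) -> int:
    """Guess an interface MTU from an advertised MSS (the flawed approach)."""
    return mss + IPV4_HEADER + TCP_HEADER

# Hypothetical MSS values from three neighbors that all have a 9000-byte MTU.
advertised = {
    "pmtud_on": 8960,      # MSS derived from the real interface MTU
    "pmtud_off": 536,      # classic default MSS, unrelated to the interface
    "operator_set": 1400,  # MSS pinned by configuration
}

for name, mss in advertised.items():
    print(f"{name}: MSS {mss} -> guessed MTU {naive_mtu_from_mss(mss)}")
```

Only the first guess matches the real interface; the other two come out 
as 576 and 1440 on a link that can actually carry 9000.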

A negotiation protocol is needed, but you could argue about where it 
should be for days. Maybe at the physical layer as part of 
auto-negotiation, maybe at the L3<->L2 boundary (i.e. negotiate it per IP 
as part of ARP or neighbor discovery), hell maybe even in BGP, but keying 
it off MSS is way over the top. :)

> Again, nothing changes from the current method of operating. If I 
> showed up at a peering switch and wanted to use 1000 byte MTU, I would 
> probably have some problems.  The point I am making is that 1500 is a 
> relic value that hamstrings Internet performance and there is no good 
> reason not to use 9000 byte MTU at peering points (by all 
> participants) since it A: introduces no new problems and B: I can't 
> find a vendor of modern gear at a peering point that doesn't support 
> it though there may be some ancient gear at some peering points in use 
> by some of the peers.

Have you ever tried showing up to the Internet with a 1000 byte MTU? The 
only time that works correctly today is when you're rewriting TCP MSS 
values as the packet goes through the constrained link, which may be 
fine for the GRE tunnel to a Linux box at your house, but clearly can't 
work on the real Internet.
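
For anyone who hasn't seen it, the MSS rewriting mentioned above boils 
down to something like the sketch below. The function name is made up 
for illustration, and it assumes plain IPv4 with no IP options. Note 
that it only touches TCP SYNs that happen to cross the box doing the 
rewriting, which is why it is a workaround for one constrained link and 
not a general fix.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, fixed TCP header only

def clamp_mss(syn_mss: int, link_mtu: int) -> int:
    """Return the MSS to write back into a SYN crossing a link of link_mtu."""
    ceiling = link_mtu - IPV4_HEADER - TCP_HEADER
    # Never raise the MSS, only lower it to what the link can carry.
    return min(syn_mss, ceiling)

print(clamp_mss(1460, 1000))  # host on a 1500-byte LAN, 1000-byte link
print(clamp_mss(536, 1000))   # already small enough, left alone
```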

> I can not think of a problem changing from 1500 to 9000 as the 
> standard at peering points introduces.  It would also speed up the 

This suggests a serious lack of imagination on your part. :)

> loading of the BGP routes between routers at the peering points. If 

It's a very very modest increase at best.
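
A back-of-the-envelope sketch of why: if you only count header overhead 
on Ethernet, the best-case goodput gain from 1500 to 9000 is a few 
percent. The overhead constants are the usual Ethernet framing numbers; 
the calculation assumes IPv4 and TCP with no options, and ignores 
per-packet CPU cost, TCP windowing, and how fast the BGP process can 
actually generate and digest updates.

```python
# Per-frame Ethernet overhead outside the IP packet, in bytes:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12
IP_TCP_HEADERS = 20 + 20  # IPv4 + TCP, assuming no options

def goodput_fraction(mtu: int) -> float:
    """Fraction of wire time spent carrying TCP payload at a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire

standard = goodput_fraction(1500)
jumbo = goodput_fraction(9000)
gain = jumbo / standard

print(f"1500: {standard:.1%}  9000: {jumbo:.1%}  gain: {gain - 1:.1%}")
```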

> Joe Blow at home with a dialup connection with an MTU of 576 is 
> talking to a server at Y! with an MTU of 10 billion, changing a 
> peering path from 1500 to 9000 bytes somewhere in the path is not 
> going to change that PMTU discovery one iota.  It introduces no 
> problem whatsoever. It changes nothing.

You know, one very good reason for people on a dialup connection to 
have low MTUs is serialization delay. As link speeds have gotten faster 
but MTUs have stayed the same, one tangible benefit is the lack of a 
need for fair queueing to keep big packets from significantly increasing 
the latency of small packets.
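
To put rough numbers on that, serialization delay is just packet size 
in bits divided by link rate. The link speeds below are illustrative 
picks (56k dialup, 1G, 10G), not anything specific from this thread.

```python
def serialization_delay(packet_bytes: int, link_bps: float) -> float:
    """Seconds needed to clock one packet onto a link of the given bit rate."""
    return packet_bytes * 8 / link_bps

cases = [
    (576, 56_000),           # small MTU on dialup
    (1500, 56_000),          # Ethernet-sized packet on dialup
    (1500, 1_000_000_000),   # 1500 bytes on 1G
    (9000, 1_000_000_000),   # jumbo on 1G
    (9000, 10_000_000_000),  # jumbo on 10G
]

for size, rate in cases:
    delay_ms = serialization_delay(size, rate) * 1000
    print(f"{size:>5} bytes @ {rate:>14,} bps: {delay_ms:.4f} ms")
```

On dialup a 1500-byte packet holds the line for over 200 ms, against 
roughly 80 ms for 576 bytes, so a small packet stuck behind it really 
notices. At 1G even a 9000-byte packet is well under a tenth of a 
millisecond.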

Overall I agree with the theory of larger MTUs... Improved efficiency, 
being able to do page-flipping with your payload, not having to worry 
about screwing things up if you DO need to use a tunnel or turn on 
IPsec, it's all well and good... But from a practical standpoint there 
are still a lot of very serious issues that have not been addressed, and 
anyone who actually tries to do this at scale is in for a world of hurt. 

I for one would love to see the situation improved, but trying to gloss 
over it and pretend the problems don't exist just delays the day when it 
actually CAN be supported.

> That is a list of 9000 byte clean gear.  The very bottom is the stuff 
> that doesn't support it.  Of the stuff that doesn't support it, how 
> much is connected directly to a peering point?  THAT is the bottleneck

This argument is completely destroyed at the line that says 7206VXR 
w/PA-GE; you don't need to read any further.

> I am talking about right now.  One step at a time.  Removing the 
> bottleneck at the peering points is all I am talking about.  That will 
> not change PMTU issues elsewhere and those will stand just exactly as 
> they are today without any change.  In fact it will ensure that there 
> are *fewer* PMTU discovery issues by being able to support a larger 
> range of packets without having to fragment them.

The issues I listed are precisely why it doesn't work at peering points. 
I know this because I do a lot of peering, and I spend a lot of time 
dealing with getting people to peer at larger MTU values (correctly). If 
it were easier to do without breaking stuff, I'd be a lot more successful 
at it. :)

> We *already* have SONET MTU of >4000 and this hasn't broken anything 
> since the invention of SONET.

SONET MTU works because it's on by default, it's the same size 
everywhere, and every piece of gear supports it. It also doesn't 
accomplish anything, as almost no packets flowing through your SONET 
links are > 1500 bytes, and if you actually tried to show up to the 
Internet with a PC and a 4474 byte MTU you'd have a bad time. 

At any rate, I'm going to stop arguing this one, as I think we've beaten 
this dead horse enough for one day. Please read what I said carefully; I 
promise you this isn't as easy as you think it is. :)

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)