Problems with AT&T

Eric Osborne eosborne at cisco.com
Thu Mar 20 21:29:29 UTC 2003


On Thu, Mar 20, 2003 at 03:26:35PM -0500, bdragon at gweep.net wrote:
> 
> > If someone can identify what you are actually seeing, I'll check into
> > it.
> > If you are experiencing drops or slow traces, only through the core,
> > there is an issue with excessive de-prioritization of ICMP control
> > messages with a particular router type (vendor) in the core. End to end
> > data flow has not seemed to be affected but trace and ping core
> > latencies are looking very weird. I've been asking customers to use
> > trace only for path detail and to use end to end ping for any
> > performance data.
> > 
> > Yes, the core is MPLS enabled. Diffserv acted on only at the edges
> > though.
> > 
> > Michelle
> 
> It could certainly be customers who have broken themselves. I've heard
> lots of stories about people who do PMTUD but simultaneously filter
> ICMP Can't Frag messages.
> 
> As soon as the Path MTU drops below whatever their local box is (usually
> 1500) they "break", although it is due to their own screwed up config.
> 
> Since MPLS adds additional overhead, dropping the MTU, I'd seriously
> consider this as a possible reason.

Speaking very generally and not about any one specific network, this
is unlikely to be the issue.  MPLS can lead to MTU problems on Ethernet,
but I've seen no problems on anything other than Eth/FE.  GigE and POS
haven't had the same issue; for one, default POS MTU is ~4k, which is
more than enough to hold packets from hosts that assume 576 or 1500,
and PMTUD over an MPLS network takes the MPLS label stack size into
account when doing discovery.

Also, some implementations have framers that can accept a packet
that's actually MTU+(N*4), where N is typically no more than 4, and
more likely 2.
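
To put rough numbers on the above, here is a minimal sketch of the
arithmetic.  Each MPLS label is 4 bytes, so a packet carrying N labels
needs 4*N bytes of headroom on the link.  The function names and the
4470 figure used for "~4k" POS are assumptions for illustration only,
not a description of any particular network or box.

```python
MPLS_LABEL_BYTES = 4  # one MPLS label stack entry is 4 bytes


def labeled_size(ip_packet_bytes, labels):
    """Size of an IP packet after `labels` MPLS labels are pushed on it."""
    return ip_packet_bytes + labels * MPLS_LABEL_BYTES


def fits(ip_packet_bytes, labels, link_mtu, framer_slack_labels=0):
    """True if the labeled packet fits on the link.

    framer_slack_labels is a hypothetical knob modelling a framer that
    accepts MTU + (N * 4) bytes, as described above.
    """
    limit = link_mtu + framer_slack_labels * MPLS_LABEL_BYTES
    return labeled_size(ip_packet_bytes, labels) <= limit


# A full-size 1500-byte packet with a 2-label stack:
print(labeled_size(1500, 2))                     # 1508 bytes on the wire
print(fits(1500, 2, link_mtu=1500))              # plain Eth/FE, no slack
print(fits(1500, 2, link_mtu=1500, framer_slack_labels=2))
print(fits(1500, 2, link_mtu=4470))              # assumed ~4k POS MTU
```

Only the plain 1500-byte link with no framer slack comes up short,
which is why Eth/FE is the case where this tends to bite.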

And I think I can say without breaking any confidentiality agreements
that AT&T's backbone Probably Isn't (nudge nudge wink wink) made up of
scads and scads of 10/100Mb links everywhere. :)

The biggest problem you can have with MPLS is if you have customers
who are connected at 4k or 9k or what have you, and who don't do
PMTUD; I've not seen this come up as a real operational issue.  


.02

eric

> 
> The major problems are:
> 1) identifying broken customers
> 2) convincing customers that they are broken when they "haven't changed
> anything"
> 3) getting them to actually change
> 
> Some folks just put off the problem until later by moving to MTUs > 1500.
> The only benefit to this is that hopefully when the customer next breaks
> it is as a direct result of them having "changed something" which gets
> you over the hurdle of convincing some person that their filtering of all
> ICMP isn't just stupid, but is also broken.


