Chronic Abnormal Traceroutes Traversing Level 3

Fri Apr 11 16:13:26 UTC 2014

If anybody from Level 3 Transport is available, I'd like to discuss some bizarre results I've been seeing over the past few months when traversing your network, namely in Dallas, TX.  I'm working this through your NOC as well, but figured I would cover all avenues, as this issue is pretty chronic.

Taken from your Looking Glass in Dallas to a destination of 195.110.36.136:

  1 vlan70.csw2.Dallas1.Level3.net (4.69.145.126) 120 msec
    vlan60.csw1.Dallas1.Level3.net (4.69.145.62) 120 msec
    vlan80.csw3.Dallas1.Level3.net (4.69.145.190) 120 msec
  2 ae-93-93.ebr3.Dallas1.Level3.net (4.69.151.170) 120 msec
    ae-73-73.ebr3.Dallas1.Level3.net (4.69.151.146) 120 msec
    ae-63-63.ebr3.Dallas1.Level3.net (4.69.151.134) 120 msec
  3 ae-7-7.ebr4.Atlanta2.Level3.net (4.69.134.22) 36 msec 20 msec 20 msec
  4 ae-2-2.ebr1.Washington1.Level3.net (4.69.132.86) 120 msec 116 msec 120 msec
  5 ae-71-71.csw2.Washington1.Level3.net (4.69.134.134) 120 msec
    ae-81-81.csw3.Washington1.Level3.net (4.69.134.138) 120 msec
    ae-71-71.csw2.Washington1.Level3.net (4.69.134.134) 120 msec
  6 ae-92-92.ebr2.Washington1.Level3.net (4.69.134.157) 120 msec 120 msec
    ae-72-72.ebr2.Washington1.Level3.net (4.69.134.149) 124 msec
  7 ae-44-44.ebr2.Paris1.Level3.net (4.69.137.61) 120 msec
    ae-42-42.ebr2.Paris1.Level3.net (4.69.137.53) 116 msec 116 msec
  8 ae-62-62.csw1.Paris1.Level3.net (4.69.161.94) 124 msec
    ae-72-72.csw2.Paris1.Level3.net (4.69.161.98) 120 msec 120 msec
  9 ae-81-81.ebr1.Paris1.Level3.net (4.69.161.85) 120 msec
    ae-91-91.ebr1.Paris1.Level3.net (4.69.161.89) 124 msec
    ae-71-71.ebr1.Paris1.Level3.net (4.69.161.81) 120 msec
 10 ae-41-41.ebr2.London2.Level3.net (4.69.159.81) 120 msec 120 msec
    ae-43-43.ebr2.London2.Level3.net (4.69.159.89) 120 msec
 11 ae-12-3202.edge4.London2.Level3.net (4.69.202.182) 116 msec 120 msec 120 msec
 12 ae-26-26.car2.London2.Level3.net (4.69.200.98) 120 msec 120 msec 120 msec
 13 BACKBONE-CO.car2.London2.Level3.net (195.50.121.50) 120 msec 120 msec 120 msec
 14  *  *  *
 15  *  *  *
 16  *  *  *

I'm confused as to how the first and last hops can show equal latency while being halfway around the world from each other.  I'm also confused as to why it's 120 ms to the first hop from a route server local to that market.  While I wouldn't rely on a traceroute as a true measurement of end-to-end latency, I'm having trouble explaining this to customers who are experiencing tangible issues when their traceroute looks like this:

traceroute source 24.155.144.226 195.110.36.136
traceroute to 195.110.36.136 (195.110.36.136) from 24.155.144.226, 30 hops max, 40 byte packets
 1  lag-8-868.ear1.Dallas1.Level3.net (4.30.74.53)  924.705 ms  545.117 ms  512.992 ms
 2  4.69.146.5 (4.69.146.5)  124.812 ms 4.69.146.21 (4.69.146.21)  125.686 ms 4.69.146.9 (4.69.146.9)  124.018 ms
     MPLS Label=1965 CoS=0 TTL=1 S=1
 3  ae-73-73.ebr3.Dallas1.Level3.net (4.69.151.146)  125.012 ms ae-83-83.ebr3.Dallas1.Level3.net (4.69.151.158)  141.585 ms ae-63-63.ebr3.Dallas1.Level3.net (4.69.151.134)  125.005 ms
     MPLS Label=1810 CoS=0 TTL=1 S=1
 4  * * *
 5  ae-2-2.ebr1.Washington1.Level3.net (4.69.132.86)  126.085 ms  125.994 ms  127.148 ms
     MPLS Label=1692 CoS=0 TTL=1 S=1
 6  ae-61-61.csw1.Washington1.Level3.net (4.69.134.130)  124.500 ms ae-81-81.csw3.Washington1.Level3.net (4.69.134.138)  125.369 ms ae-61-61.csw1.Washington1.Level3.net (4.69.134.130)  124.306 ms
     MPLS Label=1555 CoS=0 TTL=1 S=1
 7  ae-92-92.ebr2.Washington1.Level3.net (4.69.134.157)  131.126 ms  128.607 ms ae-72-72.ebr2.Washington1.Level3.net (4.69.134.149)  123.955 ms
     MPLS Label=1636 CoS=0 TTL=1 S=1
 8  ae-42-42.ebr2.Paris1.Level3.net (4.69.137.53)  127.156 ms ae-41-41.ebr2.Paris1.Level3.net (4.69.137.49)  125.736 ms ae-43-43.ebr2.Paris1.Level3.net (4.69.137.57)  143.070 ms
     MPLS Label=1801 CoS=0 TTL=1 S=1
 9  ae-82-82.csw3.Paris1.Level3.net (4.69.161.102)  122.998 ms  122.245 ms ae-72-72.csw2.Paris1.Level3.net (4.69.161.98)  124.499 ms
     MPLS Label=1564 CoS=0 TTL=1 S=1
10  ae-61-61.ebr1.Paris1.Level3.net (4.69.161.77)  129.705 ms ae-81-81.ebr1.Paris1.Level3.net (4.69.161.85)  122.705 ms ae-91-91.ebr1.Paris1.Level3.net (4.69.161.89)  126.454 ms
     MPLS Label=1322 CoS=0 TTL=1 S=1
11  ae-42-42.ebr2.London2.Level3.net (4.69.159.85)  126.019 ms  125.672 ms ae-44-44.ebr2.London2.Level3.net (4.69.159.93)  127.026 ms
     MPLS Label=1911 CoS=0 TTL=1 S=1
12  ae-12-3202.edge4.London2.Level3.net (4.69.202.182)  122.058 ms  124.301 ms  123.397 ms
     MPLS Label=300112 CoS=0 TTL=1 S=1
13  ae-26-26.car2.London2.Level3.net (4.69.200.98)  124.003 ms  124.567 ms  130.346 ms
14  BACKBONE-CO.car2.London2.Level3.net (195.50.121.50)  125.316 ms  125.991 ms  124.968 ms
15  * * *
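
For anyone who wants to check their own traces for the same pattern, here is a rough Python sketch that pulls the minimum RTT per hop out of traceroute text and lists the hops whose latency sits within a few milliseconds of the final responding hop. The sample lines are copied from the trace above (only a handful of hops, to keep it short); the function names and the 5 ms tolerance are my own arbitrary choices, not anything from a standard tool.

```python
import re

# A few hop lines copied from the customer-side traceroute above.
TRACE = """\
 1  lag-8-868.ear1.Dallas1.Level3.net (4.30.74.53)  924.705 ms  545.117 ms  512.992 ms
 2  4.69.146.5 (4.69.146.5)  124.812 ms 4.69.146.21 (4.69.146.21)  125.686 ms 4.69.146.9 (4.69.146.9)  124.018 ms
     MPLS Label=1965 CoS=0 TTL=1 S=1
 4  * * *
 5  ae-2-2.ebr1.Washington1.Level3.net (4.69.132.86)  126.085 ms  125.994 ms  127.148 ms
 8  ae-42-42.ebr2.Paris1.Level3.net (4.69.137.53)  127.156 ms ae-41-41.ebr2.Paris1.Level3.net (4.69.137.49)  125.736 ms ae-43-43.ebr2.Paris1.Level3.net (4.69.137.57)  143.070 ms
14  BACKBONE-CO.car2.London2.Level3.net (195.50.121.50)  125.316 ms  125.991 ms  124.968 ms
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")          # "<hop number> <rest of line>"
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s+ms\b")     # "<number> ms"


def min_rtt_per_hop(text):
    """Return {hop_number: minimum RTT in ms} for hops that answered."""
    hops = {}
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # e.g. the "MPLS Label=..." continuation lines
        rtts = [float(value) for value in RTT_RE.findall(match.group(2))]
        if rtts:  # skip hops that only returned "* * *"
            hops[int(match.group(1))] = min(rtts)
    return hops


def hops_near_last(hops, tolerance_ms=5.0):
    """List hops whose minimum RTT is within tolerance_ms of the last responding hop."""
    last_rtt = hops[max(hops)]
    return [n for n, rtt in sorted(hops.items()) if abs(rtt - last_rtt) <= tolerance_ms]


if __name__ == "__main__":
    per_hop = min_rtt_per_hop(TRACE)
    for hop, rtt in sorted(per_hop.items()):
        print(f"hop {hop:2d}: min {rtt:.3f} ms")
    print("hops within 5 ms of the last hop:", hops_near_last(per_hop))
```

On the sample lines, every responding hop from hop 2 in Dallas onward lands within 5 ms of the final London hop, which is exactly the flat profile I can't explain.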

This also doesn't seem to be circuit-specific (which rules out physical circuit errors and the like), as I have a secondary 10-Gig circuit sourcing out of San Antonio, TX (which ultimately terminates on your network in Dallas, same as the Austin circuit) that demonstrates similar results (120+ ms at the second hop in Dallas when entering your MPLS cloud).  Killing the BGP adjacency to both the Austin and San Antonio peers clears the issue up, but obviously that's not a viable long-term option.  Any help would be appreciated.

JJ Stonebraker
IP Network Engineering
Grande Communications
512.878.5627