Traceroute losses through NYC1.gblx.net?

Steve Bohrer skbohrer at simons-rock.edu
Fri Sep 16 13:42:52 CDT 2011


My general question is: what meaning should I give to lossy traceroutes
when pings show no problem?

Should backbone routers never time out on a traceroute through them,
so that lots of asterisks from these systems indicate a packet loss
problem that needs to be fixed?

Or, are these traceroute asterisks essentially meaningless, and should  
be expected on any busy link?

More specifically, is anyone else getting lots of *s for NYC1.gblx.net  
for traceroutes through them? If I do three traceroutes through there,  
at least one will show losses at or beyond the NYC1 hops (and, the *s  
beyond NYC1 might be getting lost in NYC1, rather than indicating a  
different error).  But, Global Crossing's on-line tools don't show any  
loss.

I am at simons-rock.edu, in Western Mass, and we connect via Boston. A  
few days ago, our users of a database that's hosted at our parent  
campus, bard.edu, started complaining of many frequent (but  
intermittent) delays. Bard is in the Hudson Valley, and connects via  
Poughkeepsie. Both of our local providers connect to Global Crossing.  
Once before, we saw similar database symptoms, and that time, Bard had  
a problem dropping packets at their gateway. So I think these symptoms  
mean packet loss is happening somewhere. However, this time, pings
from Simon's Rock to Bard, and vice versa, show essentially no errors:
typically, 100% of 1000 pings get through.

Still, despite the good pings, traceroutes from either end show lots  
of asterisks at or after Global Crossing's NYC1.gblx.net links. I have  
opened a ticket with our provider, who has opened one with Global  
Crossing; and Bard has done the same with their end, but no  
significant response so far. (Bard's Graduate campus, located in New  
York City, is having similar poor database performance, so I'm pretty  
sure it is not just my end. Staff at the main Bard campus have no  
troubles, so it seems a network problem, not a server problem.)

As I understand it, an asterisk in traceroute means that the sending
machine did not get any reply to a given probe packet. Since traceroute
sends its probes with small TTL values, it expects an ICMP "time
exceeded" reply from whichever router decrements the TTL to zero. But,
I don't know if big routers are just lazy
about sending such responses, or if these asterisks really indicate  
packets getting lost.  (As far as I remember in the past, when things  
work well, I never see *s at the central links, but, I have not really  
done any baseline testing of the link from here to Bard when the  
database was working.)
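
To make "one asterisk = one probe with no reply" concrete, here is a rough Python sketch that tallies lost and sent probes per hop from saved traceroute text. The function name `hop_loss` and the regular expression are hypothetical, my own illustration rather than part of any traceroute tool. It assumes each probe that got a reply prints a time followed by "ms" and each unanswered probe prints a bare "*", and it tolerates hop lines that were wrapped by a mail client.

```python
import re

# A hop line starts with the hop number; the lookahead keeps a wrapped
# fragment such as "1 ms" from being mistaken for the start of hop 1.
HOP_START = re.compile(r"^\s*(\d+)\s+(?!ms\b)(.*)$")


def hop_loss(trace_text):
    """Return {hop: (probes_lost, probes_sent)} for one traceroute run."""
    bodies = {}
    current = None
    for line in trace_text.splitlines():
        match = HOP_START.match(line)
        if match:
            current = int(match.group(1))
            bodies[current] = match.group(2)
        elif current is not None and line.strip():
            # Continuation of the current hop (wrapped line or second router).
            bodies[current] += " " + line.strip()

    result = {}
    for hop, body in bodies.items():
        tokens = body.split()
        lost = tokens.count("*")       # one "*" per probe with no reply
        answered = tokens.count("ms")  # one "ms" per probe that got a reply
        result[hop] = (lost, lost + answered)
    return result


if __name__ == "__main__":
    sample = """\
 7  e5-1-30G.ar9.NYC1.gblx.net (67.16.142.54)  75.184 ms
    lag1.ar9.NYC1.gblx.net (67.16.142.50)  15.766 ms  11.947 ms *
    e5-1-30G.ar9.NYC1.gblx.net (67.16.142.54)  25.916 ms (20% loss)
 8  * * wbs-connect.gigabitethernet1-0-2.asr1.jfk1.gblx.net (64.211.195.6)  55.909 ms  73.803 ms * (60% loss)
"""
    for hop, (lost, sent) in sorted(hop_loss(sample).items()):
        print(hop, lost, sent)
```

A tally like this only says that a reply did not come back in time. By itself it cannot tell a dropped probe from a router that chose not to answer.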

So, another question is why pings work so well when traceroutes work  
so poorly. (By experiment, I believe our database application performs  
more like traceroute than like ping.)  Is it packet size? Different  
handling for different sorts of traffic? Magic?

Here are some sample traceroutes each way:
Simon's Rock to Bard:

2h189:bin skbohrer$ traceroute -q5 -S bip.bard.edu
traceroute to bip.bard.edu (192.246.228.16), 64 hops max, 40 byte packets
 1  10.30.2.1 (10.30.2.1)  1.514 ms  1.791 ms  0.684 ms  0.761 ms  0.712 ms (0% loss)
 2  michael.simons-rock.edu (208.81.88.1)  2.509 ms  1.882 ms  0.899 ms  1.345 ms  2.057 ms (0% loss)
 3  64.213.79.249 (64.213.79.249)  104.294 ms  10.605 ms  17.106 ms  18.987 ms  38.740 ms (0% loss)
 4  pos2-0-155M.cr2.BOS1.gblx.net (67.17.70.166)  21.962 ms  20.411 ms  8.394 ms  23.308 ms  10.192 ms (0% loss)
 5  so1-2-0-2488M.scr2.NYC1.gblx.net (67.17.94.158)  15.738 ms  14.582 ms  17.306 ms  24.444 ms  15.466 ms (0% loss)
 6  ae3-30g.scr3.NYC1.gblx.net (67.17.104.189)  15.586 ms  13.358 ms
    ae0-30G.scr4.NYC1.gblx.net (67.16.139.2)  13.875 ms  13.495 ms  12.780 ms (0% loss)
 7  e5-1-30G.ar9.NYC1.gblx.net (67.16.142.54)  75.184 ms
    lag1.ar9.NYC1.gblx.net (67.16.142.50)  15.766 ms  11.947 ms *
    e5-1-30G.ar9.NYC1.gblx.net (67.16.142.54)  25.916 ms (20% loss)
 8  * * wbs-connect.gigabitethernet1-0-2.asr1.jfk1.gblx.net (64.211.195.6)  55.909 ms  73.803 ms * (60% loss)
 9  * pghknyshj42-xe-0-3-0.lightower.net (72.22.160.150)  16.521 ms  21.817 ms  23.715 ms  17.236 ms (20% loss)
10  pghknyshj91-ae0-66.lightower.net (72.22.160.165)  76.257 ms  27.712 ms  20.372 ms  18.923 ms  55.355 ms (0% loss)
11  kgtnnykgj91-ae3.66.lightower.net (72.22.160.107)  18.088 ms  51.631 ms  19.052 ms  20.876 ms  22.942 ms (0% loss)
12  BardCollege-cust.customer.hvdata.net (64.72.66.234)  51.243 ms  47.800 ms  32.835 ms  19.040 ms  55.661 ms (0% loss)
13  *^C


Bard to SR (their version of traceroute doesn't have the handy -S
option):

SRDB/users/usrsr/finrep: traceroute mail.simons-rock.edu
trying to get source for mail.simons-rock.edu
source should be 10.20.11.23
traceroute to hedwig.simons-rock.edu (208.81.88.14) from 10.20.11.23 (10.20.11.23), 30 hops max
outgoing MTU = 1500
 1  hcrcgw (10.20.11.1)  1 ms  0 ms  0 ms
 2  hyphen (192.246.235.1)  1 ms  1 ms  1 ms
 3  BardCollege-hvdn.customer.hvdata.net (64.72.66.233)  1 ms  1 ms  1 ms
 4  pghknyshj91-xe-5-2-0.lightower.net (72.22.160.106)  2 ms  2 ms  2 ms
 5  pghknyshj42-ae0-66.lightower.net (72.22.160.159)  27 ms  2 ms  2 ms
 6  nycmnyzrj42-xe-0-3-0.lightower.net (72.22.160.151)  4 ms  4 ms  4 ms
 7  ve463.ar9.NYC1.gblx.net (64.211.195.5)  4 ms  4 ms  4 ms
 8  * ae0-40G.scr1.NYC1.gblx.net (67.16.138.253)  4 ms  4 ms
 9  pos5-0-2488M.cr1.BOS1.gblx.net (67.17.94.57)  9 ms
    pos9-0-2488M.cr2.BOS1.gblx.net (67.17.94.157)  9 ms  11 ms
10  pos1-0-0-155M.ar1.BOS1.gblx.net (67.17.70.165)  14 ms  10 ms  9 ms
11  64.213.79.250 (64.213.79.250)  15 ms  15 ms  18 ms
^C


For more automated testing, I used -m10 to set the max hops so that  
the traces stop within the backbone network, as this avoids any issue  
of the boxes at the ends not really responding to traceroutes. That  
way, I could assume any * was a real timeout. I also used -q4 for 4
queries to each hop. With a few hundred traceroutes in each direction,
more than 75% from SR to Bard, and more than 94% from Bard to SR,  
showed an asterisk at or past the NYC1 hops. There were zero asterisks  
on the links before NYC1 from either side.
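
For scale, the per-trace percentages above can be turned into a rough per-probe rate. This is a back-of-the-envelope Python sketch, not a measurement: it assumes every probe is lost independently with the same probability, and it assumes the number of probes per trace that reach NYC1 or beyond is 24 from SR (hops 5-10 with -q4) and 16 from Bard (hops 7-10 with -q4), read off the sample traces. The function name `per_probe_loss` is hypothetical.

```python
def per_probe_loss(trace_fraction, probes_per_trace):
    """Per-probe loss rate implied by the share of traces showing a '*'.

    Assumes every probe is lost independently with the same probability p,
    so P(trace shows at least one '*') = 1 - (1 - p) ** probes_per_trace.
    """
    if not 0.0 <= trace_fraction < 1.0:
        raise ValueError("trace_fraction must be in [0, 1)")
    if probes_per_trace < 1:
        raise ValueError("probes_per_trace must be at least 1")
    return 1.0 - (1.0 - trace_fraction) ** (1.0 / probes_per_trace)


if __name__ == "__main__":
    # Assumed probe counts: -q4 times the hops at or past NYC1 with -m10,
    # read off the sample traces (hops 5-10 one way, hops 7-10 the other).
    for label, fraction, probes in (("SR to Bard", 0.75, 24),
                                    ("Bard to SR", 0.94, 16)):
        print(f"{label}: {per_probe_loss(fraction, probes):.1%} per probe")
```

Under those assumptions the implied rates come out at roughly 6% and 16% of probes. Since the observed fractions are "more than" 75% and 94%, these would be lower bounds, and real losses may well be bursty rather than independent.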

Thanks for any insights.

Steve Bohrer
Network Administrator
ITS, Bard College at Simon's Rock
413-528-7645