High latency DS3 issue on unloaded line

mike mike-nanog at tiedyenetworks.com
Fri Sep 26 11:04:27 CDT 2008


Hello,

    I have a DS3 from Qwest that has daily problems with insane
point-to-point latencies, sometimes exceeding 1000 ms for hours on end.
The latency then suddenly disappears, and it does not appear to
correlate with actual measured link utilization (less than 20 Mbps most
days).

    To make a long investigation short, the problem comes on during the
day and then lets up late in the evening. I have tested and examined
everything at the IP layer, and no, it's not high utilization, an ACL,
router CPU or bad hardware, and there are no line errors or other issues
visible in the interface or controller stats. Yes, I have flushed all
hardware, and I have a 7204VXR/NPE-400 with this single DS3. The only
clue seems to be millions of 'output drops' on Qwest's side. At night I
can hit popular FTP mirrors from a directly attached server and watch my
interface report about 100% utilization combined with my users and
customers, so yes, it really is a full line-rate DS3.

    Historically MRTG always shows around 20 Mbps or less utilization,
and it's only SmokePing that goes off, usually in the afternoon when the
point-to-point latencies between my router and Qwest start heading north,
and consistently at that. I also have another in-house tool that takes
30-second snapshots of my DS3 interface in order to catch short bursts
that would be smoothed out by MRTG's 5-minute average, but during these
high-latency periods it doesn't show any spikes. And for added confusion
(or fun!), the latency can start at any utilization level: I've observed
it while we were pulling just 12 Mbps, and I have not had it while we
were doing 34 Mbps. Only the time of day seems to be the common factor.
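The difference between 30-second snapshots and MRTG's 5-minute average comes down to simple counter arithmetic. Below is a minimal sketch, assuming the tool reads a 64-bit interface output-octet counter every 30 seconds (so counter wraps are ignored); the function names and the traffic pattern are hypothetical, chosen only to show how a single 30-second burst near line rate shrinks to a modest figure once it is averaged over 5 minutes.

```python
from itertools import accumulate

INTERVAL_S = 30           # assumed polling interval of the in-house tool
SAMPLES_PER_WINDOW = 10   # 10 x 30 s = one 5-minute MRTG-style window


def counters_from_rates(rates_mbps, interval_s=INTERVAL_S):
    """Build cumulative octet-counter readings from per-interval Mbit/s (test data only)."""
    octets = [int(rate * 1e6 * interval_s / 8) for rate in rates_mbps]
    return [0] + list(accumulate(octets))


def rates_from_counters(counters, interval_s=INTERVAL_S):
    """Convert successive octet-counter readings to Mbit/s per interval.

    Assumes a 64-bit counter that never wraps between polls.
    """
    return [
        (later - earlier) * 8 / interval_s / 1e6
        for earlier, later in zip(counters, counters[1:])
    ]


def window_averages(rates, samples_per_window=SAMPLES_PER_WINDOW):
    """Average consecutive non-overlapping groups of samples, like a 5-minute poll."""
    return [
        sum(rates[i:i + samples_per_window]) / samples_per_window
        for i in range(0, len(rates) - samples_per_window + 1, samples_per_window)
    ]


# Hypothetical traffic: 12 Mbit/s steady with one 30-second burst at 44 Mbit/s.
counters = counters_from_rates([12, 12, 12, 12, 44, 12, 12, 12, 12, 12])
rates = rates_from_counters(counters)
five_min = window_averages(rates)

print(f"peak 30 s rate: {max(rates):.1f} Mbit/s")    # peak 30 s rate: 44.0 Mbit/s
print(f"5-minute average: {five_min[0]:.1f} Mbit/s")  # 5-minute average: 15.2 Mbit/s
```

Read this way, a 30-second view that shows no such spikes during the problem window makes short traffic bursts an unlikely explanation for the latency.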

    Qwest has not been able to identify the issue, only to confirm that,
yes, this really is happening when there is otherwise no real load on
the line, and I am certain we have done everything to rule out the IP
layer. They have put in a 'request' to move me to another router, but I
am not hopeful of a resolution that way, as the router we're currently
on doesn't otherwise appear to have the problem with any other
subscriber.

    What I want to know is: is it possible that the underlying ATM/SONET
that carries my DS3 from my facility is somehow oversubscribed or
misconfigured? We have an OC-12 fiber entrance and this is the only
circuit provisioned on it, and in our tiny town the only other user on
the ring with us is Comcast (according to the AT&T network engineer who
installed this). I don't know enough about ATM/SONET to imagine
conditions that would cause the issues I am seeing here, but every
IP-layer tool I have only ever tells me there isn't an IP issue here. I
can ping from my router directly to the attached Qwest router and get
more than 1000 ms, and at other times (outside the problem window) I get
4 ms.
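One way to pin the problem window down from raw router-to-router pings is to bucket RTT samples by hour of day and flag the hours whose median is far above the normal 4 ms. A rough sketch follows; the sample data are hypothetical, and the 100 ms threshold is an arbitrary assumption, not a value taken from Qwest or from any of the tools mentioned above.

```python
from collections import defaultdict
from statistics import median


def median_rtt_by_hour(samples):
    """Group (hour_of_day, rtt_ms) pairs and return {hour: median RTT}, ordered by hour."""
    by_hour = defaultdict(list)
    for hour, rtt_ms in samples:
        by_hour[hour].append(rtt_ms)
    return {hour: median(rtts) for hour, rtts in sorted(by_hour.items())}


def problem_hours(medians, threshold_ms=100.0):
    """Return the hours whose median RTT exceeds the (assumed) threshold."""
    return [hour for hour, rtt in medians.items() if rtt > threshold_ms]


# Hypothetical pings from the edge router to the adjacent carrier router.
samples = [
    (3, 4), (3, 5), (3, 4),
    (14, 1020), (14, 1150), (14, 980),
    (16, 1100), (16, 1050), (16, 1200),
    (23, 4), (23, 4), (23, 6),
]

medians = median_rtt_by_hour(samples)
print(medians)                  # {3: 4, 14: 1020, 16: 1100, 23: 4}
print(problem_hours(medians))   # [14, 16]
```

Using the median rather than the mean keeps a single stray ping from marking an otherwise normal hour as part of the problem window.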

    If anyone has laughs or beers to offer me, send 'em on cuz I could 
use both right about now....

Mike-





More information about the NANOG mailing list