impossible circuit

George Carey george at montco.net
Mon Aug 11 05:24:39 UTC 2008


A strange one indeed, especially if you have no connectivity to Sprint 
there.

Since your fix was at layer 2, you might be onto something. And you know 
when it happened, and as we all know, somebody changed somethin' even if 
they won't fess up.

I'm trying to think how you could cause something like that with a 
conventional DACS or one of the newer packet-friendly types, which might 
be more prone to a layer 2 bug since their software is fairly new. Of 
course, it would make more sense if it were just crossed Ethernet rather 
than DS3 frames, but who knows. There are plenty of carriers putting them 
in (including us).

Shame it's not the kind of thing you can duplicate without affecting 
service.

George


Jon Lewis wrote:
> After all the messages recently about how to fix DNS, I was seriously 
> tempted to title this message "And now, for something completely 
> different", but impossible circuit is more descriptive.
> 
> Before you read further, I need everyone to put on their thinking WAY 
> outside the box hats.  I've heard from enough people already that I'm 
> nuts and what I'm seeing can't happen, so it must not be 
> happening...even though we see the results of it happening.
> 
> I've got this private line DS3.  It connects cisco 7206 routers in 
> Orlando (at our data center) and in Ocala (a colo rack in the Embarq CO).
> 
> According to the DLR, it's a real circuit; various portions of it ride 
> varying-sized OC circuits, and then it's handed off to us at each end 
> the usual way (copper/coax) and plugged into PA-2T3 cards.
> 
> Last Tuesday, at about 2:30PM, "something bad happened."  We saw a 
> serious jump in traffic to Ocala, and in particular we noticed one 
> customer's connection (a group of load sharing T1s) was just totally 
> full.  We quickly assumed it was a DDoS aimed at that customer, but 
> looking at the traffic, we couldn't pinpoint anything other than 
> expected flows.
> 
> Then we noticed the really weird stuff.  Pings to anything in Ocala 
> responded with multiple dupes and ttl exceeded messages from a Level3 
> IP. Traceroutes to certain IPs in Ocala would get as far as our Ocala 
> router, then inexplicably hop onto Sprintlink's network, come back to us 
> over our Level3 transit connection, get to Ocala, then hop over to 
> Sprintlink again, repeating that loop as many times as max TTL would 
> permit.  Pings from router to router crossing just the DS3 would work, 
> but we'd see 10 duplicate packets for every 1 expected packet.  BTW, the 
> cisco CLI hides dupes unless you turn on ip icmp debugging.
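
Since IOS suppresses duplicate replies in normal ping output, one way to put a number on the duplication described above is to count echo replies per ICMP sequence number from a packet capture. This is a minimal sketch that assumes you already have the sequence numbers of the received replies; the function name `duplication_report` and the sample data are hypothetical, chosen to mirror the 10-duplicates-per-expected-reply ratio in the post.

```python
from collections import Counter


def duplication_report(reply_seqs, sent_seqs):
    """Count echo replies per ICMP sequence number.

    reply_seqs: sequence number of every echo reply seen, in arrival order
    sent_seqs:  sequence numbers of the echo requests actually sent
    Returns (replies per sent sequence number, total number of duplicates).
    Replies whose sequence number was never sent are ignored.
    """
    counts = Counter(reply_seqs)
    per_seq = {seq: counts[seq] for seq in sent_seqs}
    # One reply per request is expected; every extra copy is a duplicate.
    dupes = sum(max(n - 1, 0) for n in per_seq.values())
    return per_seq, dupes


# Hypothetical capture: two pings, each answered 11 times (1 expected + 10 dupes).
replies = [0] * 11 + [1] * 11
per_seq, dupes = duplication_report(replies, sent_seqs=[0, 1])
print(per_seq, dupes)  # {0: 11, 1: 11} 20
```

A request with no reply at all shows up as a count of 0, so the same report also distinguishes loss from duplication.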
> 
> I've seen somewhat similar things (though contained within an AS) 
> with MPLS and routing misconfigurations, but traffic jumping off our 
> network (to a network to which we're not directly connected) was 
> seemingly impossible.  We did all sorts of things to troubleshoot it 
> (studied our router configs in rancid, temporarily shut every interface 
> on the Ocala side other than the DS3, changed IOS versions, changed out 
> the hardware, opened a ticket with cisco TAC), but then it occurred to 
> me that if traffic was actually jumping off our network and coming back 
> in via Level3, I could see/block at least some of that using an ACL on 
> our interface to Level3.  How do you explain it when you ping the 
> remote end of a DS3 interface with a single echo request packet and see 
> 5 copies of that echo request arrive at one of your transit provider 
> interfaces?
> 
> Here's a typical traceroute with the first few hops (from my home 
> internet connection) removed.  BTW, hop 9 is a customer router 
> conveniently configured with no ip unreachables.
> 
>  7  andc-br-3-f2-0.atlantic.net (209.208.9.138)  47.951 ms  56.096 ms  56.154 ms
>  8  ocalflxa-br-1-s1-0.atlantic.net (209.208.112.98)  56.199 ms  56.320 ms  56.196 ms
>  9  * * *
> 10  sl-bb20-dc-6-0-0.sprintlink.net (144.232.8.174)  80.774 ms  81.030 ms  81.821 ms
> 11  sl-st20-ash-10-0.sprintlink.net (144.232.20.152)  75.731 ms  75.902 ms  77.128 ms
> 12  te-10-1-0.edge2.Washington4.level3.net (4.68.63.209)  46.548 ms  53.200 ms  45.736 ms
> 13  vlan69.csw1.Washington1.Level3.net (4.68.17.62)  42.918 ms vlan79.csw2.Washington1.Level3.net (4.68.17.126)  55.438 ms vlan69.csw1.Washington1.Level3.net (4.68.17.62)  42.693 ms
> 14  ae-81-81.ebr1.Washington1.Level3.net (4.69.134.137)  48.935 ms ae-61-61.ebr1.Washington1.Level3.net (4.69.134.129)  49.317 ms ae-91-91.ebr1.Washington1.Level3.net (4.69.134.141)  48.865 ms
> 15  ae-2.ebr3.Atlanta2.Level3.net (4.69.132.85)  59.642 ms  56.278 ms  56.671 ms
> 16  ae-61-60.ebr1.Atlanta2.Level3.net (4.69.138.2)  47.401 ms  62.980 ms  62.640 ms
> 17  ae-1-8.bar1.Orlando1.Level3.net (4.69.137.149)  40.300 ms  40.101 ms  42.690 ms
> 18  ae-6-6.car1.Orlando1.Level3.net (4.69.133.77)  40.959 ms  40.963 ms  41.016 ms
> 19  unknown.Level3.net (63.209.98.66)  246.744 ms  240.826 ms  239.758 ms
> 20  andc-br-3-f2-0.atlantic.net (209.208.9.138)  39.725 ms  37.751 ms  42.262 ms
> 21  ocalflxa-br-1-s1-0.atlantic.net (209.208.112.98)  43.524 ms  45.844 ms  43.392 ms
> 22  * * *
> 23  sl-bb20-dc-6-0-0.sprintlink.net (144.232.8.174)  63.752 ms  61.648 ms  60.839 ms
> 24  sl-st20-ash-10-0.sprintlink.net (144.232.20.152)  66.923 ms  65.258 ms  70.609 ms
> 25  te-10-1-0.edge2.Washington4.level3.net (4.68.63.209)  67.106 ms  93.415 ms  73.932 ms
> 26  vlan99.csw4.Washington1.Level3.net (4.68.17.254)  88.919 ms  75.306 ms vlan79.csw2.Washington1.Level3.net (4.68.17.126)  75.048 ms
> 27  ae-61-61.ebr1.Washington1.Level3.net (4.69.134.129)  69.508 ms  68.401 ms ae-71-71.ebr1.Washington1.Level3.net (4.69.134.133)  79.128 ms
> 28  ae-2.ebr3.Atlanta2.Level3.net (4.69.132.85)  64.048 ms  67.764 ms  67.704 ms
> 29  ae-71-70.ebr1.Atlanta2.Level3.net (4.69.138.18)  68.372 ms  67.025 ms  68.162 ms
> 30  ae-1-8.bar1.Orlando1.Level3.net (4.69.137.149)  65.112 ms  65.584 ms  65.525 ms
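
The loop in this trace repeats every 13 hops: hop 20 is the same address as hop 7. A minimal sketch that finds the first repeated address in a traceroute and the loop length, assuming the hops have been transcribed as (hop number, first responding address) pairs with `None` standing in for `* * *`. The function name `find_loop` is hypothetical; the addresses are copied from hops 7 through 21 of the trace.

```python
def find_loop(hops):
    """Return (first_hop, period) for the first address that repeats, else None.

    hops: list of (hop_number, address) pairs; unresponsive hops use None.
    """
    first_seen = {}
    for ttl, addr in hops:
        if addr is None:
            continue  # '* * *' hops carry no address to compare
        if addr in first_seen:
            return first_seen[addr], ttl - first_seen[addr]
        first_seen[addr] = ttl
    return None


# First responding address per hop, transcribed from hops 7-21 of the trace.
trace = [
    (7, "209.208.9.138"), (8, "209.208.112.98"), (9, None),
    (10, "144.232.8.174"), (11, "144.232.20.152"), (12, "4.68.63.209"),
    (13, "4.68.17.62"), (14, "4.69.134.137"), (15, "4.69.132.85"),
    (16, "4.69.138.2"), (17, "4.69.137.149"), (18, "4.69.133.77"),
    (19, "63.209.98.66"), (20, "209.208.9.138"), (21, "209.208.112.98"),
]
print(find_loop(trace))  # (7, 13)
```

With a max TTL of 30 (the usual traceroute default), the second pass around a 13-hop loop starting at hop 7 would span hops 20 through 32, so it is cut off at hop 30, which matches where the trace ends. Load-balanced hops such as 13 and 14 can answer from different addresses on each pass, so comparing only the first responder per hop is a simplification.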
> 
> Our circuit provider's support people have basically just maintained 
> that this behavior isn't possible and so there's nothing they can do 
> about it, i.e., that the problem has to be something other than the circuit.
> 
> I got tired of talking to their brick wall, so I contacted Sprint and 
> was able to confirm with them that the traffic in question really was 
> inexplicably appearing on their network...and not terribly close 
> geographically to the Orlando/Ocala areas.
> 
> So, I have a circuit that's bleeding duplicate packets onto an unrelated 
> IP network, a circuit provider who's got their head in the sand and 
> keeps telling me "this can't happen, we can't help you", and customers 
> who were getting tired of receiving all their packets in triplicate (or 
> more) saturating their connections and confusing their applications.  
> After a while, I had to give up on finding the problem and focus on just 
> making it stop.  After trying a couple of things, the solution I found 
> was to change the encapsulation we use at each end of the DS3.  I 
> haven't gotten confirmation of this from Sprint, but I assume they're 
> now seeing massive input errors on the one or more circuits where our 
> packets were/are appearing.  The important thing (for me) is that this 
> makes the packets invalid to Sprint's routers and so it keeps them from 
> forwarding the packets to us.  Cisco TAC finally got back to us the day 
> after I "fixed" the circuit...but since it was obviously not a problem 
> with our cisco gear, I haven't pursued it with them.
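
The post doesn't say which encapsulations were involved, so what follows is a sketch under an assumption: that the DS3 moved from Cisco HDLC (the IOS default on serial interfaces) to PPP. It illustrates why such a change would keep a misdelivered copy of the traffic from being forwarded: the first four bytes of every frame differ, so a receiver still configured for the old framing would not recognize the new frames as IPv4. The names `classify` and `accepted_by` are hypothetical, and real interfaces also check the FCS and may negotiate PPP header compression, both of which this ignores.

```python
import struct

# Header layouts assumed here (IPv4 payload, no header compression):
#   Cisco HDLC: address 0x0F (unicast) or 0x8F (broadcast), control 0x00,
#               then a 2-byte EtherType (0x0800 = IPv4).
#   PPP (RFC 1662 HDLC-like framing): address 0xFF, control 0x03,
#               then a 2-byte PPP protocol number (0x0021 = IPv4).
CISCO_HDLC_ADDRS = {0x0F, 0x8F}


def classify(frame: bytes) -> str:
    """Label a frame by its first four bytes (flags and FCS already stripped)."""
    if len(frame) < 4:
        return "runt"
    addr, ctrl, proto = struct.unpack("!BBH", frame[:4])
    if addr in CISCO_HDLC_ADDRS and ctrl == 0x00:
        return "cisco-hdlc ipv4" if proto == 0x0800 else "cisco-hdlc other"
    if addr == 0xFF and ctrl == 0x03:
        return "ppp ipv4" if proto == 0x0021 else "ppp other"
    return "unknown"


def accepted_by(expected: str, frame: bytes) -> bool:
    """True if a receiver configured for `expected` framing would hand this frame to IPv4."""
    return classify(frame) == f"{expected} ipv4"


# Hypothetical frames carrying the same dummy IPv4 payload.
payload = b"\x45" + bytes(19)
hdlc_frame = bytes([0x0F, 0x00, 0x08, 0x00]) + payload
ppp_frame = bytes([0xFF, 0x03, 0x00, 0x21]) + payload

print(accepted_by("cisco-hdlc", hdlc_frame))  # True
print(accepted_by("cisco-hdlc", ppp_frame))   # False: old-framing receiver rejects it
```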
> 
> The only things I can think of that might be the cause are 
> misconfiguration in a DACS/mux somewhere along the circuit path or 
> perhaps a mishandled lawful intercept.  I don't have enough experience 
> with either or enough access to the systems that provide the circuit to 
> do any more than speculate.  Has anyone else ever seen anything like this?
> 
> If someone from Level3 transport can wrap their head around this, I'd 
> love to know what's really going on...but at least it's no longer an 
> urgent problem for me.
> 
> ----------------------------------------------------------------------
>  Jon Lewis                   |  I route
>  Senior Network Engineer     |  therefore you are
>  Atlantic Net                |
> _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
> 
