cross connect reliability

Thu Sep 17 23:56:24 UTC 2009

In message <20090917234547.GT51443 at gerbil.cluepon.net>, Richard A Steenbergen writes:
> On Thu, Sep 17, 2009 at 03:35:37PM -0700, Charles Wyble wrote:
> > 
> > Random failures of a single port's connectivity... bizarre and annoying.
> > Whole switches? Seen it.
> > Whole panels? Seen it.
> > Whole blades? Seen it.
> > 
> > Single port on a switch or patch panel? Never.
> 
> You've never seen a single port go bad on a switch? I can't even count
> the number of times I've seen that happen. Not that I'm suggesting
> the OP wasn't the victim of a human error like unplugging the wrong port
> and they just lied to him; that happens even more.
> 
> My favorite bizarre random failure story is a toss-up between one of 
> these two:
> 
> Story 1. Had a customer report that they weren't able to transfer this
> one particular file over their connection. The transfer would start and
> then at a certain point the tcp session would just lock up. After a lot
> of head scratching, it turned out that for 8 ports on a 24 port FastE
> switch blade, this certain combination of bytes caused the packet to be
> dropped on this otherwise perfectly normal and functioning card, thus
> stalling the tcp session while leaving everything around it unaffected.
> If you moved them to a different port outside this group of 8, or used
> https, or uuencoded it, it would go through fine.

Seen that more than once.  It's worse when it's in some router on the
other side of the planet and you're just a lowly customer.

> Story 2. Had a customer report that they were getting extremely slow 
> transfers to another network, despite not being able to find any packet 
> loss. Shifting the traffic to a different port to reach the same network 
> resolved the problem. After removing the traffic and attempting to ping 
> the far side, I got the following:
> 
> <drop>
> 64 bytes from x.x.x.x: icmp_seq=1 ttl=61 time=0.194 ms
> 64 bytes from x.x.x.x: icmp_seq=2 ttl=61 time=0.196 ms
> 64 bytes from x.x.x.x: icmp_seq=3 ttl=61 time=0.183 ms
> 64 bytes from x.x.x.x: icmp_seq=0 ttl=61 time=4.159 ms
> <drop>
> 64 bytes from x.x.x.x: icmp_seq=5 ttl=61 time=0.194 ms
> 64 bytes from x.x.x.x: icmp_seq=6 ttl=61 time=0.196 ms
> 64 bytes from x.x.x.x: icmp_seq=7 ttl=61 time=0.183 ms
> 64 bytes from x.x.x.x: icmp_seq=4 ttl=61 time=4.159 ms
> 
> After a little bit more testing, it turned out that every 4th packet
> that was being sent to the peer's router was being queued until another
> "4th packet" would come along and knock it out. If you increased the
> interval time of the ping, you would see the amount of time the packet
> spent in the queue increase. At one point I had it up to over 350
> seconds (not milliseconds) that the packet stayed in the other router's
> queue before that 4th packet came along and knocked it free. I suspect
> it could have gone higher, but random scanning traffic on the internet
> was coming in. When there was a lot of traffic on the interface you
> would never see the packet loss, just reordering of every 4th packet and 
> thus slow tcp transfers. :)
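
For anyone trying to picture that, here is a toy model of the behaviour as described. It is only a sketch under assumptions: it supposes the far router parks every 4th packet in a single slot until the next "4th packet" displaces it, and the names simulate, interval and period are made up for illustration, not taken from any real device or tool.

```python
def simulate(num_packets, interval, period=4):
    """Toy model: every `period`-th packet is held until the next one arrives.

    Returns a list of (seq, send_time, release_time) in delivery order.
    Purely illustrative; the real router's internals are unknown.
    """
    held = None          # the single packet currently stuck in the queue
    delivered = []
    for seq in range(num_packets):
        now = seq * interval
        if seq % period == 0:
            # A new "4th packet" knocks the stuck one free, then gets stuck itself.
            if held is not None:
                delivered.append((held[0], held[1], now))
            held = (seq, now)
        else:
            # Everything else passes straight through.
            delivered.append((seq, now, now))
    return delivered


if __name__ == "__main__":
    for seq, sent, released in simulate(9, interval=1.0):
        print(f"icmp_seq={seq} queued for {released - sent:.1f}s")
```

With nine probes the delivery order in this model is 1, 2, 3, 0, 5, 6, 7, 4, the same pattern as the ping output above, and the stuck packet waits period * interval. That is why stretching the ping interval stretches the time in the queue, and why a busy interface shows only reordering.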
> 
> -- 
> Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
> 
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org