Single IP routing problems through Level3

Sun Jun 15 15:56:24 UTC 2008

I've seen exactly the same symptoms before with another provider. The cause was an L3 (layer 3) port-channel that was not balancing properly because one member link was down but had not been detected as such.  It was also causing very sporadic latency spikes and dropped packets.

Jon
----- Original Message -----
From: "Matt Palmer" <mpalmer at hezmatt.org>
To: nanog at nanog.org
Sent: Sunday, June 15, 2008 10:39:56 AM GMT -05:00 US/Canada Eastern
Subject: Re: Single IP routing problems through Level3

On Sun, Jun 15, 2008 at 11:12:25AM -0300, Rubens Kuhl Jr. wrote:
> 1) I've seen this behavior before; you are not alone in the universe.

Thank $DEITY for that.  <grin>

> 2) Most likely there is a balanced channel on the path, either L3 or
> L2, and one of the links in the bundle is dead but has not been
> detected as such.

A multiple-link bundle which is load balanced by source/destination pair
with an undetected dud link?  I hadn't thought of that, but it does make an
*awful* lot of sense.  (Although, not being a big-network transit kinda
person, I don't know if such a thing actually exists <grin>) I'll mention it
(or ask about it) as a possibility next time I talk to the relevant people,
though.
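
To make the dud-link idea concrete, here is a minimal sketch of how per-pair load balancing could produce exactly these symptoms. Everything in it is an assumption for illustration: the four-link bundle, the dead link's index, the XOR-and-modulo hash (real equipment uses vendor-specific hash functions), and the addresses, which come from the documentation ranges rather than the networks discussed in this thread.

```python
import ipaddress

NUM_LINKS = 4   # assumed number of member links in the bundle
DEAD_LINK = 2   # assumed index of the link that is down but undetected


def pick_link(src: str, dst: str, num_links: int = NUM_LINKS) -> int:
    """Map a source/destination pair onto one member link.

    XOR of the two addresses modulo the link count is an illustrative
    hash only; it is not taken from any particular router.
    """
    s = int(ipaddress.ip_address(src))
    d = int(ipaddress.ip_address(dst))
    return (s ^ d) % num_links


def is_blackholed(src: str, dst: str, dead_link: int = DEAD_LINK) -> bool:
    """True if the pair hashes onto the dead link, so its packets vanish."""
    return pick_link(src, dst) == dead_link


def failing_sources(dst: str, prefix: str = "192.0.2.0/24") -> list:
    """List the host addresses in `prefix` that cannot reach `dst`."""
    return [str(h) for h in ipaddress.ip_network(prefix).hosts()
            if is_blackholed(str(h), dst)]


if __name__ == "__main__":
    for src, dst in [("192.0.2.4", "198.51.100.10"),
                     ("192.0.2.5", "198.51.100.10"),
                     ("192.0.2.4", "198.51.100.11")]:
        state = "blackholed" if is_blackholed(src, dst) else "ok"
        print(f"{src} -> {dst}: link {pick_link(src, dst)} ({state})")
```

Under these assumed values, 192.0.2.4 cannot reach 198.51.100.10, yet 192.0.2.5 can, and 192.0.2.4 reaches 198.51.100.11 without trouble. Changing the address at either end moves the flow onto a healthy link, which matches the pair-specific behaviour described in the original report.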

Thanks,
- Matt

> On Sun, Jun 15, 2008 at 11:01 AM, Matt Palmer <mpalmer at hezmatt.org> wrote:
> > We're seeing some really weird issues with connections that go through / to
> > Level3 IP space.  Basically, certain "pairs" of IPs (particular L3 IPs
> > coupled with particular IPs of ours) have dodgy/nonexistent connectivity,
> > but if you change the IP at either end everything's hunky dory.
> >
> > I've sniffed (from both ends) pings going from a host in L3 space to our end
> > and seen the pings arrive at our end and head back in the direction of L3,
> > but they never get to their destination.  Traceroutes from L3 stop at the
> > next-to-last hop, while traceroutes back get to the hop before L3 space and
> > stop.
> >
> > All of this behaviour is source/dest *pair* specific -- if I ping/traceroute
> > from another address (in the same netblock as the problematic IP, so all the
> > same equipment is involved) at either end, or to another address (again,
> > same netblock) at either end, it all works again.
> >
> > I've got two questions:
> >
> > 1) Has anyone else seen similar behaviour from L3 (or other providers,
> >   even), so I know I'm not going mad?
> >
> > 2) What sort of configuration problem or software bug would cause this sort
> >   of problem to occur?  If it was an IP blacklist (or even a block routing
> >   issue) anywhere along the line, surely it wouldn't be sensitive to
> >   changing the other end's address to another one in the same /24?
> >
> > Any insight/anecdotes/etc would be greatly appreciated, as it's starting to
> > do my head in.  Just knowing I'm not alone with this insanity would be nice
> > at this point.  <grin>
> >
> > If it makes any difference, the blocks I'm working from at my end are
> > Internap, in 74.201.254.0/23 (we don't have all of it, just most of it),
> > while the far end is 8.12.35.0/24.
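
The test described in the original report (probing from several addresses at each end and comparing the outcomes) can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name, the result labels and the input format (a dict of probe outcomes collected by some other means, such as ping) are hypothetical, and the addresses are from the documentation ranges.

```python
def classify_failures(results):
    """Classify probe outcomes.

    `results` maps (src, dst) tuples to True when the probe succeeded.
    A source or destination counts as "bad" only if every probe that
    involves it failed; note that an address probed just once and
    failing is therefore treated as bad.
    """
    failed = {pair for pair, ok in results.items() if not ok}
    if not failed:
        return "no failures"

    srcs = {s for s, _ in results}
    dsts = {d for _, d in results}
    bad_srcs = {s for s in srcs
                if all(not ok for (s2, _), ok in results.items() if s2 == s)}
    bad_dsts = {d for d in dsts
                if all(not ok for (_, d2), ok in results.items() if d2 == d)}

    # Failures that a per-address filter (e.g. a blacklist) would explain.
    explained = {(s, d) for (s, d) in failed
                 if s in bad_srcs or d in bad_dsts}
    return "address-specific" if explained == failed else "pair-specific"


if __name__ == "__main__":
    probes = {
        ("192.0.2.4", "198.51.100.10"): False,
        ("192.0.2.4", "198.51.100.11"): True,
        ("192.0.2.5", "198.51.100.10"): True,
        ("192.0.2.5", "198.51.100.11"): True,
    }
    print(classify_failures(probes))
```

A "pair-specific" result means no single address fails everywhere, which argues against a blacklist or a block routing problem and toward something that keys on the source/destination combination.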