Global BGP - 2001-06-23

lucifer at lightbearer.com
Mon Jun 25 16:09:14 UTC 2001


Brett Frankenberger wrote:
> 
> > 
> > A) Ciscos flap sessions, according to the only reports I've heard.
> 
> Is it an invalid AS_PATH?  If so, if such is received by a Cisco, the
> Cisco is required by the RFC to drop the session.  Failing to do so
> (and then propagating the bogus advertisement) was the cause of the
> original problem ... AFAIK, the fix (which was released a long time
> ago, but may not yet be running everywhere) causes the Cisco to behave
> properly, which is to drop the session.

Clarification: Ciscos take a buggy route, and turn it into an invalid
one. This causes Cisco peers to flap the session (yes, as they should),
and some other vendors (B, below) appear to have more serious issues.
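
For anyone following along, here is a rough sketch of the "drop the
session" behavior being discussed. It assumes 2-octet AS numbers and
checks only segment type and length; the function and exception names
are made up for illustration and are not any vendor's code. The error
code and subcode (3 = UPDATE Message Error, 11 = Malformed AS_PATH) are
the ones defined in the BGP-4 spec.

```python
import struct

UPDATE_MESSAGE_ERROR = 3   # NOTIFICATION error code (BGP-4 spec)
MALFORMED_AS_PATH = 11     # UPDATE Message Error subcode (BGP-4 spec)
AS_SET, AS_SEQUENCE = 1, 2  # valid path segment types


class MalformedASPath(Exception):
    """Hypothetical exception for an AS_PATH that fails basic checks."""


def parse_as_path(value: bytes):
    """Parse an AS_PATH attribute value made of 2-octet AS numbers.

    Returns a list of (segment_type, [asn, ...]) tuples, or raises
    MalformedASPath if a segment has a bad type or is truncated.
    """
    segments = []
    offset = 0
    while offset < len(value):
        if len(value) - offset < 2:
            raise MalformedASPath("truncated segment header")
        seg_type, count = value[offset], value[offset + 1]
        offset += 2
        if seg_type not in (AS_SET, AS_SEQUENCE):
            raise MalformedASPath("unknown segment type %d" % seg_type)
        end = offset + 2 * count
        if end > len(value):
            raise MalformedASPath("segment runs past end of attribute")
        asns = list(struct.unpack("!%dH" % count, value[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def handle_as_path(value: bytes):
    """Accept a well-formed path; otherwise signal NOTIFICATION + close."""
    try:
        return ("accept", parse_as_path(value))
    except MalformedASPath:
        # Send NOTIFICATION and tear the session down; do not
        # propagate the route to other peers.
        return ("notify_and_close", (UPDATE_MESSAGE_ERROR, MALFORMED_AS_PATH))
```

The point of the sketch is only that the bad path stops at the first
router that checks it, at the cost of a session reset.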

> > B) <X> routers were crashing, either due to the bug, or the session resets.
> >    Thus, <X> is being flogged. I have reports of at least one <Y> having
> >    problems, as well.
> 
> Well, OK.  If <X> is crashing, then <X> has a problem.  And I didn't
> mean to imply that they didn't.  Mostly, I was posting because I
> frequently hear the "Bay vs. Cisco" crashes of yore reported as "Bays
> were dropping BGP sessions".  That implies that the Bay was broken, when
> in reality Bay (and most other non-Cisco implementations) was doing
> what was required by the RFC.
> 
> The reason for my post, not knowing who <X> is (although I could
> probably guess) or what <X> was doing, was to clarify that routers that
> drop BGP sessions upon receiving invalid advertisements are not broken;
> but rather, they are doing what is required. 

A good point, and entirely true. I apologize for not being clear about
the bug, but I was (and am) trying to step carefully around the NDAs.
And yes, they're annoying, and there are probably some people who
believe I'm violating them even now. (Hopefully not the lawyers...)

> > I have no data on Bay; my apologies if this wasn't clear. Bay was *only*
> > being referenced as a historical point of note. No attempt at FUD, and my
> > apologies if anyone read it that way.
> 
> And I wasn't attempting to defend them, either -- I'm just curious
> about the problem.
> 
> Anyway, someone had to be passing this advertisement around ... if the
> Ciscos were dropping the session in response to it, and <X>'s were
> crashing, who's left to pass the bad advertisement around?  Cisco with
> older code that propagated the advertisement upon receipt, instead of
> issuing a NOTIFICATION and tearing the session down?

I'm not entirely clear on this; the bug ID implies that iBGP peers may
be treated differently from external peers (specifically, part of it
appears to involve appending one's own ASN, possibly; again, I'm not
entirely clear on it, even after reading the bug report).
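
As background only (this is the textbook rule, not a description of
the bug, which I can't characterize): a speaker normally prepends its
own ASN when advertising to an external peer and leaves the AS_PATH
alone toward an internal peer. A minimal sketch, with a hypothetical
function name and paths simplified to flat lists of ASNs:

```python
def as_path_for_peer(as_path, local_asn, peer_asn):
    """Return the AS_PATH to advertise to a peer (simplified model).

    iBGP (peer in our own AS): path is passed along unchanged.
    eBGP (peer in another AS): our own ASN is prepended.
    """
    if peer_asn == local_asn:
        return list(as_path)
    return [local_asn] + list(as_path)
```

So the same route can look different depending on which kind of
session it leaves on, which is at least consistent with internal and
external peers seeing different behavior.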

> Naturally, you might be unable to answer the above, due to NDA ...
> mostly, I'm just fishing for details (from anywhere) on what happened. 

Sorry. As Sean said... most of it is covered by NDAs, and this is
exactly what will lead to required outage reporting for everyone, if
the vendors don't start relaxing them some. From our point of view
(here), a lot of the issues were second-order, caused by the number of
flaps in the global table from various directions, and/or by the bug
in vendor <X>'s equipment causing rapid reboots. Though, to their
credit, <X> was good about handling the ticket, and had engineers
talking to us quickly. Reasonable handling, IMO.

-- 
***************************************************************************
Joel Baker                           System Administrator - lightbearer.com
lucifer at lightbearer.com              http://www.lightbearer.com/~lucifer


