Persistent BGP peer flapping - do you care?

Thu Jan 17 20:46:05 UTC 2002

On Thu, Jan 17, 2002 at 03:10:00PM -0500, Christopher A. Woodfield wrote:
> 
> This has been bandied about before, but one should note that the "drop the 
> peer if an error is received" is only really effective if the session that 
> initiated the error does not propagate it. Most Cisco routers running common IOS 
> images not only do not drop the session, but pass along the bad prefix, which 
> leads to the occasional bad route dropping peering sessions on most of 
> the Enterasys(*) routers on the planet.

Actually, my understanding (I haven't had this happen to me directly,
so I can't say) is that Ciscos propagate the bad route and -then- drop
the session. But regardless...

> I guess the main question is what is considered an "error" - if the peer starts 
> obviously misbehaving, then yes, drop the peer. But don't drop the peer due to an 
> invalid prefix that most likely did not originate on that router - it would be much 
> better for the 'net as a whole to just drop the bad prefix and carry on. Maybe an 
> algorithm could be built in where the peer could be dropped if the number of bad 
> prefixes exceeds a set threshold...
> 
> In short, the "drop the session when you get a bad prefix" only serves its intended 
> purpose when every router that speaks BGP does this. If that can't be had, we 
> should really revisit the spec in that regard.
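
For illustration, the threshold idea in the quoted text could be sketched
roughly as below. This is only a sketch under assumptions: PeerState,
handle_prefix, and the default threshold of 5 are made-up names and values,
not anything RFC 1771 or any vendor implementation specifies.

```python
from dataclasses import dataclass, field


@dataclass
class PeerState:
    """Hypothetical per-peer bookkeeping; not taken from any real BGP stack."""
    name: str
    max_bad_prefixes: int = 5      # assumed threshold, purely illustrative
    bad_prefix_count: int = 0
    session_up: bool = True
    accepted: list = field(default_factory=list)


def handle_prefix(peer: PeerState, prefix: str, is_valid: bool) -> str:
    """Discard a bad prefix; drop the peer only once the threshold is exceeded."""
    if not peer.session_up:
        return "ignored"
    if is_valid:
        peer.accepted.append(prefix)
        return "accepted"
    # Bad prefix: count it, but keep the session unless the peer keeps misbehaving.
    peer.bad_prefix_count += 1
    if peer.bad_prefix_count > peer.max_bad_prefixes:
        peer.session_up = False
        return "session-dropped"
    return "prefix-discarded"
```

Note that this departs from the RFC text quoted below, which closes the
connection on the first detected error.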

RFC1771:

6.  BGP Error Handling.

   This section describes actions to be taken when errors are detected
   while processing BGP messages.

   When any of the conditions described here are detected, a
   NOTIFICATION message with the indicated Error Code, Error Subcode,
   and Data fields is sent, and the BGP connection is closed.  If no
   Error Subcode is specified, then a zero must be used.

If the RFC states "drop the session when you get a bad prefix" then I
would like to think "every router that speaks BGP does this" could be
a safe expectation. I know, I know, there's a difference between "what
the spec says" and "what the product does", but isn't this the point
of RFC standards?

On the other hand, I suppose the argument could be made that the RFC
doesn't actually say "the BGP session is closed without the invalid
update being propagated to other peers". However, if your BGP engine
can detect an invalid update (which it can, if it is closing the
session), isn't it a given that it should know not to propagate said
update?
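
As a rough sketch of that ordering (validate first, propagate only if the
update passes), assuming a hypothetical Session class and a caller-supplied
validate function, neither of which comes from any real router code:

```python
UPDATE_MESSAGE_ERROR = 3  # Error Code for "UPDATE Message Error" in RFC 1771


class Session:
    """Hypothetical stand-in for a BGP session; it just records what was sent."""

    def __init__(self, name):
        self.name = name
        self.open = True
        self.sent = []

    def send(self, message):
        if self.open:
            self.sent.append(message)

    def close(self):
        self.open = False


def process_update(source, others, update, validate):
    """validate(update) returns an error subcode, or None if the update is fine."""
    subcode = validate(update)
    if subcode is not None:
        # Invalid: notify the sender and close, without touching other peers.
        source.send(("NOTIFICATION", UPDATE_MESSAGE_ERROR, subcode))
        source.close()
        return False
    # Valid: only now is the update passed along.
    for peer in others:
        peer.send(("UPDATE", update))
    return True
```

The point is simply that the propagation loop sits after the validity
check, so an update that triggers a NOTIFICATION never reaches it.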

-c