Persistent BGP peer flapping - do you care?
skh at nexthop.com
Thu Jan 17 21:14:10 UTC 2002
Thanks for the input. This is the time to revisit the
specification. Just to confirm your
answer, I'll paraphrase it and let you know what happened.
Persistent BGP peer flapping
happens as follows (one of the paths):
1) Error causes stop
(bad prefix --> drop connection)
2) BGP peer goes to IDLE state
3) Automatic restart happens (cisco doesn't utilize the
4) Open sent
6) error due to bad prefix still being sent
7) Idle Hold time (time delay here)
--> go back to #1
The specification says to slow down the cycle of
re-establishing the session by increasing the time delay
in step #7.
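
To make the step #7 delay concrete, here is a minimal sketch of an
exponentially increasing Idle Hold time. The 60-second starting value,
the doubling factor, the one-hour cap, and the function name are all
assumptions for illustration, not values taken from any vendor's
implementation.

```python
def idle_hold_delays(initial=60, factor=2, cap=3600, retries=6):
    """Return the delay in seconds to wait before each automatic restart.

    All defaults are illustrative assumptions: start at 60 seconds,
    double after every failed attempt, and never exceed the cap.
    """
    delays = []
    delay = initial
    for _ in range(retries):
        # Clamp to the cap so a long-broken peer is still retried eventually.
        delays.append(min(delay, cap))
        delay *= factor
    return delays


if __name__ == "__main__":
    # Delay before each of the first six automatic restarts.
    print(idle_hold_delays())
```

With these assumed defaults the peer is retried after 60, 120, 240, 480,
960 and 1920 seconds, so each pass through the loop above takes longer
than the last while the bad prefix is still being sent.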
I think we are describing the same problem. Could you
At 03:10 PM 1/17/2002 -0500, Christopher A. Woodfield wrote:
>This has been bandied about before, but one should note that the "drop the
>peer if an error is received" is only really effective if the session that
>initiated the error does not propagate it. Most Cisco routers running
>common IOS images not only do not drop the session, but pass along the
>bad prefix, which
>leads to the occasional bad route dropping peering sessions on most of
>the Enterasys(*) routers on the planet.
Do the peering sessions drop once or repeatedly until
the bad prefix gets cleared out?
>I guess the main question is what is considered an "error" - if the peer
>starts obviously misbehaving, then yes, drop the peer. But don't drop the
>peer due to an invalid prefix that most likely did not originate on that
>router - it would be much better for the 'net as a whole to
>just drop the bad prefix and carry on. Maybe an
>algorithm could be built in where the peer could be dropped if the number
>of bad prefixes exceeds a set threshold...
The algorithms for what constitutes a "drop" can be an implementation
detail or be specified as an optional portion of the next version
of the BGP specification.
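
As one possible shape for such an implementation detail, here is a
minimal sketch of the threshold idea from the quoted text: discard each
bad prefix, and drop the session only once the count of bad prefixes
from that peer exceeds a limit. The class name, the method name, the
action strings, and the default threshold are all hypothetical; nothing
here comes from the specification or from any vendor.

```python
class PeerErrorPolicy:
    """Hypothetical per-peer policy: tolerate a few bad prefixes, then drop."""

    def __init__(self, max_bad_prefixes=5):
        # Assumed threshold; a real implementation would make this configurable.
        self.max_bad_prefixes = max_bad_prefixes
        self.bad_count = 0

    def on_bad_prefix(self):
        """Record one bad prefix and return the action to take."""
        self.bad_count += 1
        if self.bad_count > self.max_bad_prefixes:
            # Too many errors from this peer: treat it as misbehaving.
            return "drop_session"
        # Otherwise discard only the offending prefix and carry on.
        return "discard_prefix"


if __name__ == "__main__":
    policy = PeerErrorPolicy(max_bad_prefixes=2)
    print([policy.on_bad_prefix() for _ in range(3)])
```

This sketch never resets the counter; whether and when to age out old
errors is a further design choice left open here.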
>In short, the "drop the session when you get a bad prefix" only serves its
>purpose when every router that speaks BGP does this. If that can't be had, we
>should really revisit the spec in that regard.
The specification says "recommended" (should) now and as we noted with
cisco, not all vendors implement it. We are documenting
existing practice so recommended/should will remain.
If you think it is a very serious operational issue, you
can always send input to the IDR mailing list that the "should" needs
to be "must" due to an operational issue.
Thanks again for answering the cry for help!