BGP convergence problem

Matthew Petach mpetach at netflight.com
Tue Jun 8 16:26:47 UTC 2010


On Tue, Jun 8, 2010 at 7:27 AM, Andy B. <globichen at gmail.com> wrote:
> I finally decided to shut down all peerings and brought them back one by one.
>
> Everything is stable again, but I don't like the way I had to deal
> with it since it will most likely happen again when DECIX or another
> IX we're at is having issues.
>
> I've seen a few BGP convergence discussions on NANOG, but none about
> deadlock situations and what could be done to avoid them. Setting
> higher MTU or bigger hold queues did not help.
>
> - Andy

Some people have found that upgrading to an alternate router vendor
helps.  ^_^;

Fundamentally, the CPU on your router is underpowered for the amount
of state information that needs to be updated in the time window of the
hold timers.  If you can't move to a faster/more efficient platform, then
you may need to negotiate raising the keepalive interval and corresponding
hold timers with your neighbors, to give your router time to finish processing
updates.
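
To see why the timers have to be raised on both ends, here is a rough
sketch of how the hold time is settled per RFC 4271: each side proposes a
value in its OPEN message and the session uses the smaller of the two, with
keepalives commonly sent at one third of the hold time. The function name
and the example values are made up for illustration; real routers expose
this through vendor-specific configuration.

```python
def negotiate_timers(local_hold, peer_hold):
    """Return (hold_time, keepalive_interval) in seconds for one BGP session.

    Illustrative helper, not a vendor API. Follows RFC 4271: the session
    uses the smaller of the two proposed hold times, and a hold time must
    be either 0 (keepalives disabled) or at least 3 seconds.
    """
    for value in (local_hold, peer_hold):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    hold = min(local_hold, peer_hold)
    # One third of the hold time is the conventional keepalive interval;
    # a hold time of 0 means no keepalives are sent at all.
    keepalive = hold // 3
    return hold, keepalive


# Raising only the local timer changes nothing: the peer's 90s still wins.
print(negotiate_timers(600, 90))
# Once the neighbor agrees to raise its side too, the window actually grows.
print(negotiate_timers(600, 600))
```

The first call shows why this is a negotiation with your neighbors and not
a local knob: the lower value always takes effect.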

Alternatively, if you can't upgrade platforms yet but have spare routers
around, connecting a second router to the exchange and splitting your
neighbors between the two links would reduce the load on each router
during reconvergence, and buy you time until you can move to a more
capable platform.
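
One way to plan such a split is sketched below, assuming that the number of
prefixes received from each neighbor is a reasonable proxy for the
reconvergence load it causes (that is an assumption, not something measured
here). It assigns the heaviest neighbors first, each to whichever router
currently carries the least. The peer names and prefix counts are invented.

```python
def split_peers(prefix_counts, routers=2):
    """Greedily divide peers across routers to balance received prefixes.

    prefix_counts maps a peer name to the number of prefixes it sends.
    Returns one dict per router with its peers and total prefix count.
    """
    groups = [{"peers": [], "prefixes": 0} for _ in range(routers)]
    # Place the largest peers first so the small ones can even things out.
    by_size = sorted(prefix_counts.items(), key=lambda kv: kv[1], reverse=True)
    for peer, count in by_size:
        target = min(groups, key=lambda g: g["prefixes"])
        target["peers"].append(peer)
        target["prefixes"] += count
    return groups


# Hypothetical exchange peers and prefix counts.
peers = {
    "peer-a": 120000,
    "peer-b": 80000,
    "peer-c": 50000,
    "peer-d": 30000,
    "peer-e": 20000,
}
for number, group in enumerate(split_peers(peers), start=1):
    print(f"router {number}: {group['peers']} ({group['prefixes']} prefixes)")
```

A greedy pass like this is not guaranteed to find the best possible
balance, but it is usually close enough for deciding which sessions to
move.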

Matt



