route converge time

Sat Nov 21 15:14:25 UTC 2015

Hey,

This is a complex problem and there are quite a few parts to consider.

Let's assume you want to optimize how fast you choose the right best exit 
after a failure. The opposite ( how fast the internet chooses the best entry 
point into your network after a failure ) is usually not that easy to 
influence.

The first component of our total convergence time is how fast you can actually
detect the failure. If your bgp speaker is directly connected to the transit's
bgp speaker with no boxes in between, then you can detect the failure about as
fast as it takes your end to detect that the link is down, which is usually
pretty fast ( you could tune the carrier-delay if you want to ). If there are
any other boxes in between, you can't rely on that. The best solution in that
case, imho, would be to use bfd. If you can't do that, you may want to try and
tune the bgp keepalive/hold timers. Keep in mind that running aggressive
timers will consume cpu resources on both your and the provider's end.
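
To put rough numbers on that, here is a small Python sketch. It assumes bfd
declares the neighbor down after the detect multiplier's worth of missed
intervals, and that plain bgp only notices once the hold timer expires. The
function names and the timer values are made up for illustration, so check
what your platform and your provider actually support.

```python
def bfd_detection_ms(rx_interval_ms: int, multiplier: int) -> int:
    """Approximate bfd detection time: interval times the detect multiplier."""
    return rx_interval_ms * multiplier


def bgp_detection_s(hold_time_s: int) -> int:
    """Worst case for plain bgp: the session drops when the hold timer expires."""
    return hold_time_s


# Hypothetical settings, chosen only for illustration.
bfd_ms = bfd_detection_ms(rx_interval_ms=300, multiplier=3)
bgp_s = bgp_detection_s(hold_time_s=180)

print(f"bfd: ~{bfd_ms} ms, bgp hold timer: up to {bgp_s} s")
```

Even fairly relaxed bfd timers land well under a second, while a bgp hold
timer in the tens of seconds or more dominates everything else in the chain.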

The second component would be how much time it takes bgp to find the alternate
routes. As you're using l3vpn, there's an easy trick to apply here. You can
just set up a different rd ( route distinguisher ) on each router and both
routers will end up with routes from both providers in their bgp table. That
will obviously consume hardware resources ( usually ram, as not every route
will make it to the fib just yet ) so make sure your routers can handle it.
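
The reason the rd trick works can be shown with a toy model in Python. It
assumes a speaker keeps one best path per vpnv4 key, where the key is the rd
plus the prefix. The rd values, router names and the "first path wins" rule
are placeholders of mine; real bgp runs full best-path selection, and vrf
import behavior differs per vendor, so treat this as a sketch only.

```python
from typing import Dict, List, Tuple

# A vpnv4 route is identified by (route distinguisher, prefix).
Path = Tuple[str, str, str]  # (rd, prefix, next_hop)


def best_path_table(paths: List[Path]) -> Dict[Tuple[str, str], str]:
    """Keep one path per (rd, prefix) key.

    The first path seen wins here; real bgp would run best-path selection.
    """
    table: Dict[Tuple[str, str], str] = {}
    for rd, prefix, next_hop in paths:
        table.setdefault((rd, prefix), next_hop)
    return table


prefix = "203.0.113.0/24"  # documentation prefix, illustrative only

# Same rd on both routers: the two paths compete for a single key.
same_rd = best_path_table([
    ("65000:1", prefix, "router-a"),
    ("65000:1", prefix, "router-b"),
])

# Different rd per router: the two paths are distinct routes.
unique_rd = best_path_table([
    ("65000:1", prefix, "router-a"),
    ("65000:2", prefix, "router-b"),
])

print(len(same_rd))    # 1
print(len(unique_rd))  # 2
```

With unique rds the alternate exit is already sitting in the bgp table when
the failure happens, at the cost of holding roughly twice the paths.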

The third component would be how much time it takes you to update the fib 
itself. This is usually fast for a single route, but not as fast as you might 
think for ~550k routes. What you can do to speed this up depends somewhat on 
your hardware. Most big vendors do support some flavor of a hierarchical fib 
( cisco calls theirs pic core ). Keep in mind that this will also eat up 
hardware resources depending on the implementation itself. Make sure you read 
up before you try anything as it could end up doubling your fib requirements, 
which aren't light to begin with for full tables.
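
To illustrate why a hierarchical fib helps, here is a rough Python model. In
the flat case every prefix carries its own next hop, so a failover rewrites
every entry. In the hierarchical case the prefixes point at a shared object
( called path_list here, a name I picked ) and a failover is a single write.
The class names and structure are hypothetical and far simpler than any real
forwarding plane.

```python
from typing import Dict, List


class FlatFib:
    """Each prefix stores its own next hop."""

    def __init__(self, prefixes: List[str], next_hop: str) -> None:
        self.entries: Dict[str, str] = {p: next_hop for p in prefixes}

    def fail_over(self, old: str, new: str) -> int:
        """Rewrite every entry using the old next hop; return the write count."""
        writes = 0
        for prefix in list(self.entries):
            if self.entries[prefix] == old:
                self.entries[prefix] = new
                writes += 1
        return writes


class HierarchicalFib:
    """Prefixes share one path list holding a primary and a backup next hop."""

    def __init__(self, prefixes: List[str], primary: str, backup: str) -> None:
        self.path_list: Dict[str, str] = {"active": primary, "backup": backup}
        self.entries: Dict[str, Dict[str, str]] = {
            p: self.path_list for p in prefixes
        }

    def fail_over(self) -> int:
        """Swap active and backup in the shared path list; one write in total."""
        self.path_list["active"], self.path_list["backup"] = (
            self.path_list["backup"],
            self.path_list["active"],
        )
        return 1

    def lookup(self, prefix: str) -> str:
        return self.entries[prefix]["active"]


prefixes = [f"prefix-{i}" for i in range(550_000)]  # roughly a full table

flat = FlatFib(prefixes, "provider-a")
flat_writes = flat.fail_over("provider-a", "provider-b")

hier = HierarchicalFib(prefixes, "provider-a", "provider-b")
hier_writes = hier.fail_over()

print(flat_writes, hier_writes)
print(hier.lookup("prefix-0"))
```

The extra level of indirection and the pre-installed backup are where the
additional fib cost comes from.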

Last but not least, keep scalability in mind when reading the last 2 paragraphs.
On newer boxes, tuning for fast convergence may be more than fine for 2
providers but practically impossible for, say, 6 or 8 of them.

As for the scenarios of local failure, first of all, really try to make sure
that the ibgp session between your two routers ( or towards their RRs/etc ) is
as robust as it gets. Assuming that's taken care of, convergence should take
about as long as it takes your igp to figure it out. Bfd and the usual igp
timer/feature adjustments do apply. Next-hop tracking and fast peering
detection ( assuming cisco ) are also nice, though if you have default routes
in your network, you might want to exclude them from being used for either.
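
The point about default routes can be shown with a short Python sketch using
the standard ipaddress module. It assumes next-hop tracking simply asks "is
there any route covering this address?", which is a simplification. The
resolve function, the min_prefix_len parameter and all addresses are
hypothetical.

```python
import ipaddress
from typing import List, Optional


def resolve(next_hop: str, rib: List[str], min_prefix_len: int = 0) -> Optional[str]:
    """Longest-prefix match for next_hop, ignoring routes shorter than min_prefix_len."""
    addr = ipaddress.ip_address(next_hop)
    candidates = [
        net
        for net in map(ipaddress.ip_network, rib)
        if addr in net and net.prefixlen >= min_prefix_len
    ]
    if not candidates:
        return None
    return str(max(candidates, key=lambda net: net.prefixlen))


# The igp has withdrawn the peer's /32 loopback; only the default still covers it.
rib_after_failure = ["0.0.0.0/0", "10.0.0.0/24"]
peer_loopback = "192.0.2.1"

via_default = resolve(peer_loopback, rib_after_failure)
host_only = resolve(peer_loopback, rib_after_failure, min_prefix_len=32)

print(via_default)  # 0.0.0.0/0
print(host_only)    # None
```

With the default allowed, the next hop still looks reachable and nothing
reacts until the bgp timers expire. Requiring a more specific route lets the
failure show up as soon as the igp withdraws the loopback.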

My thoughts and words are my own.

Kind Regards,

Spyros

-----Original Message-----
From: NANOG [mailto:nanog-bounces at nanog.org] On Behalf Of Baldur Norddahl
Sent: Saturday, November 21, 2015 3:45 PM
To: nanog at nanog.org
Subject: route converge time

Hi

I got a network with two routers and two IP transit providers, each with the 
full BGP table. Router A is connected to provider A and router B to provider 
B. We use MPLS with a L3VPN with a VRF called "internet".
Everything happens inside that VRF.

Now if I interrupt one of the IP transit circuits, the routers will take 
several minutes to remove the now bad routes and move everything to the 
remaining transit provider. This is very noticeable to the customers. I am 
looking into ways to improve that.

I added a default static route 0.0.0.0 to provider A on router A and did the 
same to provider B on router B. This is supposed to be a trick that allows the 
network to move packets before everything is fully converged.
Traffic might not leave the most optimal link, but it will be delivered.

Say I take down the provider A link on router A. As I understand it, the 
hardware will notice this right away and stop using the routes to provider A. 
Router A might know about the default route on router B and send the traffic 
to router B. However this is not much help, because on router B there is no 
link that is down, so the hardware is unaware until the BGP process is done 
updating the hardware tables. Which apparently can take several minutes.

My routers also have multipath support, but I am unsure if that is going to be 
of any help.

Anyone got any tricks or pointers to what can be done to minimize the downtime
in case of an IP transit link failure? Or the related case of one of my routers
going down or the link between them going down (the traffic would go a
non-direct way instead if the direct link is down).

Thanks,

Baldur