Soliciting your opinions on Internet routing: A survey on BGP convergence
mike at mikejones.in
Tue Jan 10 21:31:20 UTC 2017
On 10 January 2017 at 19:58, Job Snijders <job at instituut.net> wrote:
> On Tue, Jan 10, 2017 at 03:51:04AM +0100, Baldur Norddahl wrote:
>> If a transit link goes, for example because we had to reboot a router,
>> traffic is supposed to reroute to the remaining transit links.
>> Internally our network handles this fairly fast for egress traffic.
>> However, the problem is the ingress traffic: it can be 5 to 15 minutes
>> before everything has settled down. That is how long it takes everyone
>> else on the internet to process that they have to switch to
>> your alternate transit.
>> The only solution I know of is to have redundant links to all transits.
> Alternatively, if you reboot a router, perhaps you could first shut down
> the eBGP sessions, then wait 5 to 10 minutes for the traffic to drain
> away (should be visible in your NMS stats), and then proceed with the
> reboot.
> Of course this only works for planned reboots, not surprise reboots.
> Kind regards,
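The drain-then-reboot procedure quoted above can be scripted. What follows is a minimal Python sketch under stated assumptions, not anyone's actual tooling: `shutdown_ebgp`, `read_bps` and `reboot` are hypothetical hooks you would wire to your own router automation and NMS, and the 600-second default timeout simply mirrors the "5 to 10 minutes" suggested above.

```python
import time
from typing import Callable


def wait_for_drain(
    read_bps: Callable[[], float],
    threshold_bps: float,
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a traffic reading until it falls below threshold_bps.

    Returns True once traffic has drained, or False if timeout_s elapses
    first. read_bps is a hypothetical hook into your NMS, for example the
    ingress bits/sec on the transit port whose eBGP session was shut down.
    """
    deadline = clock() + timeout_s
    while True:
        if read_bps() < threshold_bps:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def drain_and_reboot(
    shutdown_ebgp: Callable[[], None],
    read_bps: Callable[[], float],
    reboot: Callable[[], None],
    threshold_bps: float,
    timeout_s: float = 600.0,
) -> bool:
    """Shut eBGP, wait for ingress to drain, then reboot (planned work only).

    All three callables are hypothetical hooks for your own tooling. If
    traffic has not drained by the timeout, the reboot is skipped so an
    operator can decide what to do next.
    """
    shutdown_ebgp()
    if not wait_for_drain(read_bps, threshold_bps, timeout_s):
        return False
    reboot()
    return True
```

Skipping the reboot on timeout is a design choice in this sketch: traffic that refuses to drain may mean a remote network is still pinned to the path, which is worth a look before pulling the plug.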
If I tear down my eBGP sessions, the upstream router withdraws the
route and the traffic just stops. Are your upstreams propagating
withdraws without actually updating their own routing tables?
I believe the problem is easiest to see by firing up an inbound mtr
from a distant network, then withdrawing the route from the path it is
taking. It should show either destination unreachable or a routing
loop that "retreats" (under the right circumstances I have observed it
distinctly move one hop at a time) until it finds an alternate path.
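The "retreating" loop described above can be reproduced with a toy model. This is a hypothetical sketch, not a BGP implementation. It assumes a linear chain of routers 0..n-1 toward the origin. It further assumes a withdraw that moves exactly one hop upstream per time step, and that a router which has lost the route but has no alternate falls back to some default or less-specific route pointing back upstream. Router `alt`, if given, has an alternate path and advertises it upstream instead of propagating the withdraw.

```python
from typing import List, Optional, Tuple


def processed(i: int, n: int, alt: Optional[int], t: int) -> bool:
    """Has router i processed the withdraw by time step t? (toy model)"""
    # Hypothetical: the withdraw reaches router n-1 at t=0 and moves one hop
    # upstream per step. A router with an alternate path (alt) advertises that
    # path upstream instead, so routers below alt never see a withdraw.
    if alt is not None and i < alt:
        return False
    return i >= n - 1 - t


def trace(n: int, alt: Optional[int], t: int) -> Tuple[List[int], str]:
    """Follow next hops from router 0 at time t, like one mtr snapshot."""
    path: List[int] = []
    i = 0
    while True:
        if i < 0:
            return path, "unreachable"  # router 0 bounced with nowhere to go
        if i in path:
            # Revisited a stale router i that forwarded to i+1, which bounced back.
            return path + [i], f"loop {i}<->{i + 1}"
        path.append(i)
        if processed(i, n, alt, t):
            if i == alt:
                return path, "alternate path"
            i -= 1  # lost the route: assumed default/less-specific points upstream
        else:
            i += 1  # stale: still forwards downstream toward the dead path


# Five routers, router 1 has an alternate path:
for t in range(4):
    print(t, trace(5, alt=1, t=t)[1])
# 0 loop 3<->4
# 1 loop 2<->3
# 2 loop 1<->2
# 3 alternate path
```

With `alt=None` the loop retreats all the way back to router 0, and the trace ends as "unreachable", the other outcome mentioned above.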
However, the convergence times I have observed for a single withdraw
are under 10 seconds for all the networks in the original path to
point at a new one. My view is that if you are failing over often
enough for a customer to notice and report it, you have bigger
problems than convergence times.
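Comparing a figure like "under 10 seconds" against the 5 to 15 minutes reported earlier requires turning probe results into outage windows. A minimal sketch, assuming you have already reduced your mtr or ping output to a time-ordered list of (timestamp, reachable) pairs; that input format is an assumption, not something mtr emits directly.

```python
from typing import List, Optional, Sequence, Tuple


def outages(
    samples: Sequence[Tuple[float, bool]],
) -> List[Tuple[float, Optional[float]]]:
    """Return (start, end) for each run of failed probes.

    start is the timestamp of the first failed probe in the run; end is the
    timestamp of the first successful probe after it, or None if the run
    never recovered. With probes every p seconds, end - start is only
    accurate to within roughly one probe interval either way.
    """
    result: List[Tuple[float, Optional[float]]] = []
    start: Optional[float] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failure of a new outage
        elif ok and start is not None:
            result.append((start, ts))  # first success closes the outage
            start = None
    if start is not None:
        result.append((start, None))  # still down at the last sample
    return result


probes = [(0.0, True), (1.0, False), (2.0, False), (3.0, True)]
print(outages(probes))  # [(1.0, 3.0)]
```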
- Mike Jones