Soliciting your opinions on Internet routing: A survey on BGP convergence

Baldur Norddahl baldur.norddahl at gmail.com
Tue Jan 10 02:51:04 UTC 2017


Hello

I find that the type of outage that affects our network the most is 
neither of the two options you describe. As is probably typical for 
smaller networks, we do not have redundant uplinks to all of our 
transits. If a transit link goes down, for example because we had to reboot a 
router, traffic is supposed to reroute to the remaining transit links. 
Internally, our network handles this fairly fast for egress traffic.

However, the problem is the ingress traffic: it can take 5 to 15 minutes 
before everything has settled down. This is the time it takes for everyone 
else on the internet to process that they have to switch to our 
alternate transit.
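
To put rough numbers on how such delays can add up, here is a minimal Python
sketch of a pessimistic store-and-forward model: each router along a path is
assumed to process the whole stream of BGP UPDATEs before the next router
starts on it. The function names, the update count, the processing rate and
the hop count are all illustrative assumptions, not measurements from our
network or from the survey.

```python
# Toy store-and-forward model of BGP UPDATE propagation.
# All names and numbers below are illustrative assumptions.


def per_router_seconds(num_updates: int, updates_per_second: float) -> float:
    """Time one router needs to process a stream of UPDATEs."""
    if updates_per_second <= 0:
        raise ValueError("updates_per_second must be positive")
    return num_updates / updates_per_second


def path_convergence_seconds(
    num_updates: int, updates_per_second: float, router_hops: int
) -> float:
    """Pessimistic estimate: every router finishes the whole stream
    before the next router on the path starts processing it."""
    return router_hops * per_router_seconds(num_updates, updates_per_second)


if __name__ == "__main__":
    # Hypothetical values: 500,000 UPDATEs, 20,000 UPDATEs/s per router, 4 hops.
    print(path_convergence_seconds(500_000, 20_000.0, 4))  # 100.0
```

The model ignores path exploration and advertisement timers, so it is not
meant to reproduce the 5 to 15 minutes we see. It only shows how per-router
processing delays accumulate along a path.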

The only solution I know of is to have redundant links to all transits. 
Going forward I will make sure we have this because it is a huge 
disadvantage not being able to take a router out of service without 
causing downtime for all users. Not to mention a router crash or 
link failure, which should take seconds at most to reroute around but 
instead causes at least 5 minutes of unstable internet.

Regards,

Baldur


On 09/01/2017 at 23:56, Laurent Vanbever wrote:
> Hi NANOG,
>
> We often read that the Internet (i.e. BGP) is "slow to converge". But how slow
> is it really? Do you care anyway? And can we (researchers) do anything about it?
> Please help us find out by answering our short anonymous survey
> (<10 minutes).
>
> Survey URL: https://goo.gl/forms/JZd2CK0EFpCk0c272 <https://goo.gl/forms/WW7KX5kT45m6UUM82>
>
>
> ** Background:
>
> While existing fast-reroute mechanisms enable sub-second convergence upon
> local outages (planned or not), they do not apply to remote outages happening
> further away from your AS as their detection and protection mechanisms only
> work locally.
>
> Remote outages therefore mandate a "BGP-only" convergence which tends to be
> slow, as long streams of BGP UPDATEs (up to hundreds of thousands of them) must
> be propagated router-by-router. Our initial measurements indicate that it can
> take state-of-the-art BGP routers dozens of seconds to process and propagate
> these large streams of BGP UPDATEs. During this time, traffic for important
> destinations can be lost.
>
>
> ** This survey:
>
> This survey aims at evaluating the impact of slow BGP convergence on
> operational practices. We expect the findings to increase the understanding of
> the perceived BGP convergence in the Internet, which could then help
> researchers to design better fast-reroute mechanisms.
>
> We expect the questionnaire to be filled out by network operators whose job relates
> to BGP operations. It has a total of 17 questions and should take less than 10 minutes
> to answer. The survey and the collected data are anonymous (so please do *not*
> include information that may help to identify you or your organization).
> All questions are optional, so if you don't like a question or don't know the answer,
> please skip it.
>
> A summary of the aggregate results will be published as a part of a scientific
> article later this year.
>
> Thank you so much in advance, and we look forward to reading your responses!
>
>
> Laurent Vanbever (ETH Zürich, Switzerland)
>
>
> PS: It goes without saying that we would also be extremely grateful if you could
> forward this email to any operator you might know who may not read NANOG.



