Peering/Transit eBGP sessions -pet or cattle?
adamv0025 at netconsultings.com
Thu Feb 13 09:49:00 UTC 2020
> Baldur Norddahl
> Sent: Wednesday, February 12, 2020 7:57 PM
>
> On Tue, Feb 11, 2020 at 12:33 AM Lukas Tribus <mailto:lists at ltri.eu> wrote:
> > Therefore, if being down for several minutes is not ok, you should
> > invest in dual links to your transits. And connect those to two
> > different routers. If possible with a guarantee the transits use two
> > routers at their end and that divergent fiber paths are used etc.
>
> That is not my experience *at all*. I have always seen my prefixes
> converge in a couple of seconds upstream (vs 2 different Tier1's).
>
> This is a bit old but probably still thus:
>
> https://labs.ripe.net/Members/vastur/the-shape-of-a-bgp-update
>
> Quote: "To conclude, we observe that BGP route updates tend to
> converge globally in just a few minutes. The propagation of newly
> announced prefixes happens almost instantaneously, reaching 50%
> visibility in just under 10 seconds, revealing a highly responsive
> global system. Prefix withdrawals take longer to converge and generate
> nearly 4 times more BGP traffic, with the visibility dropping below
> 10% only after approximately 2 minutes".
>
> Unfortunately they did not test the case of withdrawal from one router
> while having the prefix still active at another.
>
Yes, that's unfortunate.
Although I'm thinking that the convergence time would be highly dependent on the first-hop upstream providers involved in the "local-repair" for the affected AS. Once that repair is done, it doesn't matter that the rest of the world still routes traffic for the affected AS towards the original first-hop upstream AS, as long as that upstream has a valid detour route.
And I guess the topology and configuration of this first-hop neighbourhood around the affected AS, i.e. the part involved in the "local-repair", would dictate the convergence time.
E.g. if your upstream A box happens to have a direct (usable) link/session to the upstream B box, that's the winning case. However, the higher the number of boxes involved in the "local-repair" detour that need to be told "A no more, now B is the way to go", the longer the convergence time.
But if it takes about 2 min for a withdrawal to reach a significant portion of the Internet, I wonder how long it could take for a typical "local-repair" string of BGP speakers to all get the memo.
Realistically, though, how many BGP speakers could that be? Ranging from min 2 to max... say ~6?
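
To put rough numbers on the 2-to-6 speaker guess, here is a minimal sketch. It assumes the speakers in the "local-repair" string are told strictly one after another and that each adds the same delay. The function names and the 1-second per-hop figure are made up for illustration; they are not measurements.

```python
def local_repair_time(num_speakers: int, per_hop_delay_s: float) -> float:
    """Time until the last speaker in a serial chain has learned the new path.

    per_hop_delay_s is a hypothetical lump sum per speaker (update
    processing, any advertisement timer, FIB programming).
    """
    if num_speakers < 1:
        raise ValueError("need at least one BGP speaker")
    if per_hop_delay_s < 0:
        raise ValueError("delay cannot be negative")
    return num_speakers * per_hop_delay_s


def sweep(min_speakers: int = 2, max_speakers: int = 6,
          per_hop_delay_s: float = 1.0) -> dict:
    """Repair time for every chain length in the guessed range."""
    return {
        n: local_repair_time(n, per_hop_delay_s)
        for n in range(min_speakers, max_speakers + 1)
    }


if __name__ == "__main__":
    for speakers, seconds in sweep().items():
        print(f"{speakers} speakers -> {seconds:.1f} s")
```

The model is linear on purpose: the only claim it encodes is "more boxes to tell, longer to converge". Real chains will not have uniform per-hop delays.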
>
> When I saw *minutes* of brownouts in connectivity it was always
> because of ingress prefix convergence (or the lack thereof, due to
> slow FIB programing, then temporary internal routing loops, nasty
> things like that, but never external).
>
> That is also a significant problem. In the case of a single transit
> connection per router, two routers and two providers, there will be a
> lot of internal convergence between your two routers in the case of a
> link failure. That is also avoided by having both routers having the
> same provider connections.
> That way a router may still have to invalidate many routes but there
> will be no loops and the router has loop free alternatives loaded into
> memory already (to the other provider). Plus you can use the simple
> trick of having a default route as a fall back.
>
This is a very good point actually. Indeed, since the box has two transit sessions, if only one of them fails it will still retain all the prefixes in the FIB. It will just need to reprogram a few next-hops to point towards the other eBGP/iBGP speakers, whichever offers the best path. And reprogramming next-hops is significantly faster (with hierarchical FIBs anyway).
adam