Peering/Transit eBGP sessions -pet or cattle?

Thu Feb 13 09:49:00 UTC 2020

> Baldur Norddahl
> Sent: Wednesday, February 12, 2020 7:57 PM
> 
> On Tue, Feb 11, 2020 at 12:33 AM Lukas Tribus <mailto:lists at ltri.eu> wrote:
> > Therefore, if being down for several minutes is not ok, you should 
> > invest in dual links to your transits. And connect those to two 
> > different routers. If possible with a guarantee the transits use two 
> > routers at their end and that divergent fiber paths are used etc.
> 
> That is not my experience *at all*. I have always seen my prefixes 
> converge in a couple of seconds upstream (vs 2 different Tier1's).
> 
> This is a bit old but probably still thus:
> 
> https://labs.ripe.net/Members/vastur/the-shape-of-a-bgp-update
> 
> Quote: "To conclude, we observe that BGP route updates tend to 
> converge globally in just a few minutes. The propagation of newly 
> announced prefixes happens almost instantaneously, reaching 50% 
> visibility in just under 10 seconds, revealing a highly responsive 
> global system. Prefix withdrawals take longer to converge and generate 
> nearly 4 times more BGP traffic, with the visibility dropping below 
> 10% only after approximately 2 minutes".
> 
> Unfortunately they did not test the case of withdrawal from one router 
> while having the prefix still active at another.
> 
Yes, that's unfortunate.
Although I'm thinking that the convergence time would be highly dependent on the first-hop upstream providers involved in the "local repair" for the affected AS. Once that is done, it doesn't matter that the rest of the world still routes traffic for the affected AS towards the original first-hop upstream AS, as long as that upstream has a valid detour route.
And I guess the topology of this first-hop neighbourhood around the affected AS, the part involved in the "local repair", would dictate the convergence time.
E.g. if your upstream A box happens to have a direct (usable) link/session to the upstream B box, that's a winner. However, the higher the number of boxes involved in the "local repair" detour that need to be told "A no more, now B is the way to go", the longer the convergence time.
But if it takes about 2 min for a withdrawal to reach a significant portion of the Internet, I'm wondering how long it could take for a typical "local repair" string of BGP speakers to all get the memo.
Realistically, though, how many BGP speakers could that be? Ranging from a minimum of 2 to a maximum of, say, ~6?
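
A back-of-the-envelope sketch of that reasoning, in Python. It is a toy model, not a measurement: it assumes the "A no more, now B" update passes through the speakers one after another, and that each speaker adds a fixed processing delay plus, optionally, a full advertisement-interval wait. The function name and the 0.5 s processing figure are made up for illustration; 30 s is used only because it is the eBGP MinRouteAdvertisementIntervalTimer value suggested in RFC 4271, and real boxes are often configured differently.

```python
# Toy model only: the delays below are hypothetical, not measurements.

def local_repair_time(num_speakers, per_hop_processing_s=0.5, mrai_s=0.0):
    """Worst-case seconds for "A no more, now B" to pass through a chain
    of BGP speakers, where each speaker processes the update and then
    waits out its advertisement interval before telling the next one."""
    if num_speakers < 1:
        raise ValueError("need at least one BGP speaker in the chain")
    return num_speakers * (per_hop_processing_s + mrai_s)


# Chain lengths taken from the "min 2, max ~6" guess above.
for n in (2, 4, 6):
    fast = local_repair_time(n)
    slow = local_repair_time(n, mrai_s=30.0)
    print(f"{n} speakers: {fast:.1f}s with no interval wait, "
          f"{slow:.1f}s if every speaker waits 30s")
```

Under these assumptions the chain length only matters much when the speakers rate-limit their advertisements; with no interval wait even 6 speakers are done in a few seconds.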
   

> 
> When I saw *minutes* of brownouts in connectivity it was always 
> because of ingress prefix convergence (or the lack thereof, due to 
> slow FIB programming, then temporary internal routing loops, nasty 
> things like that, but never external).
> 
> That is also a significant problem. In the case of a single transit 
> connection per router, two routers and two providers, there will be a 
> lot of internal convergence between your two routers in the case of a 
> link failure. That is also avoided by giving both routers the same 
> provider connections.
> That way a router may still have to invalidate many routes but there 
> will be no loops and the router has loop free alternatives loaded into 
> memory already (to the other provider). Plus you can use the simple 
> trick of having a default route as a fall back.
> 
This is a very good point, actually. Since the box has two transit sessions, when only one of them fails it will still retain all the prefixes in the FIB. It will just need to reprogram a few next-hops to point towards the other eBGP/iBGP speakers, whichever offers the best path. And reprogramming next-hops is significantly faster (with hierarchical FIBs, anyway).
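
To illustrate why the hierarchical case is cheaper, here is a minimal Python sketch. It is not how any particular vendor implements its FIB: the class names, the "transit" group name and the 1000-prefix table are all hypothetical, and lookups are exact-match rather than longest-prefix-match to keep it short. The only point is the number of entries that must be rewritten when one next-hop dies.

```python
class FlatFib:
    """Each prefix stores its next-hop directly."""

    def __init__(self):
        self.routes = {}  # prefix -> next-hop

    def install(self, prefix, next_hop):
        self.routes[prefix] = next_hop

    def fail_over(self, dead, backup):
        """Rewrite every prefix that used the dead next-hop; return write count."""
        writes = 0
        for prefix, next_hop in list(self.routes.items()):
            if next_hop == dead:
                self.routes[prefix] = backup
                writes += 1
        return writes


class HierarchicalFib:
    """Prefixes point at a shared group; the group holds the next-hop."""

    def __init__(self):
        self.routes = {}  # prefix -> group name
        self.groups = {}  # group name -> active next-hop

    def set_group(self, group, next_hop):
        self.groups[group] = next_hop

    def install(self, prefix, group):
        self.routes[prefix] = group

    def lookup(self, prefix):
        return self.groups[self.routes[prefix]]

    def fail_over(self, dead, backup):
        """Rewrite only the shared groups; prefixes are left untouched."""
        writes = 0
        for group, next_hop in list(self.groups.items()):
            if next_hop == dead:
                self.groups[group] = backup
                writes += 1
        return writes


# Hypothetical table: 1000 prefixes, all preferring transit A.
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]

flat = FlatFib()
hier = HierarchicalFib()
hier.set_group("transit", "A")
for p in prefixes:
    flat.install(p, "A")
    hier.install(p, "transit")

flat_writes = flat.fail_over("A", "B")
hier_writes = hier.fail_over("A", "B")
print(f"flat FIB rewrites: {flat_writes}, hierarchical FIB rewrites: {hier_writes}")
```

In the flat table the failover cost grows with the number of prefixes, while in the hierarchical one it grows with the number of shared next-hop groups, which is the "reprogram a few next-hops" case.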

adam