Peering/Transit eBGP sessions - pet or cattle?

Lukas Tribus lists at ltri.eu
Mon Feb 10 23:33:37 UTC 2020


Hello Baldur,


On Mon, 10 Feb 2020 at 19:57, Baldur Norddahl <baldur.norddahl at gmail.com> wrote:
> Many dual homed companies may start out with two routers and two
> transits but without dual links to each transit, as you describe
> above. That will cause significant disruption if one link goes
> down. It is not just about convergence between T1 and T2 but for
> a major part of the internet. Been there, done that, yes you can
> be down for up to several minutes before everything is normal
> again. Assume tier 1 transits and that contact to T1 was lost.
> This means T1 will have a peering session with T2 somewhere,
> but T1 will not allow peer to peer traffic to go via that link.
> All those peers will need to search for a different way to reach
> you, a way that does not transit T1 (unless they have a contract
> with T1).
>
> Therefore, if being down for several minutes is not ok, you
> should invest in dual links to your transits. And connect those
> to two different routers. If possible with a guarantee the
> transits use two routers at their end and that divergent fiber
> paths are used etc.

That is not my experience *at all*. I have always seen my prefixes
converge in a couple of seconds upstream (vs 2 different Tier1's).
That is with a double-digit number of announcements. Maybe if you
announce tens of thousands of prefixes as a large Tier 2, things are
more problematic, that I can't tell. Or maybe you hit some old-school
route dampening somewhere down the path. Maybe there is another reason
for this. But even if 3 AS hops are involved I don't really understand
how they would spend *minutes* to converge after receiving your BGP
withdraw message.

When I saw *minutes* of brownouts in connectivity, it was always
because of ingress prefix convergence (or the lack thereof, due to
slow FIB programming followed by temporary internal routing loops,
nasty things like that), never because of anything external.

I agree there are a number of reasons (including best convergence) to
have completely diversified connections to a single transit AS.
Another reason is that when you manually reroute traffic for a certain
AS path (say transit 2 has an always congested PNI towards a third
party ASN), you may not have an alternative to the congested path when
your other transit provider goes away. But I never saw minutes of
brownout because of upstream -> downstream -> downstream convergence
(or whatever the scenario looks like).
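
For what it's worth, the export behaviour you describe for T1 (not
carrying peer-to-peer traffic over its T2 peering) is the usual
valley-free rule. A minimal Python sketch of it follows; the `Rel`
names and the `may_export` helper are my own illustrative assumptions,
not anything a router or BGP implementation exposes, and real export
policies are more nuanced than this:

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor AS, seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Simplified valley-free export rule (hypothetical helper).

    Routes learned from a customer are exported to every neighbor.
    Routes learned from a peer or a provider go to customers only.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return export_to is Rel.CUSTOMER


# Scenario from the thread: the link to T1 is down, so T1 only hears
# the prefix from T2 over their peering session.
t1_to_other_peers = may_export(Rel.PEER, Rel.PEER)
t1_to_own_customers = may_export(Rel.PEER, Rel.CUSTOMER)

# While the link was up, T1 heard the prefix directly from a customer.
t1_before_failure = may_export(Rel.CUSTOMER, Rel.PEER)

print(t1_to_other_peers, t1_to_own_customers, t1_before_failure)
```

Under these assumptions, after the failure T1's peers have to find a
path that does not go through T1, while T1's own customers can still
reach the prefix via T1's peering with T2.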


lukas


