Peering/Transit eBGP sessions -pet or cattle?

Baldur Norddahl baldur.norddahl at gmail.com
Mon Feb 10 18:57:00 UTC 2020


On Mon, Feb 10, 2020 at 5:42 PM <adamv0025 at netconsultings.com> wrote:

>
> > To be explicit: Router R1 has connections to transits T1 and T2.
> > Router R2 also has connections to the same transits T1 and T2. When
> > router R1 goes down, only small internal changes at T1 and T2 happens.
> > Nobody notices and the recovery is sub second.
> >
> Good point again,
> Though if I had only T1 on R1 and only T2 on R2 then convergence won't
> happen inside each Transit but instead between T1 and T2 which will add to
> the convergence time.
> So thinking about it seems the optimal design pattern in a distributed
> (horizontally scaled out) edge would be to try and pair up -i.e. at least
> two edge nodes per Transit (or Peer for that matter), in order to allow for
> potentially faster intra-Transit convergence rather than arguably slower
> inter-transit convergence.
>
>
I am assuming R1 and R2 are connected and announcing the same routes. Each
transit is therefore receiving the same routes from two independent routers
(R1 and R2). When R1 goes down, something internally at the transit will
change to reflect that. But peers, other customers at that transit and
higher tier transits will see no difference at all. Assuming R1 and R2 both
announce a default route internally in your network, your internal
convergence will be as fast as your detection of the dead router.

This scheme also protects against link failure or failure at the provider
end (if you make sure the transit is also using two routers).
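
To make the comparison concrete, here is a rough sketch in Python. It is
only a toy model, not a router configuration: each eBGP session is a
(router, transit) pair, and the names PAIRED, SPLIT and both functions are
made up for this illustration.

```python
# Toy model (hypothetical names): one eBGP session per (router, transit) pair.
PAIRED = {("R1", "T1"), ("R1", "T2"), ("R2", "T1"), ("R2", "T2")}
SPLIT = {("R1", "T1"), ("R2", "T2")}


def transits_after_failure(sessions, failed_router):
    """Transits that still hold at least one session once a router is down."""
    return {transit for router, transit in sessions if router != failed_router}


def transits_losing_all_sessions(sessions, failed_router):
    """Transits that no longer hear our routes directly after the failure."""
    all_transits = {transit for _, transit in sessions}
    return all_transits - transits_after_failure(sessions, failed_router)


# Paired design: every transit still has a session via R2.
print(transits_losing_all_sessions(PAIRED, "R1"))  # set()
# Split design: T1 loses its only session and the wider internet must reconverge.
print(transits_losing_all_sessions(SPLIT, "R1"))   # {'T1'}
```

In the paired case the change stays inside each transit. In the split case
T1 has to withdraw the routes entirely.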

Therefore, even if R1 and R2 are in the same physical location, maybe in
the same rack mounted on top of each other, that is a better solution than
one big hunky router with redundant hardware. Having them at different
locations is better of course, but not always feasible.

Many dual-homed companies start out with two routers and two transits
but without dual links to each transit, as you describe above. That will
cause significant disruption if one link goes down. It is not just about
convergence between T1 and T2 but for a major part of the internet. Been
there, done that, and yes, you can be down for up to several minutes
before everything is normal again. Assume tier 1 transits and that contact
to T1 was lost. T1 will have a peering session with T2 somewhere, but T1
will not allow traffic from its other peers to reach you via that link.
All those peers will need to find a different way to reach you, a way that
does not transit T1 (unless they have a contract with T1).
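
The peering behaviour described above can be sketched as a simple export
rule. This assumes the common convention that routes learned from a peer or
a provider are only passed on to customers; it is not the documented policy
of any particular transit. The function may_export and the neighbor names
are hypothetical.

```python
def may_export(learned_from, export_to):
    """Conventional export rule, assumed here for illustration.

    Routes learned from a customer go to everyone. Routes learned from a
    peer or a provider go to customers only.
    """
    if learned_from == "customer":
        return True
    return export_to == "customer"


# After the direct link to T1 is lost, T1 only learns your prefix from its
# peer T2. Hypothetical neighbors of T1 and their relationship to T1:
t1_neighbors = {"T3": "peer", "CustomerA": "customer"}

still_served_by_t1 = sorted(
    name for name, relation in t1_neighbors.items()
    if may_export("peer", relation)
)
print(still_served_by_t1)  # ['CustomerA']
```

T3 gets nothing from T1 and has to find your routes some other way, which
is where the minutes of disruption come from.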

Therefore, if being down for several minutes is not ok, you should invest
in dual links to your transits and connect those to two different routers.
If possible, get a guarantee that the transits use two routers at their
end, that diverse fiber paths are used, etc.

Regards,

Baldur