Peering/Transit eBGP sessions - pet or cattle?

Baldur Norddahl baldur.norddahl at gmail.com
Mon Feb 10 15:06:14 UTC 2020


No matter how much money you put into your peering router, the session will
be no more stable than whatever the peer did on their end. Plus at some
point you will need to reboot for a software upgrade or other reasons. If
you care at all, you should be doing redundancy by having multiple
locations and multiple routers. You can then save the money spent on each
router, because a router failure will not cause any change in what the
internet sees through BGP.

Also transits are way more important than peers. Losing a transit will
cause massive route changes around the globe and it will take a few
minutes to stabilize. Losing a peer usually just means the peer switches
to the transit route that they already had available.

Peers are not equal. You may want to ensure redundancy to your biggest
peers, while the small fish will be fine without.

To be explicit: Router R1 has connections to transits T1 and T2. Router R2
also has connections to the same transits T1 and T2. When router R1 goes
down, only small internal changes at T1 and T2 happen. Nobody notices and
the recovery is sub-second.

Peers are less important: R1 has a connection to internet exchange IE1 and
R2 to a different internet exchange IE2. When R1 goes down the small peers
at IE1 are lost but will quickly reroute through transit. Large peers may
be present at both internet exchanges and so will instantly switch the
traffic to IE2.
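
As a rough illustration of the two cases above, here is a small Python
sketch of that topology. The router, transit and exchange names come from
the example; the peer names ("big-peer", "small-peer") and the helper
functions are made up for illustration, and the model only tracks which
neighbours stay reachable, not real BGP convergence.

```python
# Edge routers and the external neighbours each one connects to,
# taken from the R1/R2, T1/T2, IE1/IE2 example.
TOPOLOGY = {
    "R1": {"transit": {"T1", "T2"}, "exchange": {"IE1"}},
    "R2": {"transit": {"T1", "T2"}, "exchange": {"IE2"}},
}

# Assumed peer presence (hypothetical names): exchanges each peer is on.
PEERS = {
    "big-peer": {"IE1", "IE2"},
    "small-peer": {"IE1"},
}


def surviving(kind, failed):
    """Return the transits or exchanges still connected once the
    routers in `failed` are down."""
    alive = set()
    for router, links in TOPOLOGY.items():
        if router not in failed:
            alive |= links[kind]
    return alive


def peer_path(peer, failed):
    """Classify how traffic to `peer` flows with `failed` routers down."""
    if PEERS[peer] & surviving("exchange", failed):
        return "peering"      # still reachable over some exchange
    if surviving("transit", failed):
        return "transit"      # falls back to a transit route
    return "unreachable"


if __name__ == "__main__":
    failed = {"R1"}
    print("transits still up:", sorted(surviving("transit", failed)))
    for peer in sorted(PEERS):
        print(peer, "->", peer_path(peer, failed))
```

With R1 down, both transits are still attached through R2, the peer that
is present at both exchanges stays on peering, and the peer that is only at
IE1 falls back to transit.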

Regards,

Baldur



On Mon, Feb 10, 2020 at 1:38 PM <adamv0025 at netconsultings.com> wrote:

> Hi,
>
>
>
> Would like to take a poll on whether you folks tend to treat your
> transit/peering connections (BGP sessions in particular) as pets or rather
> as cattle.
>
> And I appreciate the answer could differ for transit vs peering
> connections.
>
> However, I’d like to ask this question through a lens of redundant vs
> non-redundant Internet edge devices.
>
> To explain,
>
>    1. The “pet” case:
>
> Would you rather try improving the failure rate of your transit/peering
> connections by using resilient Control-Plane (REs/RSPs/RPs) or even
> designing these as link bundles over separate cards and optical modules?
>
> Is this on the basis that it doesn’t matter how hard you try on your end
> (i.e. distribute your traffic to a multitude of transit and peering
> connections, or use BFD or even BGP-PIC Edge to shuffle things around
> fast), any disruption to the eBGP session itself will still hurt you in
> some way (i.e. at least some partial outage for some proportion of the
> traffic for a not insignificant period of time) until things converge in
> the direction from the Internet back to you?
>
>
>
>    2. The “cattle” case:
>
> Or would you instead rely on small-ish non-redundant HW at your internet
> edge rather than trying to enhance MTBF with big chassis full of redundant
> HW?
>
> Is this because eventually the MTBF figure for a particular transit/peering
> eBGP session boils down to the MTBF of the single card or even single
> optical module hosting the link (and as for creating bundles over separate
> cards, you can never be quite sure what the setup looks like on the other
> end of that connection)?
>
> Or is it because the effect of a smaller/non-resilient border edge device
> failure is not that bad in your particular (maybe horizontally scaled)
> setup?
>
>
>
> Would appreciate any pointers, thank you.
>
> Thank you
>
>
>
> adam
>
>
>