Destination Preference Attribute for BGP

Mark Tinka mark at tinka.africa
Sat Aug 19 05:15:40 UTC 2023



On 8/19/23 00:22, Matthew Petach wrote:

> Hi Mark,
>
> I know it's annoying that I won't mention specifics.
> Unfortunately, the last time I mentioned $vendor-specific information 
> on NANOG, it was picked up by the press, and turned into a 
> multimillion dollar kerfuffle with me at the center of the cross-hairs:
> https://www.google.com/search?q=petach+kablooie&sca_esv=558180114&nirf=petah+kablooie&filter=0&biw=1580&bih=1008&dpr=2 
>
> After that, I've learned it's best to not name specific very-big-name 
> vendors on NANOG posts.
>
> What I *can* say is that this was one of the primary vendors in the 
> Internet backbone space, running mainstream code.
> The only reason it didn't affect more networks was a function of the 
> particular cluster of signalling communities being applied to all 
> inbound prefixes, and how they interacted with the vendor's hash 
> algorithm.
>
>     Corner cases, while valid, do not speak to the majority. If this
>     was a major issue, there would have been more noise about it by now.
>
>
> I prefer to look at it the other way; the reason you didn't hear more 
> noise about it, is that we stubbed our toes on it early, and had 
> relatively fast, direct access to the development engineers to get it 
> fixed within two days.  It's precisely *because* people trip over 
> corner cases and get them fixed that they don't end up causing more 
> widespread pain across the rest of the Internet.
>
>     There has been quite some noise about lengthy AS_PATH updates that
>     bring some routers down, which has usually been fixed with
>     improved BGP code. But even those are not too common, if one
>     considers a 365-day period.
>
>
> Oh, absolutely.  Bugs in implementations that either crash the router 
> or reset the BGP session are much more immediately visible than 
> "that's odd, it's taking my routers longer to converge than it should".
>
> How many networks actually track their convergence time in a time 
> series database, and look at unusual trends, and then diagnose why the 
> convergence time is increasing, versus how many networks just note an 
> increasing number of "hey, your network seems to be slowing down" and 
> throw more hardware at the problem, while grumbling about why their 
> big expensive routers seem to be less powerful than a *nix box running 
> gated?
>
> I suspect there are more of these types of "corner cases" out there than 
> you recognize.
> It's just that most networks don't dig into routing performance issues 
> unless it actually breaks the router, or kills BGP adjacencies.
>
> If you *are* one of the few networks that tracks your router's 
> convergence time over time, and identifies and resolves unexpected 
> increases in convergence time, then yes, you absolutely have standing 
> to tell me to pipe down and go back into my corner again.  ;D

So, while this all sounds good, without any specifics on vendor, box, 
code, code revision number, fix, year it happened, current status, 
etc., I can't offer any meaningful engagement.

We all run into odd stuff as we operate this Internet, but the point of 
a list like this is to share those details so we can learn, fix and move 
forward.

Your ambiguity does not lend itself to a helpful discussion, 
notwithstanding my understanding of your caution.

I am less concerned about keeping smiles on vendors' faces. I tell them 
in public and private if they are great or not. But since you've been 
burned, I get it. It's just not moving the needle on this thread, though.

Mark.