Destination Preference Attribute for BGP
Mark Tinka
mark at tinka.africa
Sat Aug 19 05:15:40 UTC 2023
On 8/19/23 00:22, Matthew Petach wrote:
> Hi Mark,
>
> I know it's annoying that I won't mention specifics.
> Unfortunately, the last time I mentioned $vendor-specific information
> on NANOG, it was picked up by the press, and turned into a
> multimillion dollar kerfuffle with me at the center of the cross-hairs:
> https://www.google.com/search?q=petach+kablooie&sca_esv=558180114&nirf=petah+kablooie&filter=0&biw=1580&bih=1008&dpr=2
>
> After that, I've learned it's best to not name specific very-big-name
> vendors on NANOG posts.
>
> What I *can* say is that this was one of the primary vendors in the
> Internet backbone space, running mainstream code.
> The only reason it didn't affect more networks was a function of the
> particular cluster of signalling communities being applied to all
> inbound prefixes, and how they interacted with the vendor's hash
> algorithm.
>
>> Corner cases, while valid, do not speak to the majority. If this
>> was a major issue, there would have been more noise about it by now.
>
>
> I prefer to look at it the other way; the reason you didn't hear more
> noise about it, is that we stubbed our toes on it early, and had
> relatively fast, direct access to the development engineers to get it
> fixed within two days. It's precisely *because* people trip over
> corner cases and get them fixed that they don't end up causing more
> widespread pain across the rest of the Internet.
>
>> There has been quite some noise about lengthy AS_PATH updates that
>> bring some routers down, which has usually been fixed with
>> improved BGP code. But even those are not too common, if one
>> considers a 365-day period.
>
>
> Oh, absolutely. Bugs in implementations that either crash the router
> or reset the BGP session are much more immediately visible than
> "that's odd, it's taking my routers longer to converge than it should".
>
> How many networks actually track their convergence time in a time
> series database, and look at unusual trends, and then diagnose why the
> convergence time is increasing, versus how many networks just note an
> increasing number of "hey, your network seems to be slowing down" and
> throw more hardware at the problem, while grumbling about why their
> big expensive routers seem to be less powerful than a *nix box running
> gated?
>
> I suspect there are more of these types of "corner cases" out there than
> you recognize.
> It's just that most networks don't dig into routing performance issues
> unless it actually breaks the router, or kills BGP adjacencies.
>
> If you *are* one of the few networks that tracks your router's
> convergence time over time, and identifies and resolves unexpected
> increases in convergence time, then yes, you absolutely have standing
> to tell me to pipe down and go back into my corner again. ;D

So, while this all sounds good, without any specifics on vendor, box,
code, code revision number, fix, year it happened, current status,
etc., I can't offer any meaningful engagement.

We all run into odd stuff as we operate this Internet, but the point of
a list like this is to share those details so we can learn, fix and move
forward.

Your ambiguity does not lend itself to a helpful discussion,
notwithstanding my understanding of your caution.

I am less concerned about keeping smiles on vendors' faces. I tell them
in public and private if they are great or not. But since you've been
burned, I get it. It's just not moving the needle on this thread, though.

Mark.