Destination Preference Attribute for BGP

Mark Tinka mark at tinka.africa
Fri Aug 18 21:33:28 UTC 2023



On 8/18/23 22:40, Matthew Petach wrote:

>
> Hi Robert,
>
> Without naming any names, I will note that at some point in the 
> not-too-distant past, I was part of a new-years-eve-holiday-escalation 
> to $BACKBONE_ROUTER_PROVIDER when the global network I was involved 
> with started seeing excessive convergence times (greater than one hour 
> from BGP update message received to FIB being updated).
> After tracking down a development engineer from $RTR_PROVIDER on the new 
> years eve holiday, it was determined that the problem lay in 
> assumptions made about how communities were stored in memory.  Think 
> hashed buckets, with linked lists within each bucket.  If the 
> communities all happened to hash to the same bucket, the linked list 
> in that bucket became extremely long; and if every prefix coming in, 
> say from multiple sessions with a major transit provider, happened to 
> be adding one more community to the very long linked list in that one 
> hash bucket, well, it ended up slowing down the processing to the 
> point where updates to the FIB were still trickling in an hour after 
> the BGP neighbor had finished sending updates across.
>
> A new hash function was developed on New Year's day, and a new version 
> of code was built for us to deploy under relatively painful 
> circumstances.
>
> It's easy to say "Considering that we are talking about control 
> plane memory I think the cost/space associated with storing 
> communities is less than negligible these days."
> The reality is very different, because it's not just about efficiently 
> *storing* communities, it's really about efficiently *parsing and 
> updating* communities--and the choices made there absolutely *DO* 
> "contribute to longer protocol convergences in any measurable way."
>
> Matt
> (the names have been obscured to increase my chances of being hireable 
> in the industry again at some future date. ;)
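
For anyone who wants to see the failure mode described above in isolation, here is a minimal sketch. It is not the vendor's code, which was never shown; the table size, both hash functions, and the helper names are assumptions made purely for illustration. It models a chained hash table, counts how many chain entries are examined while inserting communities, and compares a hash that sends every community from one ASN to the same bucket against one that mixes all 32 bits.

```python
class ChainedSet:
    """Toy hash set: fixed bucket count, one chain per bucket.

    A Python list stands in for the linked list; what matters here is the
    linear walk of the chain on every insert.
    """

    def __init__(self, hash_fn, num_buckets=1024):
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(num_buckets)]
        self.comparisons = 0  # chain entries examined across all inserts

    def add(self, community):
        chain = self.buckets[self.hash_fn(community) % len(self.buckets)]
        for existing in chain:
            self.comparisons += 1
            if existing == community:
                return  # already stored
        chain.append(community)


def weak_hash(community):
    # Hypothetical poor choice: only the high 16 bits (the ASN half of a
    # standard 32-bit community) are used, so every community from one ASN
    # lands in the same bucket.
    return community >> 16


def mixing_hash(community):
    # Multiplicative hash with Knuth's constant; the low 16 bits of the
    # community influence the result, so values from one ASN spread out.
    return ((community * 2654435761) & 0xFFFFFFFF) >> 16


def make_communities(asn, count):
    # Build `count` distinct ASN:value communities as 32-bit integers.
    return [(asn << 16) | value for value in range(count)]


def count_comparisons(hash_fn, communities):
    table = ChainedSet(hash_fn)
    for community in communities:
        table.add(community)
    return table.comparisons


if __name__ == "__main__":
    communities = make_communities(64500, 2000)  # ASN from the documentation range
    print("weak hash:  ", count_comparisons(weak_hash, communities))
    print("mixing hash:", count_comparisons(mixing_hash, communities))
```

With the weak hash, each new community walks the entire existing chain, so the work grows quadratically with the number of distinct communities; with the mixing hash the chains stay short and the work stays roughly linear. The memory used is the same in both cases, which is the point being made about storing versus parsing and updating.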

To be fair, you are talking about an arbitrary value of years back, on 
boxes you don't name running code you won't mention.

This is really not saying much :-).

Corner cases, while valid, do not speak to the majority. If this was a 
major issue, there would have been more noise about it by now.

There has been quite some noise about lengthy AS_PATH updates that bring 
some routers down, which has usually been fixed with improved BGP code. 
But even those are not too common, if one considers a 365-day period.

Mark.


More information about the NANOG mailing list