Setting sensible max-prefix limits

Tom Beecher beecher at beecher.cc
Wed Aug 18 15:51:01 UTC 2021


>
> Depending on what failure cases you actually see from your peers in the
> wild, I can see (at least as a thought experiment), a two-bucket solution -
> "transit" and "everyone else".  (Excluding downstream customers, who you
> obviously hold some responsibility for the hygiene of.)
>

Although I didn't say it clearly, that's exactly what we do. The described
'bucket' logic applies only to the 'everyone else' pile; our transit
sessions get their own special care and feeding.
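
For anyone who wants the mechanics of the bucket scheme (described in my
quoted mail below), here is a toy Python sketch. The tier values, the
headroom factor, and the 90% trigger are illustrative placeholders, not
our real numbers:

    # Ascending max-prefix buckets (v4 shown; v6 would get its own table).
    BUCKETS = [100, 500, 2000, 10000, 50000]

    def assign_bucket(expected_prefixes):
        """Start a peer in the smallest bucket that leaves decent headroom,
        based on what they publish in PeeringDB or tell us to expect."""
        for limit in BUCKETS:
            if expected_prefixes <= limit // 2:  # ~2x headroom, as an example
                return limit
        return BUCKETS[-1]

    def maybe_escalate(current_limit, observed_prefixes):
        """At 90% of the limit, ops move the peer up one bucket; in the
        last bucket, leave it alone and take a manual look instead."""
        if observed_prefixes < 0.9 * current_limit:
            return current_limit
        idx = BUCKETS.index(current_limit)
        return BUCKETS[idx + 1] if idx + 1 < len(BUCKETS) else current_limit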

> How often do folks see a failure case that's "deaggregated something and
> announced you 1000 /24s, rather than the expected/configured 100 max", vs
> "fat-fingered being a transit provider, and announced you the global table"?
>

I can count on one hand the number of times I can remember a peer going
on a deagg party and running over limits. Maybe twice in the last 8 years?
It's possible it's happened more often than I'm aware of.

We have additional protections in place for that second scenario. If a
generic peer tries to send us a route with a transit provider in the
as-path, we just toss the route on the floor. That protection has been much
more useful than prefix limits IMO.
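
Conceptually that check is just set membership against a list of transit
ASNs; a toy Python version, with placeholder ASNs standing in for
whatever list an operator actually maintains:

    # 174 / 1299 / 3356 are well-known transit networks, used here purely
    # as examples; the peer ASNs below are from the documentation range.
    TRANSIT_ASNS = {174, 1299, 3356}

    def accept_from_peer(as_path):
        """Toss the route if any transit provider shows up in the AS-path."""
        return TRANSIT_ASNS.isdisjoint(as_path)

    accept_from_peer([64496, 1299, 64511])  # False: 1299 in path, dropped
    accept_from_peer([64496, 64511])        # True: clean path from a peer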

On Wed, Aug 18, 2021 at 11:37 AM tim at pelican.org <tim at pelican.org> wrote:

> On Wednesday, 18 August, 2021 14:21, "Tom Beecher" <beecher at beecher.cc>
> said:
>
> > We created 5 or 6 different buckets of limit values (for v4 and v6 of
> > course.) Depending on what you have published in PeeringDB (or told us
> > directly what to expect), you're placed in a bucket that gives you a
> > decent amount of headroom to that bucket's max. If your ASN reaches 90%
> > of your limit, our ops folks just move you up to the next bucket. If you
> > start to get up there in the last bucket, then we'll take a manual look
> > and decide what is appropriate. This covers well over 95% of our
> > non-transit sessions, and has dramatically reduced the volume of tickets
> > and changes our ops team has had to sort through.
>
> Depending on what failure cases you actually see from your peers in the
> wild, I can see (at least as a thought experiment), a two-bucket solution -
> "transit" and "everyone else".  (Excluding downstream customers, who you
> obviously hold some responsibility for the hygiene of.)
>
> How often do folks see a failure case that's "deaggregated something and
> announced you 1000 /24s, rather than the expected/configured 100 max", vs
> "fat-fingered being a transit provider, and announced you the global table"?
>
> My gut says it's the latter case that breaks things and that you need to
> make damn sure doesn't happen.  Curious to hear others' experience.
>
> Thanks,
> Tim.
>
>
>