Reactive RPKI ROV (Was: Hurricane Electric has reached 0 RPKI INVALIDs)
job at ntt.net
Tue Jun 16 20:07:22 UTC 2020
Dear Mike, Ytti, others,
First of all and most importantly: congratulations Mike! I thank you and
your team for having constructed a great mechanism that helps honor the
routing intentions everyone publishes in the RPKI.
On Tue, Jun 16, 2020 at 09:08:41AM +0300, Saku Ytti wrote:
> On Tue, 16 Jun 2020 at 07:51, Mike Leber via NANOG <nanog at nanog.org> wrote:
> > These prefix filters are updated automatically both through a system
> > of daily updates and real time updates to prevent RPKI INVALID
> > routes from being carried in our routing table.
> What does real time mean in this context? Does it mean exactly 0s leak
> of INVALID, or 99% less than 30s? Or how do you define it?
My measurement (samplesize = 1) appears to indicate it took less than a
minute between AS 6939 receiving (and accepting) an RPKI invalid route
announcement, and that same route announcement being removed from the AS
6939 routing tables. Subsequently BGP withdraw messages were sent (for
that RPKI invalid route via 6939) to all their peers, which took a few
more minutes to be processed and converge in the global routing system.
I think it is important for the community to understand that the
mechanism 6939 currently uses is a different approach from what other
network operators are doing.
Most RPKI ROV deployments have set it up in such a way that a-priori all
EBGP routers are primed with a full set of VRPs. Feeding the routers the
VRPs through the RPKI-To-Router (RTR) protocol allows those BGP speakers
to reject an RPKI invalid route - before - installing it in the Loc-RIB.
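To make the "valid / invalid / not-found" decision concrete, here is a
minimal Python sketch of the origin validation logic described in RFC
6811. The names VRP and validate are made up for illustration, and the
prefixes and AS numbers are documentation values; a real router does
this internally on the VRPs it learned via RTR.

```python
from ipaddress import ip_network
from typing import List, NamedTuple


class VRP(NamedTuple):
    """Validated ROA Payload: (prefix, max length, origin AS). Illustrative type."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, vrps: List[VRP]) -> str:
    """Classify a route as 'valid', 'invalid' or 'not-found' (RFC 6811 semantics)."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # IPv4 and IPv6 never cover each other; subnet_of() would raise TypeError.
        if route.version != vrp_net.version:
            continue
        if not route.subnet_of(vrp_net):
            continue
        covered = True  # at least one VRP covers this route
        # A VRP matches when the length fits and the origin AS agrees.
        # AS 0 in a VRP can never match any route.
        if (route.prefixlen <= vrp.max_length
                and vrp.asn != 0
                and vrp.asn == origin_asn):
            return "valid"
    # Covered but never matched means invalid; not covered at all means not-found.
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    table = [VRP("192.0.2.0/24", 24, 64496)]
    print(validate("192.0.2.0/24", 64496, table))
    print(validate("192.0.2.0/25", 64496, table))
    print(validate("198.51.100.0/24", 64496, table))
```

A router doing ROV this way rejects anything classified "invalid"
before it ever reaches the Loc-RIB.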
At the same time, we should recognize and praise anyone who managed to
deploy a reactive mechanism due to the lack of RTR support on a device.
The "route collector -> script -> add prefix list to denylist" approach
cannot be avoided if you have gear in the network that does not support
RPKI OV as specced out in RFC 6811.
The reactive mechanism must be viewed in context of other protection
mechanisms that are deployed such as Peerlock, Maximum Prefix Limits,
and IRR+RPKI+WHOIS based explicit allowlists, all of which 6939 has
done. I actually had to jump through some hoops in the IRR system to
trick 6939 into accepting my RPKI invalid route announcement. :-)
Since it is with words that we construct the magic of our reality, let's
assign a name specific to this engineering effort:
Reactive RPKI ROV
Reactive RPKI ROV means that a network operator has set up an
RPKI-capable route collector which peers with all BGP nodes that do not
support RPKI. The route collector logs all RPKI invalid route
announcements it receives, and these messages can be used as input to an
automated process to update prefix-list filters on the BGP node that
received the RPKI invalid route announcement. The free OpenBGPD or BIRD
software can be used as such route collectors. As is evident from my
'samplesize=1' study, that whole process can be completed in under one
minute.
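The "collector log -> script -> denylist" step could be sketched roughly
as follows. This is not how 6939 built it (those details have not been
published here); the Observation record, its field names and the
build_denylists function are all assumptions for illustration. It
presumes the collector has already tagged each logged route with a
validation state, and it leaves the vendor-specific step of pushing the
prefix-list to the router out entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Observation(NamedTuple):
    """One route as logged by the route collector. Hypothetical record layout."""
    router: str      # BGP node that sent the route to the collector
    prefix: str      # announced prefix
    origin_asn: int  # origin AS seen in the AS path
    state: str       # "valid", "invalid" or "not-found", as judged by the collector


def build_denylists(observations: Iterable[Observation]) -> Dict[str, List[str]]:
    """Group RPKI invalid prefixes per router, ready to be turned into filters."""
    deny = defaultdict(set)
    for obs in observations:
        if obs.state == "invalid":
            # Only the node that actually carried the invalid route gets the entry.
            deny[obs.router].add(obs.prefix)
    # Sort for stable output so repeated runs produce identical filters.
    return {router: sorted(prefixes) for router, prefixes in deny.items()}


if __name__ == "__main__":
    log = [
        Observation("core1", "192.0.2.0/24", 64497, "invalid"),
        Observation("core1", "198.51.100.0/24", 64496, "valid"),
        Observation("core2", "203.0.113.0/24", 64499, "invalid"),
    ]
    for router, prefixes in build_denylists(log).items():
        print(router, prefixes)
```

Note that a filter keyed on the prefix alone also blocks a valid
announcement of the same prefix from another origin; whether to match on
prefix only or on prefix plus origin is a design choice this sketch does
not settle.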
The alternative to the "Reactive RPKI ROV" approach is what we've
already done for years: emailing a NOC and requesting manual
intervention to block a problematic route. At the best of times the
'calling the NOC' approach takes hours. As such, Reactive RPKI ROV is
obviously far preferable to manual approaches.
It would be awesome if the community openly shares notes on how to
construct Reactive RPKI ROV deployments to improve routing for everyone.
Maybe at some point some open source software pops up somewhere to make
it easier for everyone? The future is bright, I'm optimistic we can tame
the Default-Free Zone beast :)
So Mike, please consider submitting a presentation proposal to one of the
network operator groups to outline in as much detail as possible how you
did it. I'd love to learn from your experience!
> So my definition of real time here would be 99% <5min.
I think it should be 99% <1 min, because that's how high 6939 set the bar :-)