RPKI race

Baldur Norddahl baldur.norddahl at gmail.com
Tue Jun 16 23:34:32 UTC 2020


I noticed that we have regressed and started failing the test at
https://isbgpsafeyet.com/. On investigating, I found that some routes were
in the validation state "unknown" when they should have been either invalid
or valid. This included the test prefix, which was received via NL-IX (and
via Cogent on IPv6).

We do however have plenty of prefixes that are validated and received from
the same sources.

This is a Juniper MX204 router running 20.1R1.11. I tried a few things,
including "clear bgp neighbor xxx soft-inbound" (which is supposed to rerun
the import policy where the RPKI marking and checking happen), but that did
not fix it. Doing a "clear bgp neighbor xxx", which disconnects the peer and
reconnects after a slight delay, did fix the issue. But I have to do that
for every peer we received the prefix from, and potentially we could have
the same trouble with every peer we have :-(

This router was software upgraded and rebooted two days ago. I suspect a
race condition. What if the router started BGP sessions before it was able
to communicate with the RPKI validation server or before the RPKI database
was synchronized?
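
To make the suspected race concrete, here is a minimal Python sketch of
origin validation in the style of RFC 6811. It is a toy model under stated
assumptions, not a description of how Junos works internally: the Roa class,
the validate() function, and the documentation prefix and AS numbers are all
made up for illustration. It only shows that a route checked against an
empty ROA table comes out as "unknown" (RFC 6811 calls this "NotFound"), and
that it keeps that result unless something re-evaluates it after the table
has been filled.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class Roa:
    """Hypothetical, simplified ROA entry: prefix, max length, origin AS."""
    prefix: Network
    max_length: int
    asn: int


def validate(prefix: Network, origin_asn: int, roas: list) -> str:
    """Return "valid", "invalid" or "unknown" for one route.

    A ROA covers the route if the route prefix equals the ROA prefix or is
    a more specific of it. The route is valid if a covering ROA also
    matches the origin AS and allows the prefix length, invalid if it is
    covered but nothing matches, and unknown if no ROA covers it at all.
    """
    covered = False
    for roa in roas:
        # subnet_of() raises TypeError across IP versions, so check first.
        if prefix.version != roa.prefix.version:
            continue
        if prefix.subnet_of(roa.prefix):
            covered = True
            if prefix.prefixlen <= roa.max_length and origin_asn == roa.asn:
                return "valid"
    return "invalid" if covered else "unknown"


# Illustrative data only: documentation prefix and documentation AS numbers.
route_prefix = ipaddress.ip_network("192.0.2.0/24")
route_origin = 64500
roas_after_sync = [Roa(ipaddress.ip_network("192.0.2.0/24"), 24, 64501)]

# Assumed race: the route arrives while the ROA table is still empty.
state_at_import = validate(route_prefix, route_origin, roas=[])

# The same route, evaluated again once the table has been synchronized.
state_after_sync = validate(route_prefix, route_origin, roas=roas_after_sync)

print(state_at_import, state_after_sync)
```

If the state computed at import time is stored with the route and never
recomputed, the route stays "unknown" even though the synchronized table
would now mark it invalid. That would match what I am seeing, but it remains
a guess about the cause.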

I find it a bit disappointing that we ended up with a bad validation state
this easily, and that there is apparently little I can do about it except
walk through all our peers and reset their BGP sessions, which frankly is an
unacceptable disruption of traffic flow.

