Starting to Drop Invalids for Customers

Lukas Tribus lists at ltri.eu
Fri Jan 10 14:15:44 UTC 2020


Hello Mark,


On Fri, 10 Jan 2020 at 13:39, Mark Tinka <mark.tinka at seacom.mu> wrote:
>
> So just an update on this.
>
> We've since completed the roll-out of dropping Invalids on eBGP sessions with customers as well.
>
> It also included some Cisco ME3600X routers that will ultimately be replaced this year by Cisco ASR920 routers.

Thanks for sharing all this. Regarding those 2 platforms specifically,
what release are you using here that does not blow up? IIRC you had
some RPKI related crash bugs at some point in time?


> In IOS XE, all iBGP routes are marked as Valid by default. This is not a big problem in practice,
> however, because all eBGP points are checked for RPKI state, and anything marked as Invalid
> is dropped. So whatever will appear in the iBGP would have already been scraped. Of course,
> IOS XE doing this is not ideal at all, and they are breaking the RFC mandate, but it doesn't
> cause any real harm.

Apparently, though, there are real-life issues with this, specifically when:

- there is no ROA, so prefixes are supposed to be UNKNOWN on all nodes
- but IOS-XE prefers VALID over UNKNOWN (changing best path selection)
- iBGP is *always* VALID (even if it's really UNKNOWN) while eBGP is
showing UNKNOWN, so iBGP is preferred over eBGP, which breaks a lot of
assumptions and "hot potato" concepts (possible temporary routing
loops, in addition to, of course, different egress behavior)
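
To make the scenario above concrete, here is a minimal Python sketch of
a toy decision process. It is not IOS-XE's actual algorithm: the only
things modelled are a validation-state comparison that runs before the
eBGP-over-iBGP step, and an optional "iBGP is always VALID" default.
All names (Path, local_state, best_path, the path labels) are
hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Toy ranking, lower is better. Assumption: validation state is compared
# before the eBGP-over-iBGP step, as described in this thread.
STATE_RANK = {"valid": 0, "unknown": 1, "invalid": 2}


@dataclass(frozen=True)
class Path:
    name: str
    learned_via: str  # "ebgp" or "ibgp"
    real_state: str   # state a validator would actually assign


def local_state(path, ibgp_always_valid):
    """State the router uses locally for this path."""
    if ibgp_always_valid and path.learned_via == "ibgp":
        return "valid"  # the default behavior discussed above
    return path.real_state


def best_path(paths, ibgp_always_valid, compare_state=True):
    """Pick the best path using only two (simplified) tie-breakers."""
    def key(p):
        state = STATE_RANK[local_state(p, ibgp_always_valid)] if compare_state else 0
        ebgp_first = 0 if p.learned_via == "ebgp" else 1
        return (state, ebgp_first)
    return min(paths, key=key)


# Same prefix, no ROA anywhere, so both paths are really UNKNOWN.
paths = [
    Path("via-local-ebgp-peer", "ebgp", "unknown"),
    Path("via-remote-ibgp-exit", "ibgp", "unknown"),
]

print(best_path(paths, ibgp_always_valid=True).name)   # via-remote-ibgp-exit
print(best_path(paths, ibgp_always_valid=False).name)  # via-local-ebgp-peer
```

With the forced VALID state on iBGP, the toy router hands traffic to a
remote exit even though it has a local eBGP path for the same prefix.
Passing compare_state=False takes the validation state out of the
comparison entirely, and the local eBGP path wins again.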

Here's a blog post about this:
http://schoolsysadmin.blogspot.com/2019/07/securing-internet-routing-rpki-ov-and.html

Apparently there is an IOS feature "Announce RPKI Validation State to
Neighbors" to transmit the *real* RPKI state in iBGP (as opposed to
defaulting to VALID for all iBGP neighbors). I'm not sure if that
fixes this problem or not. It doesn't really address the root cause
(which is: unwanted and non-configurable interference with the best
path selection algorithm), but it can at least hide its symptoms.

RPKI implementations should not touch best path selection. Dropping
RPKI invalids is the real use-case here, and if someone wants to set
loc-pref based on RPKI status we should allow it (even if it doesn't
make a lot of sense), but having the RPKI implementation intervene in
the best path selection without the possibility to disable it is ...
frustrating.
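
For reference, the drop-only approach can be sketched in a few lines of
Python. This is a simplified illustration of origin validation along
the lines of RFC 6811, not any vendor's implementation: the VRP list,
the function names (validate, accept) and the documentation
prefixes/ASNs are all made up for the example. Note that RFC 6811 calls
the third state "NotFound"; it is labelled "unknown" here to match the
wording in this thread.

```python
import ipaddress


def validate(prefix, origin_asn, vrps):
    """Return 'valid', 'invalid' or 'unknown' for a route.

    vrps is a list of (prefix, max_length, asn) tuples, i.e. validated
    ROA payloads as a validator would hand them to the router.
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for vrp_prefix, max_len, asn in vrps:
        vrp_net = ipaddress.ip_network(vrp_prefix)
        # Skip VRPs of the other address family or that do not cover the route.
        if net.version != vrp_net.version or not net.subnet_of(vrp_net):
            continue
        covered = True
        # A covering VRP matches if the length is allowed and the origin agrees.
        if net.prefixlen <= max_len and asn == origin_asn:
            return "valid"
    return "invalid" if covered else "unknown"


def accept(prefix, origin_asn, vrps):
    """Drop-only policy: reject invalids, leave everything else untouched."""
    return validate(prefix, origin_asn, vrps) != "invalid"


# Hypothetical VRP set using documentation prefixes and ASNs.
vrps = [("192.0.2.0/24", 24, 64496)]

print(validate("192.0.2.0/24", 64496, vrps))     # valid
print(validate("192.0.2.0/24", 64511, vrps))     # invalid (wrong origin)
print(validate("192.0.2.0/25", 64496, vrps))     # invalid (too specific)
print(validate("198.51.100.0/24", 64500, vrps))  # unknown (no covering VRP)
```

The point of the accept() helper is that the validation state is used
only as an import filter; it never feeds into the comparison between
the routes that survive.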


How much do you rely on "hot potato" routing for peers/transit and
customers? How does that work for you with RPKI unknowns?


Thanks for sharing your experiences,

Lukas


