plea for comcast/sprint handoff debug help
morrowc.lists at gmail.com
Fri Nov 6 06:26:46 UTC 2020
I hate to jump in late, but... :)
After reading this a few times it seems like what's going on is:
o a set of assumptions were built into the software stack
  this seems fine, hard to build without some assumptions :)
o the assumptions seem to include: "if rrdp fails <how?> feel free
  to jump back/to rsync"
  I think SOME of the problem is the 'how' there.
  Admittedly someone (randy) injected a pretty pathological failure
  mode into the system
  and didn't react when his 'monitoring' said: "things are broke yo!"
o absent a 'failure' the software kept on getting along as it had before.
  After all, maybe the operator here intentionally put their
  repository into this wacky state?
  How is an RP software stack supposed to know what the PP's
  management is meaning to do?
o lots of debate about how we got to where we are, I don't know that
  much of it is really helpful.
I think a way forward here is to offer a suggestion for the software
folk to cogitate on and improve?
"What if (for either rrdp or rsync) there is no successful
update in X of Y attempts,
attempt the other protocol to sync down to bring the remote PP back
to life in your local view."
This both allows the RP software to pick their primary path (and stick
to that path as long as things work) AND
helps the PP folk recover a bit quicker if their deployment runs into troubles.
0: I think 'failure' here is clear (to me):
1) the protocol is broken (rsync no connect, no http connect)
2) the connection succeeds but there is no sync-file (rrdp) nor
The 6486-bis rework effort seems to be getting to: "No MFT? no CRL?
you r busted!"
so I think if you don't get MFT/CRL in X of Y attempts it's safe to
say the PP over that protocol is busted,
and attempting the other proto is acceptable.
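
A rough sketch of the "X of Y attempts" idea, in case it helps the software folk picture it. Everything named here is hypothetical (TransportSelector, record(), the x=3 / y=5 defaults); it is not taken from any RP implementation. It assumes an attempt counts as a success only when the connection worked AND a MFT and CRL came back, and that the attempt history starts fresh after a switch.

```python
from collections import deque


class TransportSelector:
    """Hypothetical helper: stick to one protocol until X of the last Y
    sync attempts against a publication point have failed, then swap."""

    def __init__(self, primary="rrdp", fallback="rsync", x=3, y=5):
        if not 0 < x <= y:
            raise ValueError("need 0 < x <= y")
        self.active = primary        # protocol currently in use
        self.standby = fallback      # protocol to try when active looks busted
        self.x = x                   # failures needed to trigger a swap
        self.recent = deque(maxlen=y)  # outcomes of the last y attempts

    def record(self, connected, got_mft_and_crl):
        """Record one sync attempt and return the protocol to use next.

        connected       -- the rsync/HTTPS connection itself worked
        got_mft_and_crl -- a manifest and CRL were actually retrieved
        """
        ok = bool(connected and got_mft_and_crl)
        self.recent.append(ok)
        if self.recent.count(False) >= self.x:
            # X of the last Y attempts failed: try the other protocol
            # and start counting from scratch on the new one.
            self.active, self.standby = self.standby, self.active
            self.recent.clear()
        return self.active
```

The swap is symmetric on purpose: the same rule that moves an RP from rrdp to rsync also moves it back if rsync is the one that stops delivering, so the operator's choice of primary only sets the starting point.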
On Mon, Nov 2, 2020 at 4:37 AM Job Snijders <job at ntt.net> wrote:
> On Mon, Nov 02, 2020 at 09:13:16AM +0100, Tim Bruijnzeels wrote:
> > On the other hand, the fallback exposes a Malicious-in-the-Middle
> > replay attack surface for 100% of the prefixes published using RRDP,
> > 100% of the time. This allows attackers to prevent changes in ROAs to
> > be seen.
> This is a mischaracterization of what is going on. The implication of
> what you say here is that RPKI cannot work reliably over RSYNC, which is
> factually incorrect and an injustice to all existing RSYNC based
> deployment. Your view on the security model seems to ignore the
> existence of RPKI manifests and the use of CRLs, which exist exactly to
> mitigate replays.
> Up until 2 weeks ago Routinator indeed was not correctly validating RPKI
> data, fortunately this has now been fixed:
> Also via the RRDP protocol old data can be replayed, because just
> like RSYNC, the RRDP protocol does not have authentication. When RPKI
> data is transported from Publication Point (PP) to Relying Party, the RP
> cannot assume there was an unbroken 'chain of custody' and therefore has
> to validate all the RPKI signatures.
> For example, if a CDN is used to distribute RRDP data, the CDN is the
> MITM (that is literally what CDNs are: reverse proxies, in the middle).
> The CDN could accidentally serve up old (cached) content or misserve
> current content (swap 2 filenames with each other).
> > This is a tradeoff. I think that protecting against replay should be
> > considered more important here, given the numbers and time to fix
> > HTTPS issue.
> The 'replay' issue you perceive is also present in RRDP. The RPKI is a
> *deployed* system on the Internet and it is important for Routinator to
> remain interoperable with other non-nlnetlabs implementations.
> Routinator not falling back to rsync does *not* offer a security
> advantage, but does negatively impact our industry's ability to migrate
> to RRDP. We are in 'phase 0' as described in Section 3 of