Towards an RPKI-rich Internet (and the appropriate allocation of responsibility in the event of an RIR RPKI CA outage)

Job Snijders job at ntt.net
Mon Oct 1 21:19:15 UTC 2018


Dear all,

I'm very happy to see the direction this conversation has taken. It seems
we've moved on to focussing on solutions and outcomes, which is
encouraging.

On Mon, Oct 01, 2018 at 05:44:17PM +0100, Nick Hilliard wrote:
> John Curran wrote on 01/10/2018 00:21:
> > There is likely some on the nanog mailing list who have a view on
> > this matter, so I pose the question of "who should be responsible"
> > for consequences of RPKI RIR CA failure to this list for further
> > discussion.
> 
> other replies in this thread have assumed that RPKI CA failure modes
> are restricted to loss of availability, but there are other failure
> modes, for example:
> 
> - fraud: rogue CA employee / external threat actor signs ROAs
> illegitimately
> 
> - negligence: CA accidentally signs illegitimate ROAs due to e.g.
> software bug
> 
> - force majeure: e.g. court orders CA to sign prefix with AS0,
> complicated by NIR RPKI delegation in jurisdictions which may have
> difficult relations with other parts of the world.
> 
> These types of situations are well-trodden territory for other types
> of PKI CA, where users
> 
> Otherwise, as other people have pointed out, catastrophic systems
> failure at the CA is designed to be fail-safe.  I.e. if the CA goes
> away, ROAs will be evaluated as "unknown" and life will continue on.
> If people misconfigure their networks and do silly things with this
> specific failure mode, that's their problem.  You can't stop people
> from aiming guns at their feet and pulling the trigger.
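
To make the fail-safe behaviour described above concrete, here is a
minimal sketch of route origin validation along the lines of RFC 6811.
The names VRP and validate are illustrative assumptions, not taken from
any particular validator, and the sketch ignores caching, expiry and
transport. What the quoted text calls "unknown" is the "NotFound" state
in RFC 6811 terms.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """Validated ROA Payload (hypothetical minimal representation)."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix, origin_asn, vrps):
    """Return 'valid', 'invalid' or 'not-found' for a route announcement."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # Only VRPs of the same address family that cover the route count.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # AS0 never matches any origin, so an AS0 VRP can only invalidate.
        if (vrp.asn != 0 and vrp.asn == origin_asn
                and route.prefixlen <= vrp.max_length):
            return "valid"
    # Covered but unmatched is 'invalid'; no covering VRP is 'not-found'.
    return "invalid" if covered else "not-found"
```

With an empty VRP set (for example, the CA's data is gone) every route
comes back as "not-found", which is the fail-safe case. A covering VRP
with AS0 makes the route "invalid", which is the court-order scenario.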

There are a number of failure modes and I believe the operational
community has yet to fully explore how to mitigate most risks. Over time
I expect we'll develop BCPs on how to improve the robustness of the
system; these BCPs can only come into existence driven by actual
operational experience.

A positive development that addresses some aspects of the concerns
raised is Certificate Transparency. Cloudflare set up a CT log
(https://groups.google.com/forum/#!topic/certificate-transparency/_deL5iGB5sY)
and I hope others like Google will also consider doing this. CT is a
great tool to help ensure the roots perform in line with community
expectations.

I consider it the operator community's responsibility to figure out how
to deal with outages. I don't intend to hold the RIRs liable - we'll
need to learn to protect ourselves.

Kind regards,

Job


