2019-01-11 ARIN.NET DNSSEC Outage – Post-Mortem (was: Re: ARIN NS down?)

John Curran jcurran at arin.net
Fri Jan 11 20:59:10 UTC 2019


On 11 Jan 2019, at 10:39 AM, John Curran <jcurran at arin.net> wrote:

On Fri, Jan 11, 2019 at 07:57:25PM +0530,
couldn't get address for 'ns1.arin.net': not found

Folks -

   This has been resolved - the arin.net zone is again correctly signed.

Post-mortem forthcoming,

Folks -

The ARIN.NET zone on our public signed DNS servers is populated via an internal DNS server and associated workflow.  As part of system maintenance near the end of 2018, the zone file used by the master internal DNS server was updated incorrectly, resulting in an invalid zone file.  Since the zone file was invalid, the zone did not reload on our internal master, and the associated workflow to DNSSEC sign and push this zone to the public servers did not execute.  Our monitoring systems reported being green until the signatures expired, as they presently check that the SOAs match on the internal and external nameservers.

At approximately 8:30 AM Eastern time today (11 January 2019), ARIN operations started seeing issues within its monitoring.  Initial review suggested the problem was DNSSEC-related due to expired signatures.  We pulled the DS record from the parent zone so that DNSSEC validation would not be performed by those validating resolvers that had not already cached our DS records.  Upon further investigation we determined that the expired signatures were the result of human error in editing a zone file, which went undetected and interrupted our routine zone publication process.  The issue was fixed and signed zones were then pushed out at 10:25 AM ET.  The DS record was reinstated in the parent at 10:30 AM ET.

As a result of this incident, we will add additional alerting to the zone loading process for any errors and perform monitoring of zone signature lifetimes, with appropriate alerting for any potential expiration of DNSSEC signatures.
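
For anyone who wants a concrete picture of what a signature-lifetime check might look like, here is a minimal sketch in Python.  It is not ARIN's implementation; the function names, the three-day warning threshold, and the timestamps in the usage note are all assumptions made for illustration.  It relies only on the fact that RRSIG expiration times are commonly presented as a 14-digit UTC timestamp (YYYYMMDDHHmmSS), as seen in `dig +dnssec` output, and it does not handle the alternate seconds-since-epoch form.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(stamp: str) -> datetime:
    """Parse a 14-digit RRSIG timestamp (YYYYMMDDHHmmSS) as UTC."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def signature_status(
    expiration: str,
    now: datetime,
    warn_before: timedelta = timedelta(days=3),  # assumed threshold, tune to taste
) -> str:
    """Classify a signature as 'expired', 'warning', or 'ok'.

    'warning' means the signature is still valid but will expire within
    warn_before, which is the window in which an alert should fire.
    """
    remaining = parse_rrsig_time(expiration) - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= warn_before:
        return "warning"
    return "ok"
```

As a usage example with made-up values: given `now = datetime(2019, 1, 11, 13, 30, tzinfo=timezone.utc)`, calling `signature_status("20190113120000", now)` falls inside the warning window, while an expiration stamp earlier than `now` is reported as expired.  A check like this looks at the signatures themselves rather than at SOA agreement between servers, so it would still fire if the signing workflow silently stopped running.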

My apologies for this incident – while ARIN does have some fragility in our older systems (which we have been working aggressively to phase out via system refresh and replacements), it is not acceptable to have this situation with key infrastructure such as our DNS zones.   We will prioritize the necessary alert and monitor changes and I will report back to the community once that has been completed.

Thank you for your patience in this regard.
/John

John Curran
President and CEO
American Registry for Internet Numbers





