NSI bulletin 097-004 | Root Server Problems

Greg A. Woods woods at most.weird.com
Mon Jul 21 04:47:06 UTC 1997


> Date: Thu, 17 Jul 1997 22:52:18 +0500 (GMT)
> From: David Holtzman <dholtz at internic.net>
> To: nanog at merit.edu
> Subject: NSI bulletin 097-004 | Root Server Problems
> Resent-Date: Thu, 17 Jul 1997 14:42:42 -0400 (EDT)
> 
> On Wednesday night, July 16, during the computer-generation of the
> Internet top-level domain zone files, an Ingres database failure resulted 
> in corrupt .COM and .NET zone files.  Despite alarms raised by Network 
> Solutions' quality assurance schemes, at approximately 2:30 a.m. (Eastern 
> Time), a system administrator released the zone file without regenerating the
> file and verifying its integrity.  Network Solutions corrected the
> problem and reissued the zone file by 6:30 a.m. (Eastern Time).  
> 
> Thank you.
> David H. Holtzman
> Sr VP Engineering, Network Solutions
> dholtz at internic.net

So, if the new zone files were reissued at 06:30 Eastern Time, and they
take about an hour to download, why were some root servers still
handing out bad data many hours later (at least one until about 14:00
Eastern)?  The particular server I'm thinking of, though not located in
the Eastern timezone, does seem to have what I think is a 24x7 NOC
nearby, and in theory could have been prepared to reload as quickly as
anyone.

This may be just a coincidence, but it was about an hour after I
e-mailed and telephoned them that they finally had the right data in
place.  Unfortunately, finding the right contact was not entirely
trivial: the listed contact person had a full voice-mailbox, his
operator had no idea who else I could speak to, and the only NOC
numbers listed are a FAX and a 1-800 number that doesn't work from
outside the USA.  The
NOC person I finally reached on the telephone didn't even seem to be
fully aware that they indeed ran a root nameserver for the Internet.  He
did know that there was e-mail bouncing, and indeed I didn't expect they
could answer my e-mail if they were using their own root server....

Worst of all, though, they left the errant server on-line, handing out
NXDOMAIN replies to any and all who asked, while they were downloading
the corrected zone files.  Hopefully this is not standard operating
procedure for a root server, or at least not from now on.
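
For illustration only, the kind of pre-load sanity check that would keep
a badly truncated zone file from being served might look like the
following Python sketch.  The function names, the 10% shrink threshold,
and the one-record-per-line counting are all assumptions made for this
example, not anything NSI or the root operators are known to use.

```python
def count_records(zone_text):
    """Count non-blank, non-comment lines as a rough proxy for record count."""
    count = 0
    for line in zone_text.splitlines():
        stripped = line.strip()
        # In zone files a line starting with ';' is a comment.
        if stripped and not stripped.startswith(";"):
            count += 1
    return count


def safe_to_load(old_zone_text, new_zone_text, max_shrink=0.10):
    """Return True if the new zone has not shrunk by more than max_shrink
    (a hypothetical threshold) relative to the zone currently served."""
    old_count = count_records(old_zone_text)
    new_count = count_records(new_zone_text)
    if new_count == 0:
        return False   # an empty zone is never acceptable
    if old_count == 0:
        return True    # nothing to compare against
    shrink = (old_count - new_count) / old_count
    return shrink <= max_shrink
```

A server that failed such a check could keep answering from its previous
copy, or stop answering for the zone, rather than hand out NXDOMAIN for
names that exist.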

What annoys me most is that I didn't receive notification of any sort
of problem from any of the mailing lists run out of internic.net.  I
probably should subscribe to nanog, but I'd have thought namedroppers,
or maybe even rs-info, should have had the above announcement posted
just as soon as the mailers had enough trustworthy DNS data to deliver
it with.  There was nothing in http://rs.internic.net/announcements/
either, except for drivel about "maintaining high customer service
levels," and there still isn't (though I suppose this event wasn't
exactly "good PR").

What are the current procedures for announcing such problems to more
than just the root operators themselves?

-- 
							Greg A. Woods

+1 416 443-1734      VE3TCP      <gwoods at acm.org>      <robohack!woods>
Planix, Inc. <woods at planix.com>; Secrets of the Weird <woods at weird.com>


