dns and software, was Re: Reliable Cloud host ?

Joe Greco jgreco at ns.sol.net
Wed Feb 29 21:02:47 UTC 2012

> On Wed, Feb 29, 2012 at 7:57 AM, Joe Greco <jgreco at ns.sol.net> wrote:
> >> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q at mail.gmail.com>,
> >>  William Herrin writes:
> >> > On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka at isc.org> wrote:
> >> > > DNS TTL works.  Applications that don't honour it aren't an indication
> >> > > that it doesn't work.
> >> >
> >> > Mark,
> >> >
> >> > If three people died and the building burned down then the sprinkler
> >> > system didn't work. It may have sprayed water, but it didn't *work*.
> >>
> >> Not enough evidence to say if it worked or not.  Sprinkler systems
> >> are designed to handle particular classes of fire, not every fire.
> >
> > It is also worth noting that many fire systems are not intended to
> > put out the fire, but to provide warning and then provide an extended
> > window for people to exit the affected building through use of sprinklers
> > and other measures to slow the spread of the fire.
>
> Hi Joe,
> The sprinkler system is designed to delay the fire long enough for
> everyone to safely escape.

Hi Bill,

No, the sprinkler system is *intended* to delay the fire long enough
for everyone to safely escape.  In order to accomplish this, however,
the designer chooses from some reasonable options to meet certain
goals that are commonly accepted to allow that.  For example, the
suppression design applied to a multistory dwelling where people
live, cook, and sleep is typically different from the one applied to
single-story light office space.  Neither design will be effective
against all possible types of fire.

> As a secondary objective, it reduces the
> fire damage that occurs while waiting for firefighters to arrive and
> extinguish the fire. If "three people died" then the system failed.

That's silly.  The system fails if the system *fails* or doesn't
behave as designed.  No system is capable of guaranteeing survival.

Just yesterday, here in Milwaukee, we had a child killed at a
railroad crossing.  The crossing was well-marked, with signals
and gates.  Visibility of approaching trains for close to a mile
in either direction.  The crew on the train saw him crossing,
blew their horn, laid on the emergency brakes.  CP Rail inspected
the gates and signals for any possible faults, but eyewitness
accounts were that the gates and signals were working, and the 
train made every effort to make itself known. 

The 11-year-old kid had his hood up and earbuds in, and apparently
didn't see the signals or look up and down the track before crossing,
and for whatever reason, didn't hear the train horn blaring at him.

At a certain point, you just can't protect against every possible
bad thing that can happen.  I have a hard time seeing this as a
failure of the railroad's fully functional railroad crossing and
related safety mechanisms.  The system doesn't guarantee survival.

> Whoever you want to blame, DNS TTL dysfunction at the application
> level is the same way. It's a failed system. With the TTL on an A
> record set to 60 seconds, you can't change the address attached to the
> A record and expect that 60 seconds later no one will continue to
> connect to the old address. Nor 600 seconds later nor 6000 seconds
> later. The "system" for renumbering a service of which the TTL setting
> is a part consistently fails to reliably function in that manner.

It's a failure because people don't understand the intent of the system,
and it is pretty safe to argue that it is a multifaceted failure, due 
to failures by client implementations, server implementations, sample
code, attempts to use the system for things it wasn't meant for, etc. 
This is by no means limited to TTL; we've screwed up multiple addresses,
IPv6 handling, negative caching, um, do I need to go on...?

In the specific case of TTL, the problem is made much worse due to the
way most client code has hidden this data from developers, so that many
developers don't even have any idea that such a thing exists.
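
As an illustration of how the TTL gets hidden, here is a minimal Python
sketch built on the standard socket.getaddrinfo() call.  The names FIELDS
and resolve() are made up for this example.  The only point is that each
result is a 5-tuple of (family, type, proto, canonname, sockaddr), with
no TTL anywhere in it, so a developer using this interface has nothing
to honour.

```python
import socket

# The five fields getaddrinfo() returns for each result; note that no
# TTL (or any other cache lifetime) is among them.
FIELDS = ("family", "type", "proto", "canonname", "sockaddr")


def resolve(host, port):
    """Hypothetical wrapper: resolve host/port the way most client code does."""
    return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)


if __name__ == "__main__":
    # A numeric address is used so the example runs without a DNS server.
    for entry in resolve("127.0.0.1", 80):
        for name, value in zip(FIELDS, entry):
            print(name, "=", value)
```

Code that wants the TTL generally has to go around this interface and
talk to a resolver library that returns the full DNS answer.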

I'm not sure how to see that as a design failure of the TTL mechanism.

I don't see developers ignoring DNS and hardcoding IP addresses into
code as a failure of the DNS system.

I see both as naive implementation errors.  The difference with TTL is
that the implementation errors are so widespread as to render any sane
implementation relatively useless.
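
For contrast with the naive approach of resolving once and holding on to
the address forever, here is a rough Python sketch of a client-side cache
that re-resolves once a record's TTL has run out.  It is only a sketch
under stated assumptions: TTLCache and its lookup hook are hypothetical
names, and the hook is assumed to return (addresses, ttl_seconds), which
the standard socket API does not provide.

```python
import time


class TTLCache:
    """Cache lookups and re-resolve once the record's TTL has expired."""

    def __init__(self, lookup, clock=time.monotonic):
        # lookup(name) -> (addresses, ttl_seconds).  Hypothetical hook:
        # something other than getaddrinfo() must supply the TTL.
        self._lookup = lookup
        self._clock = clock
        self._entries = {}  # name -> (addresses, expiry time)

    def resolve(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            # Still within the TTL: reuse the cached answer.
            return cached[0]
        # Missing or expired: ask again and remember the new expiry.
        addresses, ttl = self._lookup(name)
        self._entries[name] = (addresses, now + ttl)
        return addresses
```

A long-lived client would call resolve() before each new connection
rather than keeping the first address it ever saw.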

... JG
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
