That MIT paper

David G. Andersen dga at lcs.mit.edu
Thu Aug 12 00:54:29 UTC 2004


On Wed, Aug 11, 2004 at 04:49:18PM +0000, Paul Vixie scribed:
> what i meant by "act globally, think locally" in connection with That
> MIT Paper is that the caching effects seen at mit are at best
> representative of that part of mit's campus for that week, and that

  Totally agreed.  The paper was based on two traces, one from
MIT LCS and one from KAIST in Korea.  The authors understood that
they were only looking at two sites, but their numbers tell a very
interesting story -- and I think they're actually fairly
generalizable.  For instance, the poorly-behaving example from your
f-root snapshot is consistent with one of the findings in the paper:

  [Regarding root and gTLD server lookups] "...It is likely that
many of these are automatically generated by incorrectly implemented
or configured resolvers;  for example, the most common error 'loopback'
is unlikely to be entered by a user"
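
  As a toy illustration of how one might spot that class of junk in
a resolver's query log -- the log format, file name, and pattern
below are assumptions of mine, not the paper's methodology:

    import re

    # Single-label names like "loopback" or "localhost" should never
    # reach a root server; a correctly configured resolver answers
    # or rejects them locally.
    BOGUS = re.compile(r'^[^.]+$')

    def count_bogus(lines):
        """Count queried names a sane resolver wouldn't forward."""
        total = bogus = 0
        for line in lines:
            name = line.strip().rstrip('.')
            if not name:
                continue
            total += 1
            if BOGUS.match(name):
                bogus += 1
        return bogus, total

    with open('queries.log') as f:       # hypothetical log file
        bad, total = count_bogus(f)
    print(f"{bad}/{total} queried names look auto-generated")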

> even a variance of 1% in caching effectiveness at MIT that's due to
> generally high or low TTL's (on A, or MX, or any other kind of data)
> becomes a huge factor in f-root's load, since MIT's load is only one

  But remember: the only TTLs the paper suggested could be reduced
were non-nameserver A records.  You could drop all of those to zero
and not affect f-root's load one bit.  In fairness, I think the
paper jumbles this together with NS record caching, since most
responses from the root/gTLD servers include both the NS records and
their corresponding A records in the additional section.
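
A quick way to convince yourself of this: a resolver goes back to
the root only when its cached delegation (the NS records and their
glue) expires, and the A-record TTLs it fetches through that
delegation never enter into it.  Here's a minimal sketch of that
cache behavior -- the TTL values, arrival process, and single-zone
model are illustrative assumptions, not numbers from the paper:

    import random

    NS_TTL = 172800              # assumed 2-day delegation TTL
    SIM_SECONDS = 7 * 86400      # simulate one week

    def root_queries(a_ttl, rate=1.0, seed=0):
        """Count queries one caching resolver sends to the root.

        The resolver asks the root only when its cached delegation
        (NS + glue) has expired.  a_ttl governs how often it
        re-fetches the final answer from the zone's own servers,
        so it never appears in the root-query count -- which is
        the point.
        """
        rng = random.Random(seed)
        deleg_expiry = -1.0          # delegation cache starts empty
        roots = 0
        t = 0.0
        while t < SIM_SECONDS:
            t += rng.expovariate(rate)   # Poisson client lookups
            if t > deleg_expiry:         # delegation expired
                roots += 1
                deleg_expiry = t + NS_TTL
        return roots

    for a_ttl in (0, 60, 86400):
        print(a_ttl, root_queries(a_ttl))   # same count every time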

Global impact is greatest when the resulting load changes are
concentrated in one place.  The clearest example is a change that
affects the root servers.  When a 1% increase in total traffic is
instead spread among hundreds of thousands of different, relatively
unloaded DNS servers, the impact on any one of them is minimal.  And
since we're talking about a protocol that, by various measurements,
accounts for less than 3% of all Internet traffic, the packet-count /
byte-count impact is negligible (unless it's concentrated, as happens
at the root and gTLD servers).
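
To put rough numbers on it (made-up round figures for illustration,
not measurements):

    # Hypothetical: a 1% global increase amounting to 10,000 extra
    # queries per second, spread over half a million caching servers
    # versus concentrated on the 13 root server letters.
    extra_qps = 10_000
    caches = 500_000

    print(f"{extra_qps / caches:.3f} extra qps per cache")  # 0.020
    print(f"{extra_qps / 13:.0f} extra qps per root")       # 769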

The other questions you raise, such as:

> how much of the measured traffic was due to bad logic in 
> caching/forwarding servers, or in clients?  how
> will high and low ttl's affect bad logic that's known to be in wide
> deployment? 

are equally important to ask, but there are only so many questions
a single paper can answer.  This one provides valuable insight into
client behavior and into when and why DNS caching is effective.
Other papers (for instance, Danzig's 1992 study) have examined
questions closer to those you pose, and their results were useful in
an entirely different way (namely, showing that almost all root
server traffic was totally bogus because of client errors).

From the perspective of a root name server operator, the latter
questions are probably the more important ones.  But from the
perspective of, say, an Akamai or a Yahoo (or joe-random dot com),
the former insights are just as valuable.

  -Dave

-- 
work: dga at lcs.mit.edu                          me:  dga at pobox.com
      MIT Laboratory for Computer Science           http://www.angio.net/


