DNS TTL adherence
Simon Waters
simonw at zynet.net
Thu Mar 16 09:13:20 UTC 2006
On Thursday 16 Mar 2006 04:23, you wrote:
>
> You might consider the following paper from IMC 2004: "On the
> Responsiveness of DNS-based Network Control" by Jeffrey Pang, Aditya
> Akella, Anees Shaikh, Balachander Krishnamurthy, Srinivasan Seshan,
> http://www.imconf.net/imc-2004/papers/p21-pang.pdf
The results are greatly at odds with my experience.
As they imply, the problem may be specific misconfigured ISP DNS servers,
which might explain why we see fewer violations, if our sites aren't popular
with those ISPs' users.
However, I wouldn't trust any report where control of the authoritative DNS
itself wasn't explicitly monitored and reported. They may think they have
updated the authoritative answers (and TTL), but in my experience, when you
find violators you often find that the authoritative DNS servers didn't all
update as, or when, expected, or that earlier records were returned with a
longer TTL from those servers.
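As a rough sketch of that check: collect the answer each authoritative server
gives when queried directly, then flag any server still handing out the old
address or a TTL longer than intended. The server names, addresses and the
`find_stragglers` helper below are made up for illustration; how the answers
are gathered is left open.

```python
from collections import namedtuple

# One observation per authoritative server, gathered by querying each
# server directly rather than through a cache.
Answer = namedtuple("Answer", ["server", "address", "ttl"])


def find_stragglers(answers, expected_address, max_ttl):
    """Return answers that carry the wrong address or a TTL above max_ttl."""
    return [a for a in answers
            if a.address != expected_address or a.ttl > max_ttl]


# Hypothetical data: invented server names, documentation-range addresses.
observed = [
    Answer("ns1.example.net", "192.0.2.10", 600),
    Answer("ns2.example.net", "192.0.2.10", 86400),   # new address, old TTL
    Answer("ns3.example.net", "198.51.100.7", 600),   # never updated
]

for a in find_stragglers(observed, "192.0.2.10", 600):
    print(a.server, a.address, a.ttl)
```

Any server listed here could explain traffic that otherwise looks like a
cache ignoring the TTL.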
Certainly that was our experience moving many sites last week, where you
can check the logs in real time and find which domains we messed up on by the
traffic still arriving.
Looking at the 4 long term violators for one site...

Hits  Source IP
   8  198.78.130.68  <--- ??
   1  212.95.252.16  <--- lager.netcraft.com
  15  66.147.154.3   <--- IBM Almaden Research Center
   5  70.42.51.10    <--- Fast Search & Transfer
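A tally like the one above can be pulled from the old server's access log
with a few lines of script. This is only a sketch: it assumes the client
address is the first whitespace-separated field of each log line, and the
sample lines are invented, not taken from the real log.

```python
from collections import Counter


def hits_by_source(log_lines):
    """Count requests per client address (assumed to be the first field)."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:                      # skip blank lines
            counts[fields[0]] += 1
    return counts


# Invented sample lines using documentation-range addresses.
sample = [
    '192.0.2.1 - - [13/Mar/2006:10:00:00 +0000] "GET / HTTP/1.0" 200 512',
    '192.0.2.1 - - [13/Mar/2006:10:05:00 +0000] "GET /a HTTP/1.0" 200 90',
    '',
    '198.51.100.9 - - [14/Mar/2006:02:00:00 +0000] "GET / HTTP/1.0" 200 512',
]

for address, hits in hits_by_source(sample).most_common():
    print(hits, address)
```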
During this period (starting 3 days after moving a site with a 10 minute TTL)
we saw 27234 hits (okay, not exactly a busy site) for that site on the correct
server. So roughly 1 in 1000 hits during days 3 to 6 went to the old web
server, and this domain had the most lost hits; most of the moved domains
don't show in the old server's log at all.
I think we can safely exclude at least 21 of those 29 hits as being
"non-human" (sorry IBM Research if you were deeply interested in proof
reading), and I'm guessing those sources have made a deliberate effort to
cache stale data for their own reasons.
So I can put an upper bound for our sites of 1 in 1000 hits of interest
going to the wrong site during days 3 to 6.
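The arithmetic behind that bound, as a quick sketch. It uses the figures
quoted above and assumes the 21 excluded hits are the three identified
sources (1 + 15 + 5), which leaves only the 8 hits from the unidentified
address; the `one_in` helper is just for illustration.

```python
def one_in(stale_hits, correct_hits):
    """Express stale hits as '1 in N' of all hits for the site."""
    return (stale_hits + correct_hits) / stale_hits


stale = {
    "198.78.130.68": 8,    # unidentified
    "212.95.252.16": 1,    # lager.netcraft.com
    "66.147.154.3": 15,    # IBM Almaden Research Center
    "70.42.51.10": 5,      # Fast Search & Transfer
}
correct_hits = 27234

all_stale = sum(stale.values())          # every hit on the old server
unexplained = stale["198.78.130.68"]     # what is left after the exclusions

print("all stale hits: 1 in", round(one_in(all_stale, correct_hits)))
print("unexplained only: 1 in", round(one_in(unexplained, correct_hits)))
```

Counting every stale hit gives about 1 in 940; counting only the
unidentified source gives about 1 in 3400, so 1 in 1000 is a comfortable
ceiling.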
The most popular site we moved had only two DNS violators during days 3 to 6,
the most notable being the same "Fast Search & Transfer" IP above.
It may be that popular sites have a far worse problem by dint of exercising
more caching code, but this site is far from being our most popular. And
these sites were moved by reducing the TTL to a low value (10 minutes) and
keeping it there for a long period of time, before we actually performed the
move.