DNS caches that support partitioning ?

William Herrin bill at herrin.us
Mon Aug 20 02:11:04 UTC 2012


On Sun, Aug 19, 2012 at 5:37 PM, Mark Andrews <marka at isc.org> wrote:
> As for the original problem.  LRU replacement will keep "hot" items in
> the cache unless it is seriously undersized.

Maybe. This discussion is reminiscent of the Linux swappiness debate.

Early in the 2.x series of Linux kernels, the guy responsible for the
virtual memory manager changed it to allow the disk cache to push
program code and data out of RAM if all other disk cache was more
recently touched than the program data. Previously, the disk cache
would only consume otherwise free memory. Programs would only get
pushed to swap by memory demands from other programs.

The users went ape. Suddenly if you copied a bunch of data from one
disk to another, your machine would be sluggish and choppy for minutes
or hours afterward as programs recovered swapped pages from disk and
ran just long enough to hit the next section needing to be recovered
from swap. Some folks ditched swap entirely to get around the problem.

The guy insisted the users were wrong. He had the benchmarks,
meticulously collected data and careful analysis to prove that the
machines were more efficient with pure LRU swap. The math said he was
right. 2+2=4. But it didn't.

In the very common case of copy-a-bunch-of-files, simple LRU
expiration of memory pages was the wrong answer. It caused the machine
to behave badly. More work was required until a tuned and weighted LRU
algorithm solved the problem.


Whether John's solution of limiting the cache by zone subtree is
useful or not, he started an interesting discussion. Consider, for
example, what happens when you ask for www.google.com. You get a 7-day
CNAME record for a 5-minute www.l.google.com A record, and the
resolver gets 2-day NS records naming ns1.google.com, 4-day A records
for ns1.google.com, 2-day NS records naming a.gtld-servers.net, etc.

Those authority records don't get touched again until www.l.google.com
expires. With a hypothetical, simple least-recently-used (LRU)
algorithm, the 4-minute-old A record for ns1.google.com was last
touched longer ago than the 3-minute-old A record for
5.6.7.8.rbl.antispam.com. So when the resolver needs more cache for
4.3.2.1.rbl.antispam.com, which record gets kicked?
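
To make the question concrete, here is a minimal sketch of that
scenario in Python. It is not how any real resolver manages its
cache; the LRUCache class, the three-entry capacity and the record
labels are all made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Plain LRU (hypothetical): any use moves a record to the 'recent' end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest use first, newest use last
        self.evicted = []             # records pushed out, in eviction order

    def touch(self, name):
        if name in self.entries:
            # Cache hit: mark the record as most recently used.
            self.entries.move_to_end(name)
            return
        if len(self.entries) >= self.capacity:
            # Cache full: drop whichever record was used longest ago.
            victim, _ = self.entries.popitem(last=False)
            self.evicted.append(victim)
        self.entries[name] = True


cache = LRUCache(capacity=3)                 # tiny capacity, assumed for the demo
cache.touch("ns1.google.com A")              # 4 minutes ago, never touched again
cache.touch("www.l.google.com A")            # same lookup
cache.touch("5.6.7.8.rbl.antispam.com A")    # 3 minutes ago
cache.touch("www.l.google.com A")            # browsers keep asking for this one
cache.touch("4.3.2.1.rbl.antispam.com A")    # new record, cache is full

print(cache.evicted)
```

Under these assumptions the record that gets kicked is the
ns1.google.com A record, not the one-shot RBL lookup.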

Then, of course, when www.l.google.com expires after five minutes, the
entire chain has to be refetched because ns1.google.com was already
LRU'd out of the cache. This is distinctly slower than just refetching
www.l.google.com from the already known address of ns1.google.com, and
the user sees a slight pause at their web browser while it happens.
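
A rough way to count the extra work, as a sketch only: assume a
simplified model in which each zone in the delegation chain costs one
upstream query unless the address of its nameserver is still cached.
The upstream_queries function and the CHAIN list are hypothetical, and
the model ignores retries, glue lookups and the like.

```python
def upstream_queries(chain, cached_zones):
    """Count queries needed to refresh a record held by the last zone in chain.

    chain        -- zones from the root down to the zone holding the answer
    cached_zones -- zones whose nameserver address is still in the cache
    """
    deepest = 0  # index 0 is the root, always reachable via the root hints
    for i, zone in enumerate(chain):
        if zone in cached_zones:
            deepest = i
    # One query to the deepest known server, plus one for every zone
    # below it that has to be rediscovered by following referrals.
    return len(chain) - deepest


CHAIN = [".", "com", "google.com"]  # assumed delegation path for the example

warm = upstream_queries(CHAIN, {"com", "google.com"})  # ns1.google.com still cached
partial = upstream_queries(CHAIN, {"com"})             # ns1.google.com evicted
cold = upstream_queries(CHAIN, set())                  # whole chain evicted

print(warm, partial, cold)
```

In this model the refresh costs one query with the nameserver address
cached, two once ns1.google.com has been evicted, and three if the
com servers are gone as well.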

Would a smarter, weighted LRU algorithm work better here? Something
where rarely used leaf data doesn't tend to evict equally rarely used
but much more important data from the lookup chain?
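
One possible shape for such an algorithm, sketched in Python under
loud assumptions: every record carries a weight, and the eviction
victim is the record with the largest age divided by weight. The
WeightedLRUCache class, the INFRA_WEIGHT value of 4 and the timings
are all invented for illustration; I have not tested whether this
particular weighting is a good one.

```python
INFRA_WEIGHT = 4.0  # assumed bonus for NS/glue records from the lookup chain


class WeightedLRUCache:
    """Hypothetical weighted LRU: evict the largest (age / weight)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # name -> (last_used_seconds, weight)
        self.evicted = []

    def _effective_age(self, name, now):
        last_used, weight = self.entries[name]
        return (now - last_used) / weight

    def touch(self, name, now, weight=1.0):
        if name in self.entries:
            # Cache hit: keep the weight assigned when the record was stored.
            _, weight = self.entries[name]
        elif len(self.entries) >= self.capacity:
            # Cache full: heavier records age more slowly, so they survive longer.
            victim = max(self.entries, key=lambda n: self._effective_age(n, now))
            del self.entries[victim]
            self.evicted.append(victim)
        self.entries[name] = (now, weight)


cache = WeightedLRUCache(capacity=3)
cache.touch("ns1.google.com A", now=0, weight=INFRA_WEIGHT)  # 4 minutes old at the end
cache.touch("www.l.google.com A", now=0)
cache.touch("5.6.7.8.rbl.antispam.com A", now=60)            # 3 minutes old at the end
cache.touch("www.l.google.com A", now=200)
cache.touch("4.3.2.1.rbl.antispam.com A", now=240)           # new record, cache is full

print(cache.evicted)
```

With these numbers the ns1.google.com record has an effective age of
60 seconds against 180 for the RBL lookup, so the RBL record is the
one evicted and the next www.l.google.com refresh can go straight to
the known nameserver.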

Regards,
Bill Herrin


-- 
William D. Herrin ................ herrin at dirtside.com  bill at herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004



