DNS attacks evolve

Florian Weimer fw at deneb.enyo.de
Sun Aug 10 03:58:09 CDT 2008

* Joe Greco:

> I am very, very, very disheartened to be shown to be wrong.  As if 8 days
> wasn't bad enough, a concentrated attack has been shown to be effective in
> 10 hours.  See http://www.nytimes.com/2008/08/09/technology/09flaw.html

Note that the actual bandwidth utilization on that GE link should be
somewhere between 10% and 20% if you send minimally sized replies during
spoofing.  In fact, the theoretically predicted time for 50% success
probability for 100 Mbps attacks is below one day.
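
A rough back-of-the-envelope version of that estimate, as a sketch only.
The search space (16-bit query ID times about 64,000 source ports), the
120-byte on-the-wire size of a spoofed reply, and the assumption that
every spoofed packet arrives while a race window is open are all my
simplifications, not measured values:

```python
import math

def packets_for_success(prob_target, search_space):
    """Spoofed replies needed so that at least one guess matches
    with probability prob_target (independent uniform guesses)."""
    p = 1.0 / search_space
    return math.log(1.0 - prob_target) / math.log1p(-p)

def attack_hours(bandwidth_bps, frame_bytes, search_space, prob_target=0.5):
    """Hours of spoofing at the given bandwidth to reach prob_target."""
    packets_per_second = bandwidth_bps / (frame_bytes * 8)
    seconds = packets_for_success(prob_target, search_space) / packets_per_second
    return seconds / 3600.0

# Assumption: 16-bit query ID times roughly 64,000 usable source ports.
SEARCH_SPACE = 65536 * 64000
# Assumption: about 120 bytes on the wire per minimally sized spoofed reply.
FRAME_BYTES = 120

hours_100m = attack_hours(100e6, FRAME_BYTES, SEARCH_SPACE)
print(f"{hours_100m:.1f} hours for 50% success at 100 Mbps")
```

Under these assumptions the result stays well below 24 hours; a smaller
port pool or smaller packets shorten it further.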

This also matches the numbers posted here:


> 1) Use of multiple IP addresses for queries (reduce success rate somewhat)

You must implement this carefully.  Just using a load-balanced DNS setup
doesn't work, for instance.  The attacker could trigger the cache misses
through a CNAME he controls, so he'd know which instance to attack in
each round.

> 2) Rate-limiting of query traffic, since I really doubt many sites actually
>    have recursers that need to be able to spike to many times their normal
>    traffic,

The problem with that is that 130,000 queries over a 10-hour period (as
in Evgeniy's experiment) are often lost in the noise.  The attacker
benefits from high query rates only if the authoritative servers are
RTT-wise close to your recursor.
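
To put that figure in perspective, here is the average query rate it
implies.  This only spreads the quoted numbers evenly over the period;
the actual distribution over time is not known to me:

```python
# Figures quoted above: 130,000 queries over a 10 hour period.
QUERIES = 130_000
HOURS = 10

# Average rate, assuming the queries were spread evenly (an assumption).
avg_qps = QUERIES / (HOURS * 3600)
print(f"{avg_qps:.2f} queries per second on average")
```

That is about 3.6 queries per second, which a rate limit sized for a
busy recursor would not notice.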

> 3) Forwarding of failed queries (which I believe BIND doesn't currently
>    allow) to a "backup" server (which would seem to be interesting in
>    combination with 2)

I don't think any queries fail in this scenario.

> 4) I wonder if it wouldn't make sense to change the advice for large-scale
>    recursers to run multiple instances of BIND, internally distribute the
>    requests (random pf/ipfw load balancing) to present a version of 1) that 
>    would render smaller segments of the user base vulnerable in the event of
>    success.  It would represent more memory, more CPU, and more requests,
>    but a smaller victory for attackers.

User-specific DNS caches are interesting from a privacy perspective,
too.  But I don't think they'll work, except when the cache is in the

> 5) Modify BIND to report mismatch QID's.  Not a log report per hit, but some
>    reasonable strategy.  Make the default installation instructions include
>    a script to scan for these - often - and mail hostmaster.

Yes, better monitoring is crucial.  Recent BIND 9.5 has a counter for
mismatched replies, which should provide at least one indicator.  Due to
the diversity of potential attacks, it's very difficult to set up
generic monitoring.
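
As one possible indicator, a script could sample such a counter
periodically and flag intervals where it grows unusually fast.  The
sketch below is hypothetical: how the counter is read from the server
is left out, and the samples and the threshold are made up for
illustration:

```python
def mismatch_alerts(samples, threshold_per_sec):
    """Flag sampling intervals where the counter grows too fast.

    samples: list of (unix_time, counter_value) pairs in time order.
    Returns a list of (time, rate) for intervals above the threshold.
    """
    alerts = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or c1 < c0:
            # Clock went backwards or the counter was reset
            # (for example after a restart): skip this interval.
            continue
        rate = (c1 - c0) / dt
        if rate > threshold_per_sec:
            alerts.append((t1, rate))
    return alerts

# Made-up samples taken once per minute; the jump in the second
# interval is what a spoofing run might look like.
samples = [(0, 10), (60, 12), (120, 6012), (180, 6015)]
alerts = mismatch_alerts(samples, threshold_per_sec=5.0)
print(alerts)
```

A fixed threshold like this only catches noisy attempts; slow attacks
would need a different indicator.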

> 6) Have someone explain to me the reasoning behind allowing the corruption
>    of in-cache data, even if the data would otherwise be in-baliwick.  I'm 
>    not sure I quite get why this has to be.  It would seem to me to be safer
>    to discard the data.  (Does not eliminate the problem, but would seem to
>    me to reduce it)

The idea is that the delegated zone can introduce additional servers not
listed in the delegating zone.  (It's one thing that gets you a bit of
IPv6 traffic.)  Unfortunately, it's likely that performance would suffer
for some sites if resolver 

> 7) Have someone explain to me the repeated claims I've seen that djbdns and
>    Nominum's server are not vulnerable to this, and why that is.

For DJBDNS, see: <http://article.gmane.org/gmane.network.djbdns/13371>

Nominum has published a few bits about their secret sauce:


TCP fallback on detected attack attempts is expected to be sufficiently
effective so that you can get away with a smaller source port pool.
Even if it's not, on some platforms, a smallish pool is the only way to
cope with the existing load until you can bring in more servers, so it's
better than nothing.
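
To illustrate the general idea (not Nominum's actual heuristics, which
I don't know), here is a minimal sketch.  The names PendingQuery and
on_udp_reply and the mismatch threshold are hypothetical, and a real
resolver would also match on source port and server address:

```python
from dataclasses import dataclass

@dataclass
class PendingQuery:
    qname: str
    qid: int
    mismatches: int = 0
    use_tcp: bool = False  # set once an attack attempt is suspected

def on_udp_reply(query, reply_qname, reply_qid, max_mismatches=1):
    """Return True if the UDP reply is accepted for this query.

    A reply for the right name with the wrong query ID counts as a
    hint that someone is guessing; after max_mismatches such replies
    the query is marked for retry over TCP.
    """
    if query.use_tcp:
        # Already fell back: ignore further UDP replies, even ones
        # that happen to match, since they may be lucky guesses.
        return False
    if reply_qname != query.qname:
        return False  # unrelated reply, not counted
    if reply_qid == query.qid:
        return True
    query.mismatches += 1
    if query.mismatches >= max_mismatches:
        query.use_tcp = True
    return False

q = PendingQuery("www.example.com.", 0x1234)
accepted = on_udp_reply(q, "www.example.com.", 0x1111)
print(accepted, q.use_tcp)
```

Once use_tcp is set, the caller would re-send the query over TCP, where
blind spoofing is much harder.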

The TCP fallback idea was posted to namedroppers in 2006, in response to
one of Bert's early drafts which evolved into the forgery resilience
document, so it should not be encumbered.  The heuristics for when to
trigger the fallback could be, though.
