Comment spammers chewing blogger bandwidth like crazy

Alexander Harrowell a.harrowell at gmail.com
Tue Jan 16 09:42:32 UTC 2007


> Frisvold: How does this make his assumption incorrect?  Spam is spam and DNSBLs
> will likely be very effective when it comes to stopping comment spam.
> There are, of course, some severe problems with using a DNSBL as a
> blocklist for comments...

> But there's a major problem here...  A DNSBL is a source blocklist.
> Since the current trend in spam (comment and smtp) is to use botnets,
> then by blocking the bots, you also block the users who would make
> meaningful comments.
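
For anyone who hasn't looked at how a source blocklist is queried, here is a
minimal sketch in Python. The zone name dnsbl.example.org and the function
names are placeholders of mine, not anything from this thread, and it only
handles IPv4. The point is that the lookup key is the commenter's address,
so every user behind a listed address is treated the same as the bot.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone.

    The octets are reversed and prepended to the zone, so 192.0.2.99
    checked against dnsbl.example.org becomes 99.2.0.192.dnsbl.example.org.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone returns an A record for this address.

    Assumption: any answer means "listed". Real lists often encode a
    reason in the returned 127.0.0.x address, which is ignored here.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # NXDOMAIN (not listed) and resolver failures both land here,
        # so a DNS outage fails open rather than blocking everyone.
        return False
```

Usage would be something like is_listed(commenter_ip, "dnsbl.example.org")
before accepting the comment.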

Especially as bots are usually found in customer dynamic-IP pools.
Assigning a value for relative spamminess by country would work up to
a point (Italy, Ukraine, we mean you) but the false positive rate is
unacceptable. Anyway, very anti-Internet and hardly appropriate for a
blog whose declared mission is pan-European opinion.

> The argument there is that those users don't deserve to comment if
> they can't keep their computers clean, but let's get real..  Some of
> this stuff is getting pretty advanced and it's getting tougher for
> general users to keep their computers clean.
>
> I think a far better system is something along the lines of a SURBL
> with word filtering.  I believe that Akismet does something along
> these lines.

We had a word filter plus lookups of bsb.spamlookup.net. Our
experience in the last few months was not good - the rate of false
positives was high (essentially all genuine comments had to be
individually approved, and worse, rather than going into an approval
queue they usually went into the spamtrap) and the rate of false
negatives was nontrivial.

We have recently implemented Akismet. It's a major improvement - the
false positives have been nearly eliminated and the false negatives
are down to a couple a week. Multi-layered defence is a "must" - for
example, most comment spam is very self-similar, so you could run a
Bayesian filter comparing the stuff rejected by the blocklist with the
content of the trap in order to sort between "spam" and "hold for
approval".
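
A rough sketch of that second layer, in Python. This is not what we run; it
is a plain naive Bayes classifier with add-one smoothing, and the class name,
the triage() helper and the 0.9 threshold are all assumptions of mine. The
idea is to train "spam" on the contents of the trap and "ham" on approved
comments, then use it to sort whatever the blocklist rejected.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; deliberately crude."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """label is "spam" (from the trap) or "ham" (approved comments)."""
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed class prior, then smoothed per-word likelihoods.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokens(text):
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocab) + 1)
                )
            log_score[label] = score
        # Turn the two log scores into P(spam); clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


def triage(model, text, threshold=0.9):
    """Sort a blocklist-rejected comment into one of the two bins."""
    if model.spam_probability(text) >= threshold:
        return "spam"
    return "hold for approval"
```

Anything the filter is not confident about goes to a human instead of
vanishing into the trap.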

Mind you, some of the Bayesian-beating techniques used for SMTP spam
are now showing up in comments - for example, delivering the
beneficiary link and a paragraph of news scraped from news.bbc.co.uk,
which is a lot like a real (but dull:-)) comment. Perhaps a better
filter might be on the links they contain (some domains come up again,
and again, and again).
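
Something like the following would do for the link idea. It's a sketch only:
the function names and the min_hits cut-off are made up for illustration, and
the URL regex is deliberately simple. It counts which hostnames keep turning
up in known spam and then flags new comments that link to them, regardless of
how innocent the surrounding text looks.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_domains(comment):
    """Return the set of hostnames linked from a comment body."""
    domains = set()
    for url in URL_RE.findall(comment):
        try:
            host = urlparse(url.rstrip(".,;:!?)")).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host:
            if host.startswith("www."):
                host = host[4:]
            domains.add(host)
    return domains


def repeat_offenders(spam_comments, min_hits=3):
    """Hostnames that appear in at least min_hits known-spam comments."""
    counts = Counter()
    for comment in spam_comments:
        # A set per comment, so one comment stuffed with links counts once.
        counts.update(link_domains(comment))
    return {domain for domain, n in counts.items() if n >= min_hits}


def has_known_spam_link(comment, bad_domains):
    """True if the comment links to any hostname in bad_domains."""
    return not link_domains(comment).isdisjoint(bad_domains)
```

The min_hits threshold matters: a spammer who pads with a link to a real
news site shouldn't get that site listed after a single appearance.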

Then again, once you're doing anything like that, it's already hit
your server and is costing cycles if nothing else. In the future,
someone will lose the vote through being mistaken for a spambot.

Alex



More information about the NANOG mailing list