OT: question re. the Volume of unwanted email (fwd)

JC Dill nanog at vo.cnchost.com
Wed Jun 18 22:44:14 UTC 2003


Jack Bates wrote:

> Petri Helenius wrote:
>>
>> Isn't "highlight and hit delete" exactly what has been implemented since
>> Mozilla 1.3 and works with almost perfect accuracy after you give it a
>> few dozen messages to build up the "good and bad" database with?
>>
> Actually, I find that 1.3 and 1.4 still have issues with determining 
> spam. While fairly decent, one still has to go through looking for false 
> positives. The other issue is that spammers have been doing a good job 
> at designing emails to fool filters. I'm starting to see more and more 
> spam designed to defeat Bayesian filters. By including "good" words in 
> their emails, they either make good words spammy so that you get more 
> FP's or they make their email clean enough that it's still in your 
> inbox. The worst part of it is that spam is quickly becoming unreadable, 
> so that legitimate emails that are readable are the emails more likely 
> filtered.

I have not found this to be the case.  While I don't manage an abuse
mailbox, I do manage a busy mailing list.  The mailing list address and
administrative addresses have been picked up by spammers and are
probably now on all those "millions of email addresses" CDs.  The
mailing list address and administrative addresses are also both
regularly forged as the sender of spam, so I get all the undeliverable
spam bounces mixed in with all the undeliverable actual list email.

Until I started using the Bayesian filters in Mozilla, weeding through
the spam to find the actual administrative emails that needed my
attention was a very big chore, and my false positive rate using JHD was
fairly high.  Now Mozilla filters for me, and has a much lower false
positive rate.

Note, I fed Mozilla's Bayesian filters two folders, each containing over
1000 emails, one full of spam and one full of legitimate administrative
email, to train it to learn what was and wasn't acceptable email.  Hand
sorting until I had these two seed folders took a fair bit of time, but
it was clearly worth it!
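
For anyone unfamiliar with how this kind of seed-folder training works,
here is a rough Python sketch of a naive Bayes token filter.  It is not
Mozilla's actual code; the class name, the tokenizer, and the add-one
smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word-like tokens (illustrative only)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class filter trained from a spam and a ham folder."""

    def __init__(self):
        # Number of messages in each class that contain a given token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of messages seen in each class.
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one hand-sorted message; label is 'spam' or 'ham'."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return an estimate in (0, 1) that the message is spam."""
        n_spam = self.messages["spam"]
        n_ham = self.messages["ham"]
        # Start from the smoothed class prior, then add per-token evidence.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in set(tokenize(text)):
            # Add-one smoothing so unseen tokens never give a zero.
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


# Tiny stand-ins for the two seed folders (made-up messages).
spam_folder = ["buy cheap pills now", "cheap mortgage offer now"]
ham_folder = [
    "please unsubscribe me from the list",
    "list digest bounce notice for subscriber",
]

f = NaiveBayesFilter()
for msg in spam_folder:
    f.train(msg, "spam")
for msg in ham_folder:
    f.train(msg, "ham")

plain_spam = f.spam_probability("cheap pills offer")
padded_spam = f.spam_probability("cheap pills offer list unsubscribe")
real_admin = f.spam_probability("unsubscribe me from the list")
```

In this toy setup, padding the spam with words from the legitimate
folder lowers its score, which is the evasion trick described in the
quoted message.  A larger and more representative pair of seed folders
gives the filter more evidence per token to work with.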

The Bayesian filters are the main reason I'm using Mozilla.  Eudora does
some things much better than Mozilla, but I can't live without the spam
filters anymore!

jc







