Advanced Anti-Spam Techniques for MacOS X

This document is based on the fetchmail and procmail configuration described in Free Spam Filtering for MacOS X: How To Do It. If you haven't read that document, go back and read it now. These instructions are based on that specific MacOS X configuration, but should be useful for any general Unix procmail environment.


Killing Persistent Spammers with Procmail
Most of your spam will come from a few dedicated spammers, and you can cut the majority of your incoming spam by filtering those spammers directly. Sometimes one particular spammer continually sends you junk mail, and anything sent from that address or domain will always be spam. With procmail, it is easy to send these messages straight to /dev/null, the Unix equivalent of the trash can. You can insert a short procmail script into your ~/.procmailrc file, right before the SpamBouncer scripts are called, that kills mail from specific addresses. The mail is discarded before SpamBouncer sees it, which saves the processing time SpamBouncer would have spent on it. You will have the ultimate level of control over your incoming mailbox.
A perfect example happened this week. Some idiot spammer sent me a 2MB spam with a Powerpoint file attached. Procmail took a lot of processing time to run the SpamBouncer script, while my computer (a relatively slow G3/400) ground to a halt. And then they sent it again. And again. I put them in the filters, and blocked 5 more incoming 2MB spams. Now you can see why I recommend you check your procmail logs, to keep an eye out for this sort of stupid spam stunt. It was easy to look at these emails and see they were all sent from the same address; some spammers make it easy for you to filter them. Here's a sample script to filter email by the address in the From: field.

VIRUSFOLDER=/Users/username/procmail/spam

# Discard mail from this address. The backslash makes the dot literal.
:0
* ^From.*(spammer@spam\.com)
/dev/null

INCLUDERC=/Users/username/procmail/spambouncer/sb.rc

This example script is shown in context, to show how it is inserted in ~/.procmailrc right before the last line, the INCLUDERC that calls the SpamBouncer filter. Remember your ~/.procmailrc will have your MacOS X username in place of username. We'll omit the context from our next examples, and just focus on the three lines that do all the work.
The script looks at the From field of incoming emails, and if it finds a match, the mail is immediately sent to /dev/null where it is erased instantly. You can also put domain names into the parentheses, even a long list of names. Separate the entries with the pipe symbol (|), which means "or" in a regular expression. You can put an almost unlimited number of addresses on the same line.

:0
* ^From.*(spammer@spam\.com|junkmail\.com|spambag\.com)
/dev/null

You can see this script blocks the whole domains junkmail.com and spambag.com. I want you to stop and think about that for a second. Every single email from anywhere inside those domains will be deleted instantly. If you put in a name that is very broad, like hotmail.com or yahoo.com, you will never receive any mail from those domains. So you better be darn sure you want to do this. I try not to block huge domains, since SpamBouncer tends to catch spams from free emailers. But for small domains that do nothing but spam you persistently, put them in the filters.
Some spammers are more clever. They use faked From: addresses, or use dozens of From addresses but send from the same domain. These take a little more effort to block. You will have to learn to read email headers. Fortunately, SpamBouncer tags each rejected email with the characteristics of that spam, and sometimes it will tell you where to look. Here's a good example email header from my own spam dump. I removed my real email address, to make it a little harder for spammers to harvest my address from this page.
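Given how final /dev/null is, one cautious approach is to point a new filter at a holding folder for a few days, and switch it to /dev/null once you are sure it catches nothing but spam. The sketch below assumes a mailbox file named killed in the procmail directory. That name is hypothetical, so use whatever path suits your setup.

```
# Trial run: file matches in a holding mailbox instead of deleting them.
# The folder name "killed" is an assumption, not part of the original setup.
# The second colon in ":0:" asks procmail to lock the mailbox while writing.
:0:
* ^From.*(junkmail\.com|spambag\.com)
/Users/username/procmail/killed
```

When the folder has held nothing but spam for a while, change the last line back to /dev/null and the first line back to a plain :0, since /dev/null needs no lock.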

From ceicher  Sun Jul 14 16:55:58 2002
Return-Path: <perf-errors.3565.65683.5914160.501.0.4@boing.topica.com>
Delivered-To: [removed]
Received: from soli.inav.net [64.6.64.4]
	by localhost with POP3 (fetchmail-5.9.0)
	for ceicher@localhost (single-drop); Sun, 14 Jul 2002 16:55:58 -0500 (CDT)
Received: (qmail 750 invoked by uid 0); 14 Jul 2002 16:50:48 -0500
Received: from out012.tfmb.net (HELO outmta020.topica.com) (66.180.247.32)
  by soli.inav.net with SMTP; 14 Jul 2002 16:50:48 -0500
To: [removed]
From: ContentWatch <emailrewardz@emailrewardz.email-publisher.com>
Subject: Advisory: Hidden file danger
Date: Sun, 14 Jul 2002 14:50:47 -0700
Message-ID: <65683.3565.1769412112-1463747838-1026683447@topica.com>
Errors-To: <perf-errors.3565.65683.5914160.501.0.4@boing.topica.com>
Reply-To: perf-remove.3565.65683.5914160.0.0.4@boing.topica.com
X-Topica-Id: <1026681038.svc001.8316.1000119>
Mime-Version: 1.0
Status:  U
X-UIDL: 1026683448.753.soli.inav.net
Content-Type: multipart/alternative;
    boundary="TEP-1545058628.1463793150.1026680402"
X-SpamBouncer: 1.5 (6/13/02)
X-SBRule: Pattern Match (Disclaimer) (Score: 9656)
X-SBRule: Pattern Match (Web Hosting) (Score: 800)
X-SBRule: Pattern Match (Haven Domain) (Score: 0)
X-SBRule: topica.com mailing list
X-SBClass: Blocked

You can see that this mail appears to come from emailrewardz@emailrewardz.email-publisher.com, but it really doesn't. That name alone should alert you that this is a persistent, devious spammer. But notice that one of the Received: lines includes the text (HELO outmta020.topica.com). This is the true source of the spam. The X-SBRule headers added by SpamBouncer confirm that this email was sent from Topica.com, a "haven domain" that exists solely to spam. There are other ways to identify the true sender by reading the headers; you might want to read this FAQ or this HowTo on this topic.
If you get one spam from this spammer's domain, you can be guaranteed that more will follow. So let's put the whole domain topica.com in our filter. But the domain name does not appear in the From: field. It could be almost anywhere in the headers, since spammers frequently insert faked or broken headers to disguise the origin. We will have to create a more general filter, to search for a string anywhere in the message. Here is an example.

:0 HB
* (topica\.com)
/dev/null

Note that unlike the previous scripts, which search only the From: field, this script searches the entire message. Procmail normally checks only the headers; the HB flags on the first line tell it to check both the header and the body. This is a little bit risky, since you could get a legitimate email from someone who writes "I sure get a lot of spam from topica.com" and this script would delete the message. So be careful. It would be useful to read some of the fine procmail tutorials to learn how to refine these scripts.
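One possible refinement, offered as a sketch rather than a tested rule: in the sample message the domain shows up in the Return-Path:, Received: and Message-ID: headers, so the search can be limited to those header lines. A friend who merely mentions the domain in the text of a message would then get through.

```
# Match the domain only in headers that record where the mail came from.
# With no B flag on the first line, procmail checks the headers only.
:0
* ^(Return-Path|Received|Message-ID):.*topica\.com
/dev/null
```

The choice of these three headers is an assumption based on the one sample shown here; other spam from the same source may hide the domain elsewhere.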
Let's put in one more example of a useful script. I get a ton of foreign language spam in Korean and Chinese. SpamBouncer blocks these messages pretty well, but I get hundreds of them, and I want them to go straight to /dev/null instead of my block folder. Most of these messages are easily identified, because they have a Content-Type: field in the headers that identifies the character encoding. But some of these messages do not use the proper headers, so we will produce another generic script that will search the entire message, header and body, for all the Chinese and Korean encodings in common use today.

:0 HB
* (big5|gb2312|euc-kr|ks_c_5601-1987)
/dev/null
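
For the messages that do declare their encoding properly, a header-only test is a narrower alternative. This sketch assumes the spam names its character set either in the Content-Type: header or in a MIME-encoded Subject: line (the =?charset?...?= form); it has not been tried against a real spam collection.

```
# Header-only test: a declared charset in Content-Type, or an encoded Subject.
# The optional quote ("?) allows both charset=big5 and charset="big5".
:0
* ^(Content-Type:.*charset="?|Subject:.*=\?)(big5|gb2312|euc-kr|ks_c_5601-1987)
/dev/null
```

Because it never looks at the body, this version will not delete a message just because someone mentions one of these encoding names in passing.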

Now that we have a couple of different scripts that serve different purposes, we can put several of these short 3-line procmail scripts together in a row. As long as you insert them just before the final line where sb.rc is called, the scripts will all run in sequence before SpamBouncer, and each one can discard spam without further processing.
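
Put together, the end of ~/.procmailrc might look something like the sketch below. The addresses and domains are the placeholder examples used earlier, and the paths assume the configuration from the earlier document.

```
VIRUSFOLDER=/Users/username/procmail/spam

# 1. Known spammer addresses and domains, matched on the From line.
:0
* ^From.*(spammer@spam\.com|junkmail\.com|spambag\.com)
/dev/null

# 2. A haven domain, matched anywhere in the header or body.
:0 HB
* (topica\.com)
/dev/null

# 3. Chinese and Korean character sets, matched anywhere in the message.
:0 HB
* (big5|gb2312|euc-kr|ks_c_5601-1987)
/dev/null

# Everything that survives goes on to SpamBouncer.
INCLUDERC=/Users/username/procmail/spambouncer/sb.rc
```

Procmail works through the scripts from top to bottom and stops at the first one that delivers the message, so a mail that matches the first script never reaches the others.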
There is one problem with this strategy: bloat. I've been using this technique for only a few months and already I have over 100 domains blocked. Over time, this file will grow larger and larger, and take longer and longer to process. Also, spammers tend to throw away old domain names and use new ones, so you never know how many entries in your filter are obsolete and useless. So don't just throw every spammer's address in the script, just the ones that send you the most mail.
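
One way to keep the list manageable, shown here only as a sketch, is to move the kill filters into a file of their own and note when each entry was added, so stale ones are easy to spot and prune. The file name killfile.rc is made up for illustration, and YYYY-MM stands for whatever date you add the entry.

```
# In ~/.procmailrc, just before the SpamBouncer line:
INCLUDERC=/Users/username/procmail/killfile.rc

# In /Users/username/procmail/killfile.rc (hypothetical file):
# added YYYY-MM: junkmail.com
# added YYYY-MM: spambag.com
:0
* ^From.*(junkmail\.com|spambag\.com)
/dev/null
```

This does not make the list any shorter by itself, but it keeps ~/.procmailrc readable and gives you a record to review when you decide which entries to retire.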
The good news is that this strategy is pretty effective. I used to get about 50 to 60 spams in my block folder each day. After adding new names to my filters for a few weeks, spam is down to about 1 every few days. In fact, my filters work so well, I had to turn them off to collect even a single sample spam so I could write this documentation! I hope this technique works as well for you as it does for me. This is a classic case study in adapting Open Source software to the Mac environment. But I suspect this is the end of the road for this type of spam filtration: Apple has announced that their next MacOS X release will upgrade Mail.app to include spam filters. These procmail techniques may be obsolete in a matter of weeks. Or they may continue to be valuable. Perhaps Apple is even using SpamBouncer. We will know soon.