[SA-exim] Greylisting algorithms after end of DATA

Magnus Holmgren holmgren at lysator.liu.se
Sat Jan 13 05:37:07 PST 2007


Traditional greylisting combines the remote host, envelope sender, and 
envelope recipient into a triplet and checks, after each RCPT command, 
whether that triplet has been seen before (at least some time ago, but not 
too long ago). (Correct me if I'm wrong.) The advantage is that it saves 
bandwidth, since the mail is deferred before any message data is transferred.
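
To make the traditional scheme concrete, here is a minimal sketch in Python 
(not Exim ACL syntax). The names and both time limits are assumptions chosen 
for illustration, not values taken from any particular implementation:

```python
import time

MIN_DELAY = 300        # assumed: a retry must come at least 5 minutes later
MAX_AGE = 4 * 3600     # assumed: ...and no more than 4 hours later

# (host, sender, recipient) -> time of the first delivery attempt
seen = {}

def check_rcpt(host, sender, recipient, now=None):
    """Return 'accept' or 'defer' for a single RCPT command."""
    now = time.time() if now is None else now
    triplet = (host, sender.lower(), recipient.lower())
    first = seen.get(triplet)
    if first is None or now - first > MAX_AGE:
        # Never seen, or seen too long ago: start the clock (again).
        seen[triplet] = now
        return "defer"
    if now - first < MIN_DELAY:
        # Retried too quickly to count as a real retry.
        return "defer"
    return "accept"
```

Everything here is decided from envelope data alone, which is why it can run 
before DATA.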

Running SpamAssassin after end of DATA but before accepting the mail gives the 
advantage that greylisting can be applied only to grey mail - the delaying of 
clearly non-spam mail can be avoided. It also means that e.g. the Message-ID 
can be considered when determining whether we have seen the message before. 
In fact, nothing prevents us from using an arbitrary set of header fields 
(such as Subject, Message-ID, From) in constructing the key, if it gives 
better confidence in what we want to know: whether the other end retries 
after a temporary failure. (We could even accept delivery and whitelist based 
on a partial match, say 3 of 4, to better cope with the braindead mail 
servers that unfortunately exist.) Once we have determined that it does 
retry, there's no reason to greylist further mail from that host. (Well, 
there might be a reason to delay mail from new senders at large ESPs like 
Hotmail, if that means that URIs in the spam get the time to end up in 
URIBLs. This is open to discussion.)
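
A minimal sketch of the partial-match idea, in Python. The choice of four 
fields, the per-field hashing, and the function names are all assumptions 
made for illustration. Hashing each field separately (instead of hashing 
them all together) is what makes a "3 of 4" comparison possible:

```python
import hashlib

# Assumed field set; any other combination would work the same way.
FIELDS = ("from", "subject", "message-id", "recipients")

def fingerprint(msg):
    """Return one short hash per field of a dict-like message."""
    return tuple(
        hashlib.sha1(
            msg.get(field, "").strip().lower().encode("utf-8")
        ).hexdigest()[:16]
        for field in FIELDS
    )

def matches(fp_a, fp_b, needed=3):
    """True if at least `needed` of the per-field hashes agree."""
    same = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return same >= needed
```

With needed=3, a retry that arrives with a regenerated Message-ID still 
matches the first attempt, while a message that differs in two fields does 
not. Note that a field missing from both messages counts as agreeing in this 
sketch.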

So, what I suggest for a future SA-Exim version (and to anyone implementing 
something similar using only Exim ACLs) is this: For each host (or /24 or /64 
network), store a list of records representing messages that host has tried 
to deliver. A record contains a timestamp and a key, which could be a hash of 
$rh_From:, $rh_Subject:, $recipients (but see below) etc. When a message 
matches an existing record, check the timestamp, and if enough time has 
passed, replace the whole list with "whitelisted" (if not, do nothing). (Most 
of the time, just one message arrives before the host gets whitelisted.)
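
The bookkeeping described above could look roughly like this in Python. It 
is a sketch under assumptions: the in-memory dict, the function names, and 
the 5-minute delay are mine, and the key is taken as an opaque string however 
it was computed. It would be called only for mail that SpamAssassin scored 
as grey:

```python
import ipaddress
import time

MIN_DELAY = 300              # assumed minimum time before a retry counts
WHITELISTED = "whitelisted"

# network -> WHITELISTED, or a list of (timestamp, key) records
hosts = {}

def network_of(ip):
    """Map an address to its /24 (IPv4) or /64 (IPv6) network."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 64
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))

def on_message(ip, key, now=None):
    """Return 'accept' or 'defer' for one grey message."""
    now = time.time() if now is None else now
    net = network_of(ip)
    if hosts.get(net) == WHITELISTED:
        return "accept"
    records = hosts.setdefault(net, [])
    for timestamp, seen_key in records:
        if seen_key == key:
            if now - timestamp >= MIN_DELAY:
                # The host retried properly: replace the whole list.
                hosts[net] = WHITELISTED
                return "accept"
            return "defer"   # retried too early: do nothing
    records.append((now, key))
    return "defer"
```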

One question to be solved is about $recipients. The envelope recipients have 
to be checked since a spammer can send the same spam to many addresses but 
with the same From: field. Most often there is only one recipient, and even 
when there are several, the list is normally the same from one delivery 
attempt to the next, but it could change if one or more recipients were 
temporarily rejected on one occasion but not the other. Furthermore, we can't 
require that MTAs give the list in the same order each time.
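
The ordering part, at least, is easy to neutralise by canonicalising the list 
before hashing it. A small Python sketch (the function name is hypothetical, 
and lowercasing the whole address is a simplifying assumption):

```python
import hashlib

def recipients_key(recipients):
    """Hash a recipient list so that order and duplicates don't matter."""
    canonical = sorted({r.strip().lower() for r in recipients})
    return hashlib.sha1("\n".join(canonical).encode("utf-8")).hexdigest()
```

This does nothing for the other problem: if a recipient is dropped between 
attempts, the two hashes differ completely.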

When storing the list of attempted deliveries in a file, I'd prefer that the 
file didn't have to be rewritten, only appended to. Maybe it is enough that 
at least one recipient of a retry is found in the list of recipients of the 
first delivery attempt.
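
As a sketch of how an append-only file and the "at least one recipient" rule 
could fit together (Python; the JSON-lines format, field names, and function 
names are assumptions, and locking between concurrent processes is left 
out):

```python
import json

def log_attempt(path, network, key, recipients, now):
    """Append one delivery attempt; existing lines are never rewritten."""
    record = {
        "ts": now,
        "net": network,
        "key": key,
        "rcpts": sorted({r.lower() for r in recipients}),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def first_attempt(path, network, key, recipients):
    """Return the timestamp of the earliest logged attempt from `network`
    with the same key and at least one recipient in common, else None."""
    wanted = {r.lower() for r in recipients}
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                rec = json.loads(line)
                if (rec["net"] == network and rec["key"] == key
                        and wanted & set(rec["rcpts"])):
                    return rec["ts"]
    except FileNotFoundError:
        pass
    return None
```

Here the key would be built from the header fields only, with the recipients 
stored alongside it instead of being hashed into it, so that a retry with a 
shortened recipient list still matches.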

Comments please!

-- 
Magnus Holmgren        holmgren at lysator.liu.se
                       (No Cc of list mail needed, thanks)

  "Exim is better at being younger, whereas sendmail is better for 
   Scrabble (50 point bonus for clearing your rack)" -- Dave Evans