[SA-exim] Greylisting algorithms after end of DATA

Magnus Holmgren holmgren at lysator.liu.se
Sat Jan 13 05:37:07 PST 2007

Traditional greylisting combines the remote host, envelope sender, and
envelope recipient into a triplet and, after each RCPT command, checks
whether that triplet has been seen before (not too long ago, but at least
some time ago). (Correct me if I'm wrong.) The advantage is that it saves
bandwidth, since the message is deferred before DATA is ever sent.
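
As a rough illustration of the triplet check described above, here is a
minimal Python sketch. The names check_rcpt and first_seen and the values of
MIN_DELAY and MAX_AGE are assumptions made for the example; they are not
taken from SA-Exim or any particular greylisting implementation.

```python
import time

MIN_DELAY = 5 * 60        # assumed: a retry must come at least 5 minutes later
MAX_AGE = 24 * 60 * 60    # assumed: a first attempt older than a day is forgotten

# (host, sender, recipient) -> timestamp of the first delivery attempt
first_seen = {}


def check_rcpt(host, sender, recipient, now=None):
    """Return "accept" or "defer" for a single RCPT command."""
    now = time.time() if now is None else now
    triplet = (host, sender, recipient)
    seen = first_seen.get(triplet)
    if seen is None or now - seen > MAX_AGE:
        # Never seen, or seen too long ago: start the clock (again).
        first_seen[triplet] = now
        return "defer"
    if now - seen < MIN_DELAY:
        # Seen, but the retry came too quickly.
        return "defer"
    return "accept"
```

A real deployment would keep first_seen in a database rather than in memory
and would answer "defer" with a 4xx SMTP response.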

Running SpamAssassin after end of DATA but before accepting the mail gives the
advantage that greylisting can be applied only to grey mail - the delaying of
clearly non-spam mail can be avoided. It also means that e.g. the Message-ID
can be considered when determining whether we have seen the message before.
In fact, nothing prevents us from using an arbitrary set of header fields
(such as Subject, Message-ID, From) in constructing the key, if it gives
better confidence in what we want to know: whether the other end retries
after a temporary failure. (We could even accept delivery and whitelist based
on a partial match, say 3 of 4, to better cope with the braindead mail
servers that unfortunately exist.) Once we have determined that it does,
there's no reason to greylist further mail from that host. (Well, there might
be a reason to delay mail from new senders at large ESPs like Hotmail, if that
gives the URIs in the spam time to end up in URIBLs. This is open to
discussion.)
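
The partial-match idea (say 3 of 4 fields) could be sketched roughly as
follows in Python. The field list is an assumption: From, Subject and
Message-ID are mentioned above, and Date is added here only to have a fourth
field. The function names are hypothetical.

```python
# Header fields compared between two delivery attempts.
# "Date" is an assumed fourth field, chosen only for illustration.
FIELDS = ("From", "Subject", "Message-ID", "Date")


def fingerprint(headers):
    """Reduce a dict of header fields to a tuple of normalized values."""
    # Collapse runs of whitespace so that refolded headers still compare equal.
    return tuple(" ".join(headers.get(name, "").split()) for name in FIELDS)


def looks_like_retry(first, second, required=3):
    """True if at least `required` non-empty fields are identical."""
    matches = sum(1 for a, b in zip(first, second) if a and a == b)
    return matches >= required
```

Empty fields never count as a match, so two messages that both lack a
Message-ID are not considered more similar because of it.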

So, what I suggest for a future SA-Exim version (and to anyone implementing
something similar using only Exim ACLs) is this: For each host (or /24 or /64
network), store a list of records representing messages that host has tried
to deliver. A record contains a timestamp and a key, which could be a hash of
$rh_From:, $rh_Subject:, $recipients (but see below) etc. When a message
matches an existing record, check the timestamp, and if enough time has
passed, replace the whole list with "whitelisted" (if not, do nothing). (Most
of the time, just one message arrives before the host gets whitelisted.)
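
The per-host bookkeeping described above might look roughly like this in
Python. This is only a sketch under assumptions: the in-memory dict hosts,
the function names, the SHA-256 hash and the MIN_DELAY value are all
illustrative choices, not part of SA-Exim.

```python
import hashlib
import ipaddress
import time

MIN_DELAY = 5 * 60           # assumed minimum time before a retry counts
WHITELISTED = "whitelisted"  # marker that replaces a host's whole record list

# network -> WHITELISTED, or a list of (timestamp, key) records
hosts = {}


def network_key(addr):
    """Map an address to its /24 (IPv4) or /64 (IPv6) network."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 64
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def message_key(*parts):
    """Hash the chosen message properties (e.g. From and Subject) into a key."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8", "replace"))
        digest.update(b"\0")  # separator, so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


def check_message(addr, key, now=None):
    """Return "accept" or "defer" for a message received from addr."""
    now = time.time() if now is None else now
    network = network_key(addr)
    records = hosts.setdefault(network, [])
    if records == WHITELISTED:
        return "accept"
    for stamp, seen_key in records:
        if seen_key == key:
            if now - stamp >= MIN_DELAY:
                # The host retried after a proper delay: whitelist it.
                hosts[network] = WHITELISTED
                return "accept"
            return "defer"  # retried too soon: do nothing
    records.append((now, key))
    return "defer"
```

For example, check_message("192.0.2.77", message_key(sender, subject)) would
defer the first attempt and accept a retry of the same message that arrives
at least MIN_DELAY seconds later, after which everything from 192.0.2.0/24
is accepted.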

One question to be solved is about $recipients. The envelope recipients have
to be checked since a spammer can send the same spam to many addresses but
with the same From: field. Most often there is only one recipient, and even
when there are several, the list is normally the same from one delivery
attempt to the next, but it could change if one or more recipients were
temporarily rejected on one occasion but not the other. Furthermore, MTAs
can't be required to give the list in the same order each time.
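
To make the two recipient problems concrete, here is a small Python sketch
(the function name and addresses are made up for the example). Comparing the
lists as sets takes care of ordering, but not of a retry that carries only
some of the original recipients.

```python
def same_recipients(first, retry):
    """True if both attempts name the same recipients, in any order."""
    return frozenset(first) == frozenset(retry)


first = ["alice@example.org", "bob@example.org"]
reordered = ["bob@example.org", "alice@example.org"]
partial = ["bob@example.org"]  # one recipient was handled differently this time

print(same_recipients(first, reordered))  # True
print(same_recipients(first, partial))    # False
```

So hashing a sorted recipient list into the key would solve the ordering
issue, while the second case needs a looser comparison.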

When storing the list of attempted deliveries in a file, I'd prefer if the
file didn't have to be rewritten, only appended to. Maybe it is enough if one
recipient of the retry is found in the list of recipients of the first
delivery attempt.
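
An append-only log combined with the "one recipient is enough" rule could be
sketched like this in Python. Everything here is an assumption made for
illustration: one file per host or network, a tab-separated line format, a
key that does not include the recipients, and the name check_and_log. There
is no file locking, which a real implementation would need.

```python
import time

MIN_DELAY = 5 * 60  # assumed minimum time before a retry counts


def check_and_log(path, key, recipients, now=None):
    """Check one message against a host's log file; only ever append to it.

    Each line is either "whitelisted" or "timestamp<TAB>key<TAB>rcpt1,rcpt2".
    Assumes addresses contain no tabs or commas.
    """
    now = time.time() if now is None else now
    rcpts = set(recipients)
    verdict = "defer"
    known = False
    try:
        with open(path, encoding="utf-8") as log:
            for line in log:
                fields = line.rstrip("\n").split("\t")
                if fields[0] == "whitelisted":
                    return "accept"
                stamp, seen_key, seen_rcpts = fields
                # Same key and at least one recipient in common.
                if seen_key == key and rcpts & set(seen_rcpts.split(",")):
                    known = True
                    if now - float(stamp) >= MIN_DELAY:
                        verdict = "accept"
    except FileNotFoundError:
        pass  # first message ever seen from this host
    with open(path, "a", encoding="utf-8") as log:
        if verdict == "accept":
            log.write("whitelisted\n")
        elif not known:
            log.write(f"{now}\t{key}\t{','.join(sorted(rcpts))}\n")
    return verdict
```

Whitelisting is recorded by appending a marker line rather than replacing the
file, so earlier records simply become irrelevant once the marker is read.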

Comments please!

--
Magnus Holmgren holmgren at lysator.liu.se
(No Cc of list mail needed, thanks)

"Exim is better at being younger, whereas sendmail is better for
Scrabble (50 point bonus for clearing your rack)" -- Dave Evans