[SA-exim] Re: SA-Exim timeout
Marc MERLIN
marc at merlins.org
Mon, 21 Oct 2002 11:14:12 -0700
On Mon, Oct 21, 2002 at 10:04:44AM +0100, Tim Jackson wrote:
> Hi Marc, on Sun, 20 Oct 2002 16:03:17 -0700 you wrote:
Hi,
I appreciate your looking into this, especially while I was sleeping :-)
> > When do you get the timemout? Are you trying to stall the sender or is
> > spamd just taking too long to answer?
>
> I'm not trying to stall the sender, no. The only thing I have enabled at
> the moment is permanent rejection over a threshold. I'm not using
Ok.
I was looking into this plus teergrube on my side, which didn't really
help :-)
> tempreject, senderstall, teergrube or anything like that. So it's either
> spamd taking too long to answer or something else. The fact it's just
> happened since I upgraded to SA-E2.1 does suggest it's not actually
Ok, so I did screw something up in the last release while trying to fix
that other report about sa-exim sometimes feeding a truncated message to
spamc. That's very good to know, thanks.
> I'm going to enable SA-E debugging and hope the trouble message gets
> retried shortly and I can give some more clues.
I have a secondary MX where I disabled sa-exim for now so that I can
capture such a message and re-feed it at will.
On Mon, Oct 21, 2002 at 10:17:25AM +0100, Tim Jackson wrote:
> Following on from my last e-mail, I should just mention that yesterday I
> not only switched to SA-E 2.1, but also upgraded SA from 2.42 to 2.43.
> However, I don't think it's caused by that, because I put SA-E 2.1 on
> another machine the other day, which has still got SA2.42 on it, and
> that's having a similar problem. But I thought I should mention it.
Right, but we know it's not that.
On Mon, Oct 21, 2002 at 10:43:33AM +0100, Tim Jackson wrote:
> 10:18:48 ...
> 10:18:48 183Yho-0004Ju-00 SA: check succeeded, running spamc
> 10:23:48 183Yho-0004Ju-00 local_scan() function timed out - message
> temporarily rejected
>
> Notice the lack of a "Setting timeout..." message here. Weird, eh?
Ok, that's _very_ good to know.
> 1. something that is causing SA-E not to set alarm()
It never gets there.
> 2. something that is causing SA to not scan this particular message
> properly (maybe because of its size? I wonder if it may be something
> like the fact that SA-E has cut the message off half way through an
> attachment or something? Is spamc's cutoff handling more intelligent or
> does it simply cut off after 'x' bytes?)
That's what I tried to fix. SA-Exim will send slighlty more than
samaxbody to spamc so that spamc gets the option of not feeding the spam
to spamd, and just returning "not scanned"
I think I really should just see how big the file is and not even feed
it to spamc if it's too big, but in the meantime, the current code
should stll work, and has worked so far.
On Mon, Oct 21, 2002 at 12:20:40PM +0100, Tim Jackson wrote:
> OK, I think the timeout we've been discussing is a bug in SA-E, with SA-E
> getting caught in a loop. Basically, if I'm right, 'ret' is being re-used
> and overwritten when we don't want it to be.
Ok, I don't get this.
> --- sa-exim-2.1/sa-exim.c.orig Mon Oct 14 04:27:57 2002
> +++ sa-exim-2.1/sa-exim.c Mon Oct 21 11:58:52 2002
> @@ -312,6 +312,7 @@
> {
> #warning you shouldn''t worry about the "might be clobbered by longjmp",
> see source
> int ret;
> + int ret_read;
> int pid;
> int writefd[2];
> int readfd[2];
> @@ -618,12 +619,12 @@
> * than SA is going to process, but let's send at least one byte more
> for
> * spamc to do the size cutoff, not us */
> chunk=(samaxbody+1 / sizeof(buffera));
> - while ((ret=read(fd, buffer, sizeof(buffera))) > 0 && chunk-- > 0)
> + while ((ret_read=read(fd, buffer, sizeof(buffera))) > 0 && chunk-- >
> 0)
> {
> ret=write(writefd[1], buffer, ret);
> CHECKERR(ret,"body write",__LINE__);
> }
> - CHECKERR(ret, "read body", __LINE__ - 4);
> + CHECKERR(ret_read, "read body", __LINE__ - 4);
Granted, it's monday morning, and I was hacking late (on something else :-)
last night, but where's the problem with the old code?
If ret is 0, we don't enter while, if it's negative we display the
error. Inside the loop, ret should be >=0 or it is a write error.
Sure, that value is rechecked when you exit the loop, but that doesn't
matter, it's still >=0 or it would have tripped the first CHECKERR.
> @@ -813,6 +814,7 @@
> afterscan=time(NULL);
> scantime=afterscan-beforescan;
>
> + ret=0;
> wait(&ret);
This was for testing, right? (it shouldn't do anything)
On Mon, Oct 21, 2002 at 01:46:56PM +0100, Tim Jackson wrote:
> chunk=(samaxbody+1 / sizeof(buffera));
>
> where chunk is an int. I'm assuming that the intended result here is that
> chunk contains 15. (although actually, I think it should be 16 to avoid
> missing the last block)
Right. I make it a bit bigger on purpose. Too big is better than missing
the end of the body :-)
> But, if I add a bit of debugging logging:
>
> log_write(0, LOG_MAIN, "SA: samaxbody %d, buffera-size %d, chunks %d",
> samaxbody+1,sizeof(buffera),chunk);
>
> I get in my Exim log:
>
> SA: samaxbody 256001, buffera-size 16384, chunks 256000
I don't understand why chunks is 256000 here.
> This doesn't stop it working (since the return value from read() will be
> zero once everything's read) but ought to be fixed. I think the relevant
> calculation should be something like:
>
> chunk=((samaxbody+1) / sizeof(buffera))+1;
I switched it to, which is what I think it should be.
chunk=(samaxbody / sizeof(buffera))+1;
> (Also, why is sizeof(buffera)=16384 and not 4096 as it seems to be when
> declared?)
buffera-size=16384 is a compiler issue, it allocates 4 bytes for a char,
but since I read sizeof(buffera), I don't make any assumption on the
size of buffera.
In other words, it's unexpected, but it's ok.
> Interestingly, when saving chunks to file (around line 215), the divisor
> used is "sizeof(buffera)-1":
Right. I touched the code over too many months to remember that :-)
> chunk=(SAmaxarchivebody / (sizeof(buffera)-1))+1;
> Why? Which is right? (I don't think it's critical, but it would be nice to
> be consistent :)
I've changed both to be like above (without the -1, which was an initial
worry about getting one more byte written for the trailing null (not
needed))
> Incidentally, in the code it's mentioned about talking to spamd directly
> rather than via spamc. Exiscan implements this, so if you want a test
> implementation to copy, it's there :)
I might do that, although I kind of like not having to worry about the
network protocol, possible changes, and so forth. Forking spamc is
easier IMO
Basically I'm thinking that the problem was the missing the +1 at the
end of chunk as you suggested and I wasn't sending the whole body to
spamc, but I'm not quite sure why it would cause the code to stop there
and not continue to
log_write(0, LOG_MAIN, "SA: fed spam to spamc, reading result");
and setting the SA timeout.
Either way, I've added more debug status output to see where it stops
should it happen again
See CVS for the current version I started running 5mn ago :-)
http://sourceforge.net/cvs/?group_id=56124
http://marc.merlins.org/linux/exim/files/sa-exim-cvs/
Thanks for your copious feedback
Cheers,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger marc_f@merlins.org for PGP key