[SA-exim] Re: SA-Exim timeout

Marc MERLIN marc at merlins.org
Mon, 21 Oct 2002 11:14:12 -0700


On Mon, Oct 21, 2002 at 10:04:44AM +0100, Tim Jackson wrote:
> Hi Marc, on Sun, 20 Oct 2002 16:03:17 -0700 you wrote:
 
Hi,
 
I appreciate your looking into this, especially while I was sleeping :-)
 
> > When do you get the timemout? Are you trying to stall the sender or is
> > spamd just taking too long to answer?
> 
> I'm not trying to stall the sender, no. The only thing I have enabled at
> the moment is permanent rejection over a threshold. I'm not using

Ok.
I was looking  into this plus teergrube on my  side, which didn't really
help :-)

> tempreject, senderstall, teergrube or anything like that. So it's either
> spamd taking too long to answer or something else.  The fact it's just
> happened since I upgraded to SA-E2.1 does suggest it's not actually

Ok, so I did screw something up  in the last release while trying to fix
that other report about sa-exim sometimes feeding a truncated message to
spamc. That's very good to know, thanks.

> I'm going to enable SA-E debugging and hope the trouble message gets
> retried shortly and I can give some more clues.

I have  a secondary MX where  I disabled sa-exim  for now so that  I can
capture such a message and re-feed it at will.

On Mon, Oct 21, 2002 at 10:17:25AM +0100, Tim Jackson wrote:
> Following on from my last e-mail, I should just mention that yesterday I
> not only switched to SA-E 2.1, but also upgraded SA from 2.42 to 2.43.
> However, I don't think it's caused by that, because I put SA-E 2.1 on
> another machine the other day, which has still got SA2.42 on it, and
> that's having a similar problem. But I thought I should mention it.
 
Right, but we know it's not that.

On Mon, Oct 21, 2002 at 10:43:33AM +0100, Tim Jackson wrote:
> 10:18:48 ...
> 10:18:48 183Yho-0004Ju-00 SA: check succeeded, running spamc
> 10:23:48 183Yho-0004Ju-00 local_scan() function timed out - message
>          temporarily rejected
> 
> Notice the lack of a "Setting timeout..." message here. Weird, eh?

Ok, that's _very_ good to know.

> 1. something that is causing SA-E not to set alarm() 
 
It never gets there.

> 2. something that is causing SA to not scan this particular message    
> properly (maybe because of its size? I wonder if it may be something    
> like the fact that SA-E has cut the message off half way through an 
> attachment or something? Is spamc's cutoff handling more intelligent or
> does it simply cut off after 'x' bytes?)

That's  what  I tried  to  fix. SA-Exim  will  send slighlty  more  than
samaxbody to spamc so that spamc gets the option of not feeding the spam
to spamd, and just returning "not scanned"
I think I really  should just see how big the file is  and not even feed
it to  spamc if  it's too  big, but  in the  meantime, the  current code
should stll work, and has worked so far.

On Mon, Oct 21, 2002 at 12:20:40PM +0100, Tim Jackson wrote:
> OK, I think the timeout we've been discussing is a bug in SA-E, with SA-E
> getting caught in a loop. Basically, if I'm right, 'ret' is being re-used
> and overwritten when we don't want it to be.
 
Ok, I don't get this.

> --- sa-exim-2.1/sa-exim.c.orig	Mon Oct 14 04:27:57 2002
> +++ sa-exim-2.1/sa-exim.c	Mon Oct 21 11:58:52 2002
> @@ -312,6 +312,7 @@
>  {
>  #warning you shouldn''t worry about the "might be clobbered by longjmp",
> see source
>      int ret;
> +    int ret_read;
>      int pid;
>      int writefd[2];
>      int readfd[2];
> @@ -618,12 +619,12 @@
>       * than SA is going to process, but let's send at least one byte more
> for
>       * spamc to do the size cutoff, not us */
>      chunk=(samaxbody+1 / sizeof(buffera));
> -    while ((ret=read(fd, buffer, sizeof(buffera))) > 0 && chunk-- > 0)
> +    while ((ret_read=read(fd, buffer, sizeof(buffera))) > 0 && chunk-- >
> 0)
>      {
>  	ret=write(writefd[1], buffer, ret);
>  	CHECKERR(ret,"body write",__LINE__);
>      }
> -    CHECKERR(ret, "read body", __LINE__ - 4);
> +    CHECKERR(ret_read, "read body", __LINE__ - 4);

Granted, it's monday morning, and I was hacking late (on something else :-)
last night, but where's the problem with the old code?
If ret  is 0,  we don't  enter while,  if it's  negative we  display the
error.  Inside the loop, ret should be >=0 or it is a write error.
Sure, that value  is rechecked when you exit the  loop, but that doesn't
matter, it's still >=0 or it would have tripped the first CHECKERR.

> @@ -813,6 +814,7 @@
>      afterscan=time(NULL);
>      scantime=afterscan-beforescan;
>  
> +    ret=0;
>      wait(&ret);

This was for testing, right? (it shouldn't do anything)

On Mon, Oct 21, 2002 at 01:46:56PM +0100, Tim Jackson wrote:
> chunk=(samaxbody+1 / sizeof(buffera));
> 
> where chunk is an int. I'm assuming that the intended result here is that
> chunk contains 15. (although actually, I think it should be 16 to avoid
> missing the last block)
 
Right. I make it a bit bigger on purpose. Too big is better than missing
the end of the body :-)

> But, if I add a bit of debugging logging:
> 
> log_write(0, LOG_MAIN, "SA: samaxbody %d, buffera-size %d, chunks %d",
> samaxbody+1,sizeof(buffera),chunk);
> 
> I get in my Exim log:
> 
> SA: samaxbody 256001, buffera-size 16384, chunks 256000
 
I don't understand why chunks is 256000 here.

> This doesn't stop it working (since the return value from read() will be
> zero once everything's read) but ought to be fixed. I think the relevant
> calculation should be something like:
> 
> chunk=((samaxbody+1) / sizeof(buffera))+1;

I switched it to, which is what I think it should be.
chunk=(samaxbody / sizeof(buffera))+1;
 
> (Also, why is sizeof(buffera)=16384 and not 4096 as it seems to be when
> declared?)
 
buffera-size=16384 is a compiler issue, it allocates 4 bytes for a char,
but since  I read sizeof(buffera),  I don't  make any assumption  on the
size of buffera.
In other words, it's unexpected, but it's ok.

> Interestingly, when saving chunks to file (around line 215), the divisor
> used is "sizeof(buffera)-1":
 
Right. I touched the code over too many months to remember that :-)

> chunk=(SAmaxarchivebody / (sizeof(buffera)-1))+1;
> Why? Which is right? (I don't think it's critical, but it would be nice to
> be consistent :)
 
I've changed both to be like above (without the -1, which was an initial
worry about  getting one more  byte written  for the trailing  null (not
needed))
 
> Incidentally, in the code it's mentioned about talking to spamd directly
> rather than via spamc. Exiscan implements this, so if you want a test
> implementation to copy, it's there :)

I might do that,  although I kind of like not having  to worry about the
network  protocol,  possible changes,  and  so  forth. Forking spamc  is
easier IMO

Basically I'm  thinking that the problem  was the missing the  +1 at the
end of  chunk as you  suggested and I wasn't  sending the whole  body to
spamc, but I'm not quite sure why  it would cause the code to stop there
and not continue to
log_write(0, LOG_MAIN, "SA: fed spam to spamc, reading result");
and setting the SA timeout.

Either way,  I've added more debug  status output to see  where it stops
should it happen again

See CVS for the current version I started running 5mn ago :-)
http://sourceforge.net/cvs/?group_id=56124
http://marc.merlins.org/linux/exim/files/sa-exim-cvs/

Thanks for your copious feedback

Cheers,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
                                      .... what McDonalds is to gourmet cooking 
Home page: http://marc.merlins.org/   |   Finger marc_f@merlins.org for PGP key