amavisd-new doesn't seem to get the autolearn= portion of the X-Spam-Status header from SpamAssassin

Asked by Jon Skanes

I've been trying to get Amavis to report whether the mail it checks has been auto-learned by SpamAssassin, but the X-Spam-Status header that Amavis writes makes no reference to it. This result matters to me because I would like to script sa-learn so that users can drop mail into dump folders on IMAP when SA has not learned it, or has learned it incorrectly.
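
As a rough illustration of the sa-learn scripting mentioned above, here is a minimal Python sketch that only builds the command line for one dump folder. The folder names (.LearnSpam, .LearnHam) and the Maildir layout are assumptions made for the example and are not from this setup; `--spam` and `--ham` are the standard sa-learn switches. The command would presumably have to run as the user that owns the Bayes database Amavis uses.

```python
import shlex

# Hypothetical dump-folder names inside each user's Maildir (assumption,
# not taken from the original post).
LEARN_FOLDERS = {
    "spam": ".LearnSpam",
    "ham": ".LearnHam",
}


def sa_learn_command(kind, maildir_root):
    """Build the sa-learn argv for one dump folder; kind is 'spam' or 'ham'."""
    if kind not in LEARN_FOLDERS:
        raise ValueError("kind must be 'spam' or 'ham'")
    # Maildir keeps already-seen messages in the 'cur' subdirectory.
    folder = f"{maildir_root}/{LEARN_FOLDERS[kind]}/cur"
    return ["sa-learn", f"--{kind}", folder]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for kind in ("spam", "ham"):
        print(shlex.join(sa_learn_command(kind, "/home/user/Maildir")))
```

Printing rather than executing keeps the sketch harmless; a real script would hand the list to subprocess.run and empty the folder afterwards.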

I'm running these versions on a fresh install of Hardy:

amavisd-new:
  Installed: 1:2.5.3-1ubuntu3
  Candidate: 1:2.5.3-1ubuntu3
  Version table:
 *** 1:2.5.3-1ubuntu3 0
        500 http://gulus.usherbrooke.ca hardy/main Packages
        100 /var/lib/dpkg/status
spamassassin:
  Installed: 3.2.4-1ubuntu1
  Candidate: 3.2.4-1ubuntu1
  Version table:
 *** 3.2.4-1ubuntu1 0
        500 http://gulus.usherbrooke.ca hardy/universe Packages
        100 /var/lib/dpkg/status

Here is what I'm getting:

X-Spam-Status: No, score=-1.927 tagged_above=-9999 required=6.31
 tests=[AWL=0.673, BAYES_00=-2.599, SPF_PASS=-0.001]

Here is what I want (from the old mail server):

X-Spam-Status: not spam, SpamAssassin (score=-0.803, required 6,
        autolearn=not spam, AWL -0.94, FORGED_RCVD_HELO 0.14)
        ^^^^^^^^^^^^

According to all the docs I've read on Amavis/SA, Amavis doesn't alter the return values of the test results from SA.

From running Amavis in SA debug mode, I get this:

[28443] dbg: learn: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1
[28443] dbg: learn: auto-learn: message score: -3.25358685446009, computed score for autolearn: 0
[28443] dbg: learn: auto-learn? ham=1, spam=6, body-points=0, head-points=0, learned-points=-2.599
[28443] dbg: learn: auto-learn? yes, ham (0 < 1)
[28443] dbg: learn: initializing learner
[28443] dbg: learn: learning ham
[28443] dbg: bayes: tie-ing to DB file R/W /var/lib/amavis/.spamassassin/bayes_toks
[28443] dbg: bayes: tie-ing to DB file R/W /var/lib/amavis/.spamassassin/bayes_seen
[28443] dbg: bayes: found bayes db version 3
[28443] dbg: bayes: learned '9e0024004b5cce00aa9767ad57fcb634239234e9@sa_generated', atime: 1215809602
[28443] dbg: bayes: untie-ing
[28443] dbg: bayes: files locked, now unlocking lock
[28443] dbg: learn: initializing learner
[28443] dbg: check: is spam? score=-3.254 required=5
[28443] dbg: check: tests=AWL,BAYES_00,SPF_HELO_PASS,SPF_PASS
[28443] dbg: check: subtests=__CT,__CTYPE_HAS_BOUNDARY,__DOS_HAS_ANY_URI,__DOS_RCVD_FRI,__DOS_RELAYED_EXT,__ENV_AND_HDR_FROM_MATCH,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__KAM_LOTTO3,__LAST_UNTRUSTED_RELAY_NO_AUTH,__MIME_ATTACHMENT,__MIME_BASE64,__MIME_VERSION,__MISSING_REF,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NAKED_TO,__NONEMPTY_BODY,__PART_STOCK_CD_F,__SANE_MSGID,__TOCC_EXISTS,__TVD_BODY,__TVD_MIME_ATT,__TVD_MIME_ATT_AP,__TVD_MIME_ATT_TP,__TVD_MIME_CT_MM

So SA seems to be doing the autolearn as expected.
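
To make the "yes, ham (0 < 1)" line in the debug output easier to follow, here is a simplified sketch of the threshold comparison as I understand it. It is not SpamAssassin's code: the function name is made up, the handling of the exact boundary values is an assumption, and the real auto-learn logic applies further checks (for example on body and header points) that are left out here. The defaults are the two bayes_auto_learn_threshold_* values from the local.cf below.

```python
def autolearn_decision(autolearn_score, ham_threshold=1.0, spam_threshold=6.0):
    """Simplified auto-learn decision based only on the two thresholds.

    autolearn_score is the score recomputed for auto-learning (the
    'computed score for autolearn' value in the debug log), not the
    final message score.
    """
    if autolearn_score < ham_threshold:
        return "ham"    # matches the log line: yes, ham (0 < 1)
    if autolearn_score >= spam_threshold:
        # Boundary handling is an assumption; the real logic also
        # requires further conditions before learning as spam.
        return "spam"
    return "no"         # between the thresholds: nothing is learned


if __name__ == "__main__":
    # Value taken from the debug log above: computed score for autolearn: 0
    print(autolearn_decision(0.0))
```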

Here are my config files:

Amavis 50-user:
use strict;

#
# Place your configuration directives here. They will override those in
# earlier files.
#
# See /usr/share/doc/amavisd-new/ for documentation and examples of
# the directives you can use in this file
#
$unfreeze = ['unfreeze', 'freeze -d', 'melt', 'fcat']; #disabled (non-free, no security support)
$unrar = ['rar', 'unrar']; #disabled (non-free, no security support)
$lha = 'lha'; #disabled (non-free, no security support)
$myhostname = "windy.skanes.ca";
$final_spam_destiny = D_DISCARD;
$X_HEADER_LINE = "Ubuntu $myproduct_name at $mydomain";
$sa_tag_level_deflt = -9999;
$sa_kill_level_deflt = 15; # triggers spam evasive actions
$sa_dsn_cutoff_level = 15; # spam level beyond which a DSN is not sent
$sa_auto_whitelist = 1;
$sa_spam_report_header = 1;
#
# Debugging settings
#
$sa_debug = '1,bayes,learn';
$log_level = 5;
$LOGFILE = "$MYHOME/amavis.log";
$DEBUG=1;
#
#------------ Do not modify anything below this line -------------
1; # ensure a defined return
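
For readers unfamiliar with the three Amavis levels, the sketch below shows how I understand them to relate to a message score. It is an illustration only: the function and key names are invented, the use of >= at each boundary is an assumption, and the tag2 value of 6.31 is simply the required= figure from the header shown earlier. With $sa_tag_level_deflt at -9999, the spam headers are added to effectively every scanned message.

```python
def amavis_spam_actions(score, tag_level=-9999.0, tag2_level=6.31, kill_level=15.0):
    """Map a spam score to the three Amavis levels (simplified sketch)."""
    return {
        # X-Spam-* information headers are added at or above the tag level.
        "add_spam_headers": score >= tag_level,
        # The message is marked as spam ("Yes") at or above the tag2 level.
        "mark_as_spam": score >= tag2_level,
        # Evasive action ($final_spam_destiny) applies at or above the kill level.
        "evasive_action": score >= kill_level,
    }


if __name__ == "__main__":
    # Score from the example header above.
    print(amavis_spam_actions(-1.927))
```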

SA local.cf:

# This is the right place to customize your installation of SpamAssassin.
#
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
# Only a small subset of options are listed below
#
###########################################################################

# Add *****SPAM***** to the Subject header of spam e-mails
#
# rewrite_header Subject *****SPAM*****

# Save spam messages as a message/rfc822 MIME attachment instead of
# modifying the original message (0: off, 2: use text/plain instead)
#
# report_safe 1

# Set which networks or hosts are considered 'trusted' by your mail
# server (i.e. not spammers)
#
# trusted_networks 212.17.35.

# Set file-locking method (flock is not safe over NFS, but is faster)
#
# lock_method flock

# Set the threshold at which a message is considered spam (default: 5.0)
#
# required_score 5.0

# Use Bayesian classifier (default: 1)
#
use_bayes 1

# Bayesian classifier auto-learning (default: 1)
#
bayes_auto_learn 1

# Set headers which may provide inappropriate cues to the Bayesian
# classifier
#
# bayes_ignore_header X-Bogosity
# bayes_ignore_header X-Spam-Flag
# bayes_ignore_header X-Spam-Status
bayes_auto_learn_threshold_nonspam 1
bayes_auto_learn_threshold_spam 6

Any help would be greatly appreciated as I have quite a busy mail server to retire.

Thanks,
Jonathan Skanes

Question information

Language: English
Status: Solved
For: Ubuntu amavisd-new
Assignee: No assignee
Solved by: Jon Skanes

Jon Skanes (jon-skanes) said :
#1

OK, so I got off my ass and did it myself :) If you're interested, use at your own risk.
Something similar might make it into 2.6.2.

Re: [AMaViS-user] autolearn= not showing in headers on autolearned messages
From: Jonathan Skanes <email address hidden>
To: <email address hidden>
Date: Yesterday 18:18:02

On Monday 14 July 2008 19:49:30 Jonathan Skanes wrote:
> On Sunday 13 July 2008 01:11:04 Jonathan Skanes wrote:
> > Hi all,
> >
> > I'm using an Amavis/SpamAssassin/Postfix setup on a fresh install of
> > Ubuntu Hardy.  I don't see any reference to the autolearn= field in any
> > of the mail headers Amavis is generating.  What am I missing?
> >
> > Here are the versions:
> >
> > amavisd-new:
> >   Installed: 1:2.5.3-1ubuntu3
> >   Candidate: 1:2.5.3-1ubuntu3
> >   Version table:
> >  *** 1:2.5.3-1ubuntu3 0
> >         500 http://gulus.usherbrooke.ca hardy/main Packages
> >         100 /var/lib/dpkg/status
> > spamassassin:
> >   Installed: 3.2.4-1ubuntu1
> >   Candidate: 3.2.4-1ubuntu1
> >   Version table:
> >  *** 3.2.4-1ubuntu1 0
> >         500 http://gulus.usherbrooke.ca hardy/universe Packages
> >         100 /var/lib/dpkg/status
> >
>
> So I got bored and started poking around in the code.  Can anyone see any
> issues with doing this?
>

I updated my changes to reflect the semantics of SpamAssassin.  It returns
'autolearn=unavailable' if the message isn't scanned, e.g. because it is too big.

Use this at your own risk.  It seems to work well for me.

Oh, and cheers to the developers, I found the code easy to read :)  Feel free
to include this, with attribution, if you think others may find it useful.

--- /usr/sbin/amavisd-new-dist  2008-07-14 19:37:33.000000000 -0230
+++ /usr/sbin/amavisd-new       2008-07-15 17:54:49.000000000 -0230
@@ -9833,6 +9833,7 @@
     my($do_p0f) = $is_local && $os_fp ne '' &&
                $allowed_hdrs && $allowed_hdrs->{lc('X-Amavis-OS-Fingerprint')};
     my($tag_level, $tag2_level, $subject_tag, $pp_age);
+       my($autolearn_status) = ( $msginfo->supplementary_info('AUTOLEARN') || 'unavailable' );
     if ($allowed_hdrs && $allowed_hdrs->{lc('X-Amavis-PenPals')}) {
       $pp_age = $r->recip_penpals_age;
       $pp_age = format_time_interval($pp_age)  if defined $pp_age;
@@ -9896,10 +9897,11 @@
     #             : 0+sprintf("%.3f",$spam_level);  # trim fraction
     # my($bl) = !defined($boost) ? undef : 0+sprintf("%.3f",$boost);
     # (!defined($boost) || $bl==0 ? $sl : $bl>=0 ? $sl.'+'.$bl : $sl.$bl),
-      $full_spam_status = sprintf("%s,\n score=%s\n %s%s%stests=[%s]",
+      $full_spam_status = sprintf("%s,\n score=%s\n autolearn=%s\n %s%s%stests=[%s]",
         $do_tag2 ? 'Yes' : 'No',
         !defined($spam_level) && !defined($boost) ? 'x' :
                                          0+sprintf("%.3f",$spam_level+$boost),
+               $autolearn_status,
         !defined $tag_level || $tag_level eq '' ? ''
                                    : sprintf("tagged_above=%s\n ",$tag_level),
         !defined $tag2_level  ? '' : sprintf("required=%s\n ",  $tag2_level),
@@ -10428,6 +10430,7 @@
   my($tag_level_min,$tag2_level_min,$kill_level_min,$boost_max);
   my($spam_level) = $msginfo->spam_level;
   my(@q_addr,@qar_addr,@a_addr);  # per-recip quarantine address(es) and admins
+  my($autolearn_status) = ( $msginfo->supplementary_info('AUTOLEARN') || 'unavailable' );
   for my $r (@{$msginfo->per_recip_data}) {
     my($rec) = $r->recip_addr;
     my($blocking_ccat) = $r->blocking_ccat;
@@ -10515,9 +10518,10 @@
   my($sl) = !defined($spam_level) ? 'x' : 0+sprintf("%.3f",$spam_level); # trim
   my($bl) = !defined($boost_max) ? undef: 0+sprintf("%.3f",$boost_max);  # trim
   my($full_spam_status) = sprintf(
-    "%s,\n score=%s\n tag=%s\n tag2=%s\n kill=%s\n %stests=[%s]",
+    "%s,\n score=%s\n autolearn=%s\n tag=%s\n tag2=%s\n kill=%s\n %stests=[%s]",
     $do_tag2_any||$do_kill_any ? 'Yes' : 'No',
     (!defined($boost_max) || $bl==0 ? $sl : $bl>=0 ? $sl.'+'.$bl : $sl.$bl),
+       $autolearn_status,
     (map { !defined $_ ? 'x' : 0+sprintf("%.3f",$_) }
       ($tag_level_min, $tag2_level_min, $kill_level_min)),
     join('', $blacklisted_any ? "BLACKLISTED\n " : (),
@@ -10593,7 +10597,6 @@
   }
   if (ll(2) && $msginfo->is_in_contents_category(CC_SPAM)) {
     # log entry compatible with older log parsers
-    my($autolearn_status) = $msginfo->supplementary_info('AUTOLEARN');
     $s = $full_spam_status; $s =~ s/\n[ \t]/ /g;
     do_log(2,"SPAM, %s -> %s, %s%s%s",  $msginfo->sender_smtp,
              join(',', qquote_rfc2821_local(@{$msginfo->recips})),  $s,
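
With the patch applied, the X-Spam-Status header should carry an autolearn= field between score= and the remaining fields. A script that decides which messages still need sa-learn could read it roughly as follows. This is a sketch, not tested against a live server: the function names are made up, and the sample value autolearn=ham is an assumption about what SpamAssassin reports for a message learned as ham.

```python
import re


def parse_spam_status(value):
    """Parse an Amavis-style X-Spam-Status header value into a dict."""
    # Unfold continuation lines (newline followed by whitespace).
    unfolded = re.sub(r"\r?\n[ \t]+", " ", value.strip())
    fields = {"is_spam": unfolded.split(",", 1)[0].strip().lower() == "yes"}
    # Take out tests=[...] first, since it contains its own '=' and commas.
    match = re.search(r"tests=\[(.*?)\]", unfolded)
    if match:
        fields["tests"] = [t.strip() for t in match.group(1).split(",") if t.strip()]
        unfolded = unfolded[:match.start()] + unfolded[match.end():]
    # The remaining fields are simple key=value pairs.
    for key, val in re.findall(r"(\w+)=(\S+)", unfolded):
        fields[key] = val
    return fields


def needs_manual_learning(fields):
    """True when the message was not auto-learned as either ham or spam."""
    return fields.get("autolearn", "unavailable") not in ("ham", "spam")


if __name__ == "__main__":
    sample = ("No, score=-1.927 autolearn=ham tagged_above=-9999 required=6.31\n"
              " tests=[AWL=0.673, BAYES_00=-2.599, SPF_PASS=-0.001]")
    parsed = parse_spam_status(sample)
    print(parsed["autolearn"], needs_manual_learning(parsed))
```

A header without the field (an unpatched Amavis) is treated the same as autolearn=unavailable, which mirrors the default chosen in the patch.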
