Tagspam documentation

Version 0.4 22 Aug 2005. SPF processing removed as it wasn't detecting
any spam. CSV processing added based on David MacQuigg's pyCSV module,
see: http://purl.net/macquigg/email/python
Added the ability to use the Python license, to make tagspam.py
compatible with pyCSV.

Version 0.3 6 Feb 2004. Much updated version which adds SPF processing
(Sender Permitted From, details from http://spf.pobox.com/ ). Because
SPF breaks SMTP forwarding, this is only thought suitable for tagging
and for whitelisting SPF pass messages at this stage. This uses the
additional module checkspf.py and involves changing the previous tagspam
processing to make use of the Tagspam class. This cleans up some nasty
global variables which seem to belong better in an object namespace.

Version 0.2 30 Nov 2003. Bugfix: extends the regular expression which
detects the origin IP address so that it accepts addresses surrounded by
round () parentheses as well as square [] brackets, as inserted by a
different trusted relay.

Version 0.1 2 June 2003. First public release.

Author: Richard Kay

Tagspam may be used on its own under the terms of the GNU General Public
License, version 2 or greater; see http://www.gnu.org/licenses/gpl.html
to obtain a copy. Tagspam is also licensed, when used either on its own
or together with pyCSV, under the same terms as Python.

Any anti-spam program is likely also to result in loss of wanted
messages. In other words, use entirely at your own risk!

Quick start
===========

Until someone writes a script to automate this job on a number of likely
target platforms there is no quick start. Using this program without
studying the source code and carefully editing the files provided is
likely to result in wanted messages being tagged as spam, or even being
lost altogether. At the very least I recommend that you read this file,
edit the top part of the script tagspam.py containing global settings to
suit your local operation and installation directory, and edit the
incoming_relays file.

About tagspam
=============

Tagspam is a spam detector and tagging program intended for situations
where mail messages can only be processed after having been seen by the
MTA (Mail Transfer Agent) which receives mail for the entire domain (the
part after the @ in the email address). Tagspam allows use of DNS-based
black lists (DNSBLs) and DNS-based white lists, which otherwise are only
usable when the incoming MTA does the filtering.

It is probably better in most cases for the MTA to do this job, but in
some situations this option is not available. For example:

- you are not in control of the MTA which listens for mail for your
  domain, because you use virtual hosting and POP3 or IMAP to get your
  mail from your domain hosting provider;

- the reliability of your mail system depends upon a lower priority MX
  relay for your mail domain over which you do not have administrative
  control, and which is not able to reject based on the same DNSBL.

Another situation where Tagspam might be useful is where you want to use
or try out a more aggressive DNSBL than you are willing to use for
rejects at the MTA level. This suits you if you are willing to check the
tagged mail manually from time to time for false positives (a delivery
agent such as procmail can put it into a different folder): mail which
you are willing to look at less often than your main inbox, but which
you can't afford to have rejected unread.

Tagspam can be used for tagging an individual user's mail. It tags the
mail by adding the word: SPAM to the subject of messages which are
relayed by a blacklisted host. Mail originating from whitelisted hosts
will not be tagged, even if these hosts are also (presumably
incorrectly) blacklisted. Tagged mail can either be quickly identified
as such by the end user, or can be thrown away or diverted into a
separate mailbox using a program such as procmail.

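To make the idea of a DNS-based list lookup concrete, here is a minimal
sketch in present-day Python 3. It is not the code in tagspam.py. It
assumes the usual DNSBL convention of reversing the octets of the IPv4
address and looking the result up under the list's domain root. The
function names, the resolve parameter (a hook so the logic can be tried
without real DNS traffic) and the example.org list names are all
hypothetical. Consulting the white lists first is one simple way to get
the "whitelisted hosts are never tagged" behaviour described above.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the name to look up: reversed IPv4 octets, then the list root."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % (ip,))
    return ".".join(reversed(octets)) + "." + zone.strip(".")


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the list answers for this address, False if the name is unknown."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        # socket.gaierror is a subclass of OSError. A failed lookup is
        # treated here as "not listed", which is an assumption.
        return False
    return True


def first_match(ip, zones, resolve=socket.gethostbyname):
    """Return the first list root in which the address is listed, else None."""
    for zone in zones:
        if is_listed(ip, zone, resolve):
            return zone
    return None


def should_tag(ip, whitelists, blacklists, resolve=socket.gethostbyname):
    """White lists are consulted first, so a whitelisted host is never tagged."""
    if first_match(ip, whitelists, resolve) is not None:
        return False
    return first_match(ip, blacklists, resolve) is not None
```

For instance, dnsbl_query_name("1.2.3.4", "bl.example.org") gives the
name 4.3.2.1.bl.example.org, which is then resolved like any other host
name.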
Requirements
============

Tagspam is written in Python, and should run on any computer with a
Python interpreter installed. It has been used successfully on Linux,
which means that it should work on any up-to-date Unix. If you want to
port it to other platforms with a configurable mail system allowing
installation of active filters into the mail stream, this should be
possible.

On my Linux system I use fetchmail to download the POP3 mail and
sendmail to distribute it locally. This is probably just one of many
feasible mail setups to use tagspam with.

Tagspam is written as a mail reading and forwarding program. It reads a
message on its standard input and forwards it using the Python smtplib
module to the local address given as a command line parameter.

You will probably need a POP3 or IMAP handler, e.g. fetchmail, or some
other POP3 handler together with sendmail or another MTA (Mail Transfer
Agent) such as exim or postfix.

You will also need to be able to send email using Python's smtplib on
your system. For this to work, if you don't have an MTA running on your
computer you will need either to tell smtplib where to find an MTA to
resend tagged mail to, or to modify tagspam to deliver to the
appropriate mailbox directly. If you do the latter, you will need to
think about how to prevent 2 instances of the tagspam program (or
tagspam and another mailbox writing program) from writing to the same
mailbox at once, if you don't want the kind of mess which can result
from this.

A computer running tagspam will need to be able to perform DNS queries.
If commands such as dig or nslookup run on your system, e.g. to resolve
domains such as www.ibm.com to an IP address (4 numbers between 0 and
255 with dots between), then your system can perform DNS queries.

Application notes
=================

When suspected spam is tagged, the word: SPAM is inserted at the start
of the Subject header.
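
As a rough illustration of this tagging step, the following sketch
inserts a tag at the start of the Subject header using the standard
email package. It is written for present-day Python 3 and is not the
code in tagspam.py. The function name tag_subject, the single space
placed after the tag, and the handling of a message with no Subject
header are assumptions made for the example.

```python
from email import message_from_string

TAG = "SPAM"  # same default as the tag setting described in this file


def tag_subject(raw_message, tag=TAG):
    """Return the message text with the tag put in front of the Subject."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject")
    if subject is None:
        # Assumption: a message without a Subject gets one holding only the tag.
        msg["Subject"] = tag
    else:
        # replace_header keeps the header in its original position.
        msg.replace_header("Subject", "%s %s" % (tag, subject))
    return msg.as_string()
```

A message whose header reads "Subject: Cheap pills" comes back with
"Subject: SPAM Cheap pills"; the other headers and the body are passed
through.
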
Tagged spam can then be rerouted using a suitable procmail filter, e.g.
my .procmailrc file contains the following 3 lines:

    :0
    * ^Subject:.?SPAM
    Mail/spam

This causes the tagged spam to be rerouted from my inbox into the spam
mailbox folder, which can be checked very occasionally for false
positives etc. If I decided to cast spam into the void never to be seen
again, I could replace Mail/spam with /dev/null . These paths work using
Unix mailbox folders as used by Mutt and Evolution.

To aid debugging, an additional X-Presumed-Origin header is added to the
message, giving the IP address of the relay considered responsible for
sending the message. An X-DNSBL-Matched header is added to state the
first DNS white or black list domain root found in which the
X-Presumed-Origin IP address is listed. An X-CSV-Result header is added
to indicate the results of CSV processing.

If the Python class used to parse the email throws an exception, the
exception trace is added to the start of the message body before the
message goes to its normal destination. In that case the postmaster
address is used as the sender, because if the message is unparseable
there is no guarantee that a From address can be obtained.

Installation
============

1. Unpacking

Decide where you want to install tagspam. I have chosen /usr/local as a
good place. As root do the usual unpack here:

    cp tagspam.tgz /usr/local
    cd /usr/local
    tar xzvf tagspam.tgz
    cd tagspam
    ls -l

The directory listing should look something like this:

    -bash-2.05b$ ls -l
    total 72
    -rwx--x--x    1 rich     rich        15786 Jan 30 17:27 checkspf.py*
    -rw-r--r--    1 rich     rich          654 Feb  6 21:37 dnsbl_domains
    -rw-r--r--    1 rich     rich          649 Feb  6 20:46 dnswl_domains
    -rw-r--r--    1 rich     rich         1288 Feb  6 21:36 incoming_relays
    -rw-r--r--    1 rich     rich        11892 Feb  6 21:57 readme.txt
    -rwxr-xr-x    1 rich     rich          224 Jan 30 16:36 rmtags*
    -rwxr-xr-x    1 rich     rich        13063 Feb  6 19:47 tagspam.py*

2.
The following files will need to be edited:

2.1 incoming_relays

You will need to add the IP addresses of relays normally in your mail
delivery path, e.g. if your mail is obtained by POP3 or IMAP these will
be the IPs of these incoming mail servers. Add one address per line.
This file must always include the loopback address 127.0.0.1, in case
the computer on which you run this program is also relaying, e.g.
between a POP3 handler and an MTA. The origin is traced back through the
Received: message path tracing headers to the relay which handled the
message before any of these known acceptable IP addresses.

2.2 dnsbl_domains

Add one domain name for each DNS black list domain root which you want
to look up in order to check whether the last non-local relay is
blacklisted. Put each domain on a separate line.

2.3 dnswl_domains

Add one domain name for each DNS white list domain root which you want
to look up, such that if the origin relay is whitelisted the message
should always be delivered regardless. Put each domain on a separate
line.

2.4 editing source code tagspam.py

You will need to modify the first line of tagspam.py to suit the
location of the Python interpreter on your system. Use of the Unix
command:

    which python

should give the correct path. You will then need to edit some global
variables to suit how you have installed the package and its files.

    ## globals you will or may need to edit
    # file locations. Change this to where you have installed tagspam
    install_dir='/usr/local/tagspam/'
    # full message From address if error parsing headers
    # change the line below to include your own domain
    postmaster='postmaster@whatever.your-domain.is'
    # change the next line if you want a different tag
    tag='SPAM' # the tag to add to the Subject header

3. Installation into mail feed.

tagspam.py reads a message on its standard input and sends the mail on
to the (typically) local name specified as program parameter 1.

I installed tagspam.py and its files into /usr/local/tagspam .
I then installed a line into /etc/aliases to work with sendmail:

    rich-filter: "|/usr/local/tagspam/tagspam.py rich"

and redirected POP3 mail for rich to go to rich-filter . Obviously you
will have to replace rich with whatever address you use for resending.
Other Unix MTAs tend also to use sendmail's /etc/aliases, but you will
need to check that they can execute programs, piping messages to
standard input in a similar manner.

4. Telling sendmail you want it to run tagspam

If you are running smrsh (the sendmail restricted shell), which is
likely if you use an up-to-date version of sendmail, any program run
within your mail system needs a link in /etc/smrsh. Add a link to
tagspam.py there, e.g. as root:

    cd /etc/smrsh
    ln -s /usr/local/tagspam/tagspam.py tagspam.py

5. Testing

You must do this to reduce the risk of things going badly wrong.

As an ordinary user, change to your installation directory and import
tagspam as a Python module. In the following test run I use the testing
message hardcoded into the tagspam.test() function, with origin IPs of
127.0.0.2 to test a whitelisted entry, 127.0.0.4 to test a blacklisted
entry and 1.2.3.4 to test an unlisted entry.

    cd /usr/local/tagspam
    python
    Python 2.2.1 (#1, Dec 4 2002, 23:43:31)
    [GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux-i386
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tagspam
    >>> tagspam.test()
    try whitelisted, not listed and only blacklisted IPs
    enter test origin IP e.g. 127.0.0.2 : 127.0.0.2
    >>> tagspam.test()
    try whitelisted, not listed and only blacklisted IPs
    enter test origin IP e.g. 127.0.0.2 : 127.0.0.4
    >>> tagspam.test()
    try whitelisted, not listed and only blacklisted IPs
    enter test origin IP e.g. 127.0.0.2 : 1.2.3.4
    >>>

Next check that the mail got through, and that the blacklisted entry was
suitably tagged.
If this all worked correctly, check your mail carefully using remote
test messages to build up your confidence in your modified mail system.
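
When checking delivered test messages, it can help to pull out the
Subject tag and the debugging headers in one go. The sketch below does
that for a single message held as text. It is written for present-day
Python 3 and is not part of tagspam. The function name summarize is
hypothetical, and because the exact format of the header values is not
described in this file they are returned unparsed.

```python
from email import message_from_string


def summarize(raw_message, tag="SPAM"):
    """Report whether a message was tagged and what tagspam recorded about it."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    return {
        # True if the Subject starts with the tag (leading spaces ignored)
        "tagged": subject.lstrip().startswith(tag),
        # Header names as given in the application notes; None if absent
        "origin": msg.get("X-Presumed-Origin"),
        "dnsbl": msg.get("X-DNSBL-Matched"),
        "csv": msg.get("X-CSV-Result"),
    }
```

For the blacklisted test address, a correctly processed message would be
expected to show tagged as True and an origin of 127.0.0.4. To go
through a whole Unix mailbox folder such as Mail/spam, the messages
could be read with Python's standard mailbox module and passed to the
same function.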