Spam Reduction Notes
These are notes from a talk given at the Tokyo Linux Users Group
technical meeting on January 18th, 2003. It was intended as a survey
of the current techniques that you might like to try, with an
emphasis on what works for me.
Chapman: Have you got anything without spam?
Jones: Well, there's spam egg sausage and spam, that's not got much spam in it.
Chapman: I don't want ANY spam!
(Monty Python's Flying Circus, Episode 25, June 25, 1970)
What is SPAM?
SPAM is a registered trademark of
Hormel Foods, LLC, for luncheon meat.
What is spam?
Start again...
- typical definition
- Unsolicited Commercial Email (UCE)
- I prefer to stress the untargeted, bulk nature of the mail
- unsolicited, automated, Email
Escalating problem
Nov 2002: Brightmail says 36% of traffic is spam
Jan 2003: MessageLabs predicts spam exceeds ham by July
http://www-106.ibm.com/developerworks/library/lol/spamato/
- very effective
- trivial setup
- extremely low false sorting rate
- undo
Multiple Email Addresses
- suffixes common with many modern MTA/MDAs
- throw-away (free) accounts
- effective
- easy with your own mail server
- good for the curiosity factor: use a different address or suffix
for each contact and you know where the spammer got your address
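The suffix trick can be sketched in a few lines. This is only an illustration: the function name is made up, and the "+" separator is an assumption (it is configurable in most MTAs, and some sites use "-" instead).

```python
def split_suffix(address, separator="+"):
    """Split 'user+tag@example.org' into ('user', 'tag', 'example.org').

    The separator is an assumption: many setups use '+', some use '-'.
    An address without a suffix yields an empty tag.
    """
    local_part, _, domain = address.rpartition("@")
    user, _, tag = local_part.partition(separator)
    return user, tag, domain


# A hypothetical address that was handed out to exactly one web shop.
user, tag, domain = split_suffix("alice+bookshop@example.org")
print(f"mail for {user}@{domain} came via the address given to: {tag or '(no tag)'}")
```

If spam starts arriving with the "bookshop" tag, that address can be blocked without touching the main one.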
ad hoc content filtering
- procmail
- Exim filter files
- system filter
- user filter
- competitive... you against the spammer
- can rapidly get out of hand
- cooperative effort... many smart people against the spammer
- whitelists and blacklists
- rule-based filtering
- recent versions also include Bayesian filtering
Blacklists/Whitelists
- identify good guys and bad guys
- don't use for filtering... use them to reduce computational load
- can be implemented in server or client tools
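One reading of "reduce computational load" is that the lists short-circuit the decision so the expensive content filter only runs on unknown senders. A minimal sketch under that assumption follows; `triage` and the `expensive_filter` callback are hypothetical names, not part of any tool mentioned here.

```python
def triage(sender, whitelist, blacklist, expensive_filter):
    """Decide cheaply when the sender is known, otherwise run the filter.

    whitelist and blacklist are sets of lower-case addresses or domains.
    expensive_filter is any callable taking the sender and returning a verdict.
    """
    sender = sender.lower()
    domain = sender.rpartition("@")[2]
    if sender in whitelist or domain in whitelist:
        return "accept"
    if sender in blacklist or domain in blacklist:
        return "reject"
    return expensive_filter(sender)


calls = []

def slow_content_filter(sender):
    # Stand-in for a real content filter; it just records that it was run.
    calls.append(sender)
    return "unsure"

whitelist = {"friend@example.org", "example.net"}
blacklist = {"spam.example.com"}

for who in ("friend@example.org", "x@spam.example.com", "stranger@example.edu"):
    print(who, "->", triage(who, whitelist, blacklist, slow_content_filter))
print("content filter ran", len(calls), "time(s)")
```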
Exim ACLs
# Accept mail to postmaster or abuse in any local domain,
# regardless of source.
accept  local_parts = postmaster:abuse
        domains     = +local_domains

# Reject if the sender is a known spammer.
deny    senders     = @@cdb;/etc/exim/spam-domains.cdb : \
                      cdb;/etc/exim/spam-addresses.cdb
        message     = message from spammer rejected

# Deny unless the sender address can be verified.
require verify      = sender
Public (realtime) Blacklists
# DNS blacklists.
# There are a variety of realtime blacklists that attempt to
# identify spam sources, open relays, and even dialup address
# blocks. You really need to check the policy published by
# each list before deciding to use it.
deny    dnslists    = list.dsbl.org : \
                      relays.ordb.org : \
                      spews.relays.osirusoft.com
        message     = rejected because $sender_host_address is in the blacklist at $dnslist_domain\n\
                      ($dnslist_text)
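For readers who have not seen how a realtime blacklist is queried: the connecting host's IPv4 octets are reversed and prepended to the list's zone, and the resulting name is looked up in DNS. A listed host gets an answer (typically an address in 127.0.0.x, with a TXT record carrying the explanation); an unlisted host gets "no such name". The sketch below only builds the query name and does no network lookup. The zone `dnsbl.example.org` and the function name are placeholders.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name that is looked up for an IPv4 address in a DNSBL zone."""
    # IPv4Address() validates the input and raises ValueError on garbage.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


# 192.0.2.99 is a documentation address; the zone is a placeholder.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# prints: 99.2.0.192.dnsbl.example.org
```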
TMDA
Tagged Message Delivery Agent
- whitelist
- blacklist
- confirmation system
- suffix (tag)
Stamps
Make it computationally expensive for spammers (or at least people
not on your whitelist) to send you mail.
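One way to make sending expensive is a hashcash-style proof of work: the sender must search for a stamp whose hash starts with a given number of zero bits, while the receiver checks it with a single hash. The sketch below is an illustration of that idea only. The stamp format (`recipient:counter`), the function names, and the 12-bit cost are all assumptions, not any real stamp standard.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint_stamp(recipient, bits=12):
    """Sender side: brute-force a counter until the hash has enough zero bits."""
    for counter in count():
        stamp = f"{recipient}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def check_stamp(stamp, recipient, bits=12):
    """Receiver side: one hash, so checking is cheap."""
    digest = hashlib.sha1(stamp.encode()).digest()
    return stamp.startswith(recipient + ":") and leading_zero_bits(digest) >= bits


stamp = mint_stamp("bob@example.org")
print(stamp, check_stamp(stamp, "bob@example.org"))
```

Each extra bit doubles the sender's expected work, which is negligible for one message and painful for a million. A real scheme would also need dates and replay protection, which are left out here.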
Vipul's Razor (AKA Spamnet)
- cooperative effort
- 20 character SHA digest sent to local catalog server
- Mad-lib problem
- recent performance problems
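The digest idea and the Mad-lib problem can be shown together. The sketch below is not Razor's actual signature scheme; it assumes a plain SHA-1 over a whitespace- and case-normalized body, which is enough to show why a spammer who fills random words into a template defeats exact-digest matching.

```python
import hashlib


def body_digest(body):
    """Hex SHA-1 of the body after collapsing whitespace and lower-casing.

    The normalization is an assumption made for this illustration.
    """
    normalized = " ".join(body.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


original = "Dear friend, buy our pills today!"
reflowed = "Dear friend,\n  buy our pills   TODAY!"
madlib = "Dear pal, buy our pills today!"

print(body_digest(original) == body_digest(reflowed))  # same text, same digest
print(body_digest(original) == body_digest(madlib))    # one word changed, digest differs
```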
automated filtering
- Techniques
- tokenize (punctuation? HTML? stopwords?)
- combine
- score
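The tokenizing stage, plus a per-token spam probability learned from training mail, might be sketched as follows. This is a deliberately naive illustration: the token pattern, the presence-per-message counting, and the clamp to [0.01, 0.99] are assumptions, and real tokenizers make many more decisions (headers, HTML, stopwords).

```python
import re
from collections import Counter

# Assumption: keep letters, digits and a few characters that carry signal.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")


def tokenize(text):
    """Split text into lower-case tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def token_probabilities(spam_msgs, ham_msgs):
    """Estimate, per token, the probability that a message containing it is spam."""
    # Count each token once per message (presence, not frequency).
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    probs = {}
    for token in set(spam_counts) | set(ham_counts):
        spam_ratio = spam_counts[token] / len(spam_msgs)
        ham_ratio = ham_counts[token] / len(ham_msgs)
        p = spam_ratio / (spam_ratio + ham_ratio)
        probs[token] = min(0.99, max(0.01, p))  # never fully certain
    return probs


probs = token_probabilities(
    ["cheap pills now", "cheap loans"],
    ["meeting notes now", "lunch now?"],
)
for token in ("cheap", "meeting", "now"):
    print(token, round(probs[token], 3))
```

The per-token probabilities are then combined into one score for the whole message; the combining rule is where the approaches differ.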
A Plan for Spam
Graphs from SpamBayes background information
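The combining rule from "A Plan for Spam" takes the most "interesting" tokens (those furthest from 0.5) and combines them with the naive Bayes formula prod(p) / (prod(p) + prod(1 - p)). A minimal sketch, assuming per-token probabilities that are already clamped away from 0 and 1; the function name is made up and `math.prod` needs Python 3.8 or later.

```python
from math import prod


def graham_combine(probs, max_tokens=15):
    """Combine per-token spam probabilities into one message score."""
    # Keep only the tokens furthest from the neutral value 0.5.
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:max_tokens]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


print(graham_combine([0.99, 0.99, 0.2]))   # two strong spam clues outweigh one mild ham clue
print(graham_combine([0.5, 0.5]))          # neutral tokens give a neutral score
```

Scores from this rule tend to pile up very close to 0 or 1, which leaves little room for an "unsure" middle ground.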
Robinson Combining
http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
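As I understand Robinson's essay, it makes two changes: rare tokens are pulled toward a neutral prior, and the message score is built from geometric means rather than raw products. The sketch below follows that description; the parameter names `s` (strength of the prior) and `x` (assumed probability for an unseen token) and their default values are illustrative, and every input probability is assumed to lie strictly between 0 and 1.

```python
from math import exp, log


def robinson_adjust(p, n, s=1.0, x=0.5):
    """Pull a token probability p, seen in n messages, toward the prior x."""
    return (s * x + n * p) / (s + n)


def robinson_combine(probs):
    """Combine token probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    if n == 0:
        return 0.5
    # P is large when the tokens look spammy, Q when they look hammy.
    P = 1.0 - exp(sum(log(1.0 - p) for p in probs) / n)
    Q = 1.0 - exp(sum(log(p) for p in probs) / n)
    S = (P - Q) / (P + Q)
    return (1.0 + S) / 2.0


print(robinson_adjust(1.0, 1))             # one sighting is weak evidence
print(robinson_combine([0.99] * 5))
print(robinson_combine([0.99, 0.01]))
```

Because of the n-th roots, the score moves smoothly between 0 and 1 instead of snapping to the extremes.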
SpamBayes Chi-Squared
produce two numbers
- H - ham (good) probability
- S - spam probability
- the user can set the thresholds for classifying "unsures", depending on sensitivity to false negatives and false positives
- Stupid beats smart.
- avoid magic constants
- build rules into tokenizer, not analyzer policy
Chi-squared distribution explanation
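A sketch of the chi-squared combining, written from my understanding of the SpamBayes approach rather than copied from its code: the sum of -2 ln(p) over n tokens is tested against a chi-squared distribution with 2n degrees of freedom, once for the spam hypothesis and once for the ham hypothesis. The cutoff values in `classify` are illustrative assumptions, and this simple version does not guard against floating-point underflow on very long messages.

```python
from math import exp, log


def chi2q(x2, v):
    """Probability that a chi-squared variable with v (even) degrees of freedom is >= x2."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_squared_scores(probs):
    """Return (S, H) for token probabilities strictly between 0 and 1."""
    n = len(probs)
    S = 1.0 - chi2q(-2.0 * sum(log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2q(-2.0 * sum(log(p) for p in probs), 2 * n)
    return S, H


def classify(probs, ham_cutoff=0.2, spam_cutoff=0.9):
    """Fold S and H into one score; the cutoffs are the user's choice."""
    S, H = chi_squared_scores(probs)
    score = (S - H + 1.0) / 2.0
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


print(classify([0.99] * 5))
print(classify([0.01] * 5))
print(classify([0.99, 0.01] * 3))   # strong clues both ways
```

When a message has strong clues in both directions, S and H are both high and the score lands near 0.5, so the message is reported as "unsure" rather than being forced into one pile.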
Using SpamBayes
- hammiefilter with procmail
- POP3proxy
- hammiesrv
- Outlook2000 Plugin
Mailing List Applications?
Will filters end spam?
- If you can successfully filter -- is it worth sending?
- Will the gullible filter?
Tools in other domains
- CRM114 for syslog monitoring
- look for outlying events
Tools
- Server Based
- Client Based
- Fixed Rule Based
- Probability (Trained) Based
Spam Conference
Watch the video archives of the experts speaking on the subject
at the Spam Conference held at MIT on January 17th, 2003.
http://www.spamconference.org/