2024-11-15
Personalized voice recordings by Elwood "You've got mail!" Edwards
2023-07-05
How to beat an adaptive/Bayesian spam filter (2004)
That was the title of my talk at the 2004 MIT Spam Conference on January 16, 2004. As I recently recovered the slides, I am creating this blog post for posterity.
The core of the talk was that it was possible to take one machine learning spam filter and use a second, identical one to learn its characteristics. One machine learning system would fight spam while the other automatically identified the first one's weaknesses. Thus a machine learning algorithm could learn how to write spam that would get through a tuned machine learning spam filter. This is now referred to as "Adversarial Machine Learning".
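To make the idea concrete, here is a minimal sketch of the attacking side. It is not the code from the talk. It assumes the target filter can only be observed as a black box (the hypothetical `is_blocked` callback), sends probe messages padded with randomly chosen candidate words, and then ranks the words by how strongly they are associated with getting through. The function name, parameters and scoring formula are all my own assumptions for illustration.

```python
import math
import random
from collections import Counter


def learn_evasive_words(is_blocked, spam_text, candidate_words,
                        probes=200, words_per_probe=3, seed=0):
    """Rank candidate words by how much they seem to help a message get through.

    is_blocked is a hypothetical stand-in for the target filter: it takes a
    message string and returns True if the filter stopped the message.
    """
    rng = random.Random(seed)
    passed, blocked = Counter(), Counter()
    n_passed = n_blocked = 0

    for _ in range(probes):
        # Pad the same spam payload with a random handful of candidate words.
        extra = rng.sample(candidate_words, words_per_probe)
        message = spam_text + " " + " ".join(extra)
        if is_blocked(message):
            n_blocked += 1
            blocked.update(extra)
        else:
            n_passed += 1
            passed.update(extra)

    # Smoothed log-odds: how much more often a word shows up in delivered
    # probes than in blocked ones.
    scores = {}
    for word in candidate_words:
        p_pass = (passed[word] + 1) / (n_passed + 2)
        p_block = (blocked[word] + 1) / (n_blocked + 2)
        scores[word] = math.log(p_pass / p_block)

    return sorted(candidate_words, key=lambda w: scores[w], reverse=True)
```

The words at the front of the returned list are the ones the second learner would add to its next round of messages. A real target filter would of course be far less obliging than a simple callback.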
The talk also pointed out that spammers were trying a technique dubbed "Word Salad": including random words in their messages to try to evade filtering.
Slides are here as a PDF and embedded below as images.
2023-05-24
Bringing the POPFile web site back from the dead
Over 20 years ago I wrote some code to scratch an itch. The specific itch was that I was getting a lot of email and I wanted it to be automatically sorted into folders. At work we were using Microsoft Outlook and I figured there had to be a way to do this with machine learning.
I got into a discussion with Brent Welch at work and he pointed me to an extension for exmh called ifile. The chap who wrote ifile, Jason Rennie, had also written a paper about it and I read that. It describes using naive Bayesian text classification to sort email. Just what I was looking for. Except I was using Microsoft Outlook.
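For readers who haven't met naive Bayesian text classification, here is a small sketch of the general technique as applied to mail folders. It is not ifile's code (nor AutoFile's or POPFile's). The class name, the tokenizer and the use of add-one smoothing are my own choices for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesSorter:
    """Toy multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.doc_counts = Counter()              # folder -> messages seen
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Deliberately crude: lowercase runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, folder, text):
        words = self.tokenize(text)
        self.word_counts[folder].update(words)
        self.doc_counts[folder] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the most probable folder, or None if nothing was trained."""
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_folder, best_score = None, float("-inf")
        for folder, n_docs in self.doc_counts.items():
            counts = self.word_counts[folder]
            total_words = sum(counts.values())
            # Work in log space so many small probabilities don't underflow.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocab))
                )
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder
```

Training is just a call to `train(folder, text)` each time a message ends up in a folder, and `classify(text)` picks the folder whose words best explain a new message.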
So, in around 2000 I wrote a Visual Basic extension for Microsoft Outlook that did exactly the same sort of classification and automatically learned the right categories by observing the folder structure and when mail was moved from one folder to another. The user literally did nothing but sort out mail that wasn't in the right spot.
This was... AutoFile. The Visual Basic code wrapped the bow toolkit from CMU. I've made the code (from a 2002 attempt to make this into a shareware program) available here.
But AutoFile was Microsoft Outlook only and relied on someone else's machine learning toolkit. To make things really easy to use I created POPFile in 2001. POPFile intercepted a mail program downloading mail via POP3 (and later IMAP) and performed classification. It added a header or altered the subject line and so it was possible to use simple filters to get mail sorted into folders automatically.
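The tagging step can be sketched with Python's standard `email` package. POPFile itself was not written this way; the function name `tag_message`, the header name `X-Text-Classification` as used here, and the `[bucket]` subject prefix are assumptions for illustration, and the proxying of the POP3 connection is left out entirely.

```python
from email import message_from_string


def tag_message(raw_message, bucket, modify_subject=False):
    """Return the message text with its classification recorded in the headers.

    bucket is whatever label the classifier chose for this message.
    """
    msg = message_from_string(raw_message)

    # Replace any existing classification header rather than adding a second.
    del msg["X-Text-Classification"]  # does nothing if the header is absent
    msg["X-Text-Classification"] = bucket

    if modify_subject:
        subject = msg.get("Subject", "")
        del msg["Subject"]
        msg["Subject"] = f"[{bucket}] {subject}"

    return msg.as_string()
```

A mail client rule that matches on the added header (or on the subject prefix) is then enough to file the message into the right folder.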
To make it possible to reclassify badly sorted mail I built a web-based user interface (which was somewhat of a novelty in 2001!) which looked like this:
And I decided to make it open source (I was active in Steve Gibson's newsgroups and a lot of early POPFile testers were from there), and forget about shareware.
The opening for my presentation the second year I was invited. It started with a custom version of All your base are belong to us. Sadly, the videotape on which it was recorded has been lost but a silent version is here.































