Originally published on: Thu, 23 Apr 2009 00:52:32 +0000
I had been getting an excess of spam in one of my POP3 accounts. I don't generally use filters of any kind, Bayesian or otherwise, to sift the ham from the spam; I usually sort it all manually.
The volume of junk I was receiving had simply become too great. I needed to automate the removal of unwanted e-mail from my POP3 inbox.
At the same time, I wanted a small programming task that would serve as an exercise for trying out Ruby, so I decided to write my POP3 filter, poppy, in Ruby. At that point my experience with Ruby was minimal.
I first looked at the standard Ruby POP3 class to see whether it had the functionality I needed. At the time it had most of it, but not all: I needed to be able to issue the POP3 TOP command for each message. The TOP command causes the POP3 server to send back the headers of a specified message, followed by however many lines of the message body you request. I intended to issue a TOP msgnum 0 command to ask the POP3 server for only the headers and zero lines of the body.
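The TOP exchange can be sketched in Ruby as follows. This is not poppy's original code: `top_headers` is a hypothetical name, and the sketch assumes an already-authenticated connection object that responds to `print` and `gets` (a `TCPSocket` does). Per RFC 1939, a multi-line reply ends with a line holding only a dot, and any other line that starts with a dot has had an extra dot prepended by the server, which the client must strip.

```ruby
# Issue "TOP msgnum 0" on an authenticated POP3 connection and return the
# lines of the multi-line reply, i.e. the message's header lines.
# `sock` is anything with #print and #gets, such as a TCPSocket.
def top_headers(sock, msgnum)
  sock.print("TOP #{msgnum} 0\r\n")
  status = sock.gets.to_s
  raise "TOP #{msgnum} failed: #{status}" unless status.start_with?("+OK")

  lines = []
  while (line = sock.gets)
    line = line.chomp                            # drop the trailing CRLF
    break if line == "."                         # a lone dot ends the reply
    line = line[1..-1] if line.start_with?(".")  # undo dot-stuffing
    lines << line
  end
  lines
end
```

Asking for zero body lines keeps each round trip small, which matters when the spam itself is large.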
Instead of using the POP3 class, I wrote the entire interaction with the POP3 server myself in Ruby, over a plain socket connection.
I first issued a USER and PASS sequence to log in, then a STAT command to see how many messages were waiting. After STAT, I issued a LIST command to obtain the size of each message.
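A minimal sketch of that sequence, assuming plain-text USER/PASS authentication over an unencrypted connection as described above. `pop3_command` and `login_and_list` are hypothetical helpers, not poppy's own methods.

```ruby
require "socket"

# Hypothetical helper: send one POP3 command and return the single-line
# reply, raising if the server does not answer "+OK".
def pop3_command(sock, cmd)
  sock.print("#{cmd}\r\n")
  reply = sock.gets.to_s.chomp
  # Report only the verb so a password never ends up in an error message.
  raise "#{cmd.split.first} failed: #{reply}" unless reply.start_with?("+OK")
  reply
end

# Log in with USER/PASS, read the message count from STAT, then read the
# "msgnum size" pairs of the multi-line LIST reply.
# Returns [count, sizes], where sizes maps message number => size in octets.
def login_and_list(sock, user, password)
  greeting = sock.gets.to_s
  raise "no POP3 greeting: #{greeting}" unless greeting.start_with?("+OK")

  pop3_command(sock, "USER #{user}")
  pop3_command(sock, "PASS #{password}")
  count = pop3_command(sock, "STAT").split[1].to_i # "+OK count octets"

  pop3_command(sock, "LIST")
  sizes = {}
  while (line = sock.gets)
    line = line.chomp
    break if line == "."                           # end of the LIST reply
    num, size = line.split
    sizes[num.to_i] = size.to_i
  end
  [count, sizes]
end
```

Against a real server you would pass in `TCPSocket.new(host, 110)`; 110 is the standard port for unencrypted POP3.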
I then iterated through each item and performed the TOP msgnum 0 operation.
After receiving the data from the TOP command, I extracted the headers that mattered to me (To, From, Reply-To, and Subject), converted them to lowercase, and passed them to a method that decides whether or not the message is spam. I also passed in the size of the message.
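The header-sifting step might look like the sketch below. `pertinent_headers` is a hypothetical name; the sketch assumes the header lines arrive as an array of strings with line endings already removed. It also unfolds continuation lines (those starting with a space or tab), which the article doesn't mention but which long Subject lines often use.

```ruby
# The headers poppy examines, per the article.
WANTED_HEADERS = %w[to from reply-to subject].freeze

# Turn raw header lines into a hash of lower-cased values for the wanted
# headers. Missing headers come back as empty strings; if a header appears
# more than once, the first non-empty value wins.
def pertinent_headers(lines)
  unfolded = []
  lines.each do |line|
    break if line.empty?                 # a blank line ends the header block
    if line.start_with?(" ", "\t") && !unfolded.empty?
      unfolded[-1] += " " + line.strip   # join a folded continuation line
    else
      unfolded << line
    end
  end

  result = WANTED_HEADERS.map { |h| [h, ""] }.to_h
  unfolded.each do |line|
    name, value = line.split(":", 2)
    next if value.nil?                   # not a "Name: value" line
    key = name.strip.downcase
    if WANTED_HEADERS.include?(key) && result[key].empty?
      result[key] = value.strip.downcase
    end
  end
  result
end
```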
In the method isItSpam(), poppy first looks for definite signs of spam, based on known spam in my mailbox. If any of those signs appear, the method returns true to indicate that the item is spam.
The next set of conditionals acts as a whitelist and checks whether the e-mail is coming from a friend ("remmy", in this case). If the e-mail is from a friend, the method returns false to flag the message as ham.
The next portion of code checks the size of the item: if the sender isn't on my whitelist, I ignore any e-mail larger than 40,000 bytes.
If execution reaches the end of the method, I give the e-mail the benefit of the doubt and return false, allowing it to pass.
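The decision order in isItSpam() can be sketched as follows. `is_it_spam` is a hypothetical snake_case stand-in, and the spam markers are placeholders, since poppy's real list was built from known spam and isn't reproduced here. The size rule assumes that "ignore" means the message is flagged as spam (returns true), which the article doesn't state outright.

```ruby
# Placeholder spam markers (hypothetical; not poppy's real list).
SPAM_SIGNS = ["viagra", "replica watches", "you have won"].freeze
# Friends whose mail is accepted ("remmy" is the article's example).
WHITELIST = ["remmy"].freeze
MAX_SIZE = 40_000 # bytes

# Returns true for spam, false for ham. `headers` maps lower-case header
# names to lower-cased values; `size` is the size reported by LIST.
def is_it_spam(headers, size)
  text = headers.values.join(" ")

  # 1. Definite signs of spam win, even over the whitelist.
  return true if SPAM_SIGNS.any? { |sign| text.include?(sign) }

  # 2. Whitelist: mail from a friend is ham.
  senders = [headers["from"], headers["reply-to"]].compact.join(" ")
  return false if WHITELIST.any? { |friend| senders.include?(friend) }

  # 3. Oversized mail from anyone else. ASSUMPTION: "ignore" is read as
  #    "flag for removal", so this returns true.
  return true if size > MAX_SIZE

  # 4. Benefit of the doubt.
  false
end
```

Because the checks return early in this order, a friend's oversized message passes while a stranger's is flagged.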
See poppy.rb below:
You'll need to make a few changes for the code to work. The very last line of the code is a call to removeSpam(); alter it to pass in your own POP3 server name, port, user ID, and password.
To actually delete the items from the POP3 server, you'll need to uncomment three lines:
You might wish to leave them commented out until you fine-tune your spam/ham criteria.
As a failsafe, the code displays the headers of each item it deletes. If your version of the code accidentally deletes an e-mail you wanted to keep, you can use the From, Reply-To, or Subject data to contact the original sender and request another copy of the message.
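The deletion step might look like the sketch below. Instead of commented-out lines it uses a hypothetical `dry_run` flag; poppy's own three lines may differ. It prints the headers of each message before removing it, in the spirit of the failsafe above. In POP3, DELE only marks a message for deletion; the server removes marked messages when the session ends with QUIT, and a session that ends any other way must leave the mailbox untouched (RFC 1939).

```ruby
# Remove the messages in `doomed` (a hash of msgnum => headers hash).
# With dry_run: true no DELE is sent, so nothing is removed; this stands in
# for leaving poppy's deletion lines commented out.
# Returns the message numbers that were (or would have been) deleted.
def delete_messages(sock, doomed, dry_run: true, out: $stdout)
  verb = dry_run ? "Would delete" : "Deleting"
  doomed.each do |msgnum, headers|
    # Failsafe: print enough header data to contact the sender later.
    out.puts "#{verb} #{msgnum}: from=#{headers['from']} " \
             "reply-to=#{headers['reply-to']} subject=#{headers['subject']}"
    next if dry_run

    sock.print("DELE #{msgnum}\r\n")
    reply = sock.gets.to_s
    raise "DELE #{msgnum} failed: #{reply}" unless reply.start_with?("+OK")
  end

  # DELE only marks messages; QUIT moves the session into the UPDATE
  # state, where the server actually removes them.
  sock.print("QUIT\r\n")
  sock.gets
  doomed.keys
end
```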
Learning the basics of Ruby to solve this problem was a very pleasant experience. I've shown the code to a number of friends who don't code in Ruby (some don't code anymore). Most of them remarked on how clean and easy to read the code is. I agree!
Unless otherwise noted, all code and text entries are Copyright ©2009 by James K. Lawless