Originally published on: Thu, 23 Apr 2009 00:52:32 +0000
I had been experiencing an excess of spam in one of my POP3 accounts. I don't generally use any kind of filter, Bayesian or otherwise, to sift the ham from the spam; I usually do it all manually.
The number of junk items I had been receiving was just too great. I needed to automate the removal of the unwanted e-mail from my POP3 inbox.
At the same time, I wanted a programming task that I could implement in Ruby as an exercise in trying out the language, so I decided to write my POP3 filter, poppy, in Ruby. My experience with Ruby at that point had been minimal.
I first looked at the standard Ruby POP3 class to see whether it had the functionality I needed. It had most of it (at the time), but not all: I needed to be able to issue the POP3 TOP command for each message. TOP causes the POP3 server to send back the headers of a specified message, followed by as many lines of the message body as requested. I intended to issue TOP msgnum 0 to ask the server for the headers only, with zero lines of the body.
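A POP3 server ends a multi-line response, such as the one TOP returns, with a line containing a single period. Any content line that itself begins with a period is sent with an extra period prepended ("byte-stuffing" in RFC 1939), so the reader has to strip it again. The following is a minimal sketch of reading such a response from any IO-like object. It is not poppy's code: the method name read_multiline is hypothetical, and it assumes the "+OK" status line has already been read.

```ruby
# Read the body of a POP3 multi-line response (e.g. the reply to "TOP 3 0")
# from an object that responds to #gets. Assumes the "+OK" status line has
# already been consumed. Returns the lines without their CRLF endings.
def read_multiline(io)
  lines = []
  while (line = io.gets)
    line = line.chomp           # chomp removes a trailing "\r\n" or "\n"
    break if line == "."        # a lone period terminates the response
    # Undo byte-stuffing: the server prepends a period to content lines
    # that start with one, so drop the first character in that case.
    line = line[1..-1] if line.start_with?(".")
    lines << line
  end
  lines
end
```

For a TOP msgnum 0 request, the returned lines are the message headers followed by the blank line that separates the headers from the body.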
Instead of using the POP3 class, I wrote the entire interaction with the POP3 server in Ruby, using nothing more than a plain socket connection.
I first issued USER and PASS to log in, then STAT to see how many messages were waiting. After that, I issued LIST to obtain the size of each message.
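The replies to STAT and LIST have fixed shapes defined in RFC 1939. STAT answers on one line with "+OK count octets". LIST, given no argument, answers "+OK" followed by one "msgnum octets" line per message and a terminating period. The sketch below parses both. It assumes the LIST lines have already been collected without the terminator, and the method names check_ok, parse_stat and parse_list are hypothetical.

```ruby
# Raise unless a POP3 status line is a positive "+OK" reply.
def check_ok(status)
  raise "POP3 error: #{status.chomp}" unless status.start_with?("+OK")
  status
end

# "+OK 2 320" => [2, 320]  (message count, total size in octets)
def parse_stat(status)
  _ok, count, octets = check_ok(status).split
  [Integer(count), Integer(octets)]
end

# ["1 120", "2 200"] => {1 => 120, 2 => 200}  (message number => size)
def parse_list(lines)
  lines.each_with_object({}) do |line, sizes|
    num, octets = line.split
    sizes[Integer(num)] = Integer(octets)
  end
end
```

Integer() raises on malformed numbers instead of silently returning 0, as String#to_i would. A garbled reply therefore stops the run instead of feeding bogus sizes to the spam check.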
I then iterated through the messages and issued TOP msgnum 0 for each one.
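The USER/PASS, LIST and TOP exchange can be sketched as a small class that talks to the server over a plain socket. This is not poppy's actual code, and the class and method names are hypothetical. The constructor accepts any object with write and gets, so the protocol logic can be exercised without a live server.

```ruby
require 'socket'

# A minimal POP3 client over a raw connection. `io` may be a TCPSocket or any
# object that responds to #write and #gets (useful for testing offline).
class Pop3Client
  def initialize(io)
    @io = io
  end

  # Connect to the server and consume its "+OK" greeting.
  def self.connect(host, port = 110)
    client = new(TCPSocket.new(host, port))
    client.read_status
    client
  end

  # Read one status line; raise unless the server answered "+OK".
  def read_status
    line = @io.gets or raise "connection closed"
    raise "POP3 error: #{line.chomp}" unless line.start_with?("+OK")
    line
  end

  # Send one command (POP3 lines end in CRLF) and return its status line.
  def command(text)
    @io.write("#{text}\r\n")
    read_status
  end

  # Read a multi-line body up to the lone "." terminator, undoing byte-stuffing.
  def read_multiline
    lines = []
    while (line = @io.gets)
      line = line.chomp
      break if line == "."
      lines << (line.start_with?(".") ? line[1..-1] : line)
    end
    lines
  end

  def login(user, password)
    command("USER #{user}")
    command("PASS #{password}")
  end

  # LIST with no argument returns one "msgnum octets" line per message.
  def message_sizes
    command("LIST")
    read_multiline.map { |l| l.split.map { |f| Integer(f) } }.to_h
  end

  # TOP msgnum 0 returns the headers and the blank separator line, no body.
  def top_headers(msgnum)
    command("TOP #{msgnum} 0")
    read_multiline
  end

  # Yield number, size and header lines for every message waiting on the server.
  def each_message_headers
    message_sizes.each { |num, size| yield num, size, top_headers(num) }
  end
end
```

A session would call Pop3Client.connect(host), then login(user, password), then each_message_headers with a block, and finish with command("QUIT").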
After receiving the response to each TOP command, I extracted the headers pertinent to me (To, From, Reply-To, and Subject), converted them to lower case, and passed them, along with the size of the message, to a method that determines whether or not the message is spam.
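The header extraction might look like the sketch below. It is an illustration, not poppy's code, and extract_headers and WANTED_HEADERS are hypothetical names. It also handles folded headers (continuation lines that begin with a space or tab, as RFC 5322 allows). The article doesn't say whether poppy dealt with those.

```ruby
# Headers the filter cares about: to, from, reply-to and subject.
WANTED_HEADERS = %w[to from reply-to subject].freeze

# Turn the header lines returned by "TOP n 0" into a Hash of lower-cased
# values keyed by lower-cased header name, keeping only WANTED_HEADERS.
def extract_headers(lines)
  headers = {}
  current = nil
  lines.each do |line|
    break if line.empty?                     # blank line ends the header block
    if line.start_with?(" ", "\t") && current
      # Folded header: append the continuation to the previous value.
      headers[current] << " " << line.strip.downcase
    elsif (m = line.match(/\A([^:\s]+):\s*(.*)\z/))
      name = m[1].downcase
      current = WANTED_HEADERS.include?(name) ? name : nil
      headers[current] = m[2].strip.downcase if current
    else
      current = nil                          # unrecognised line: stop folding
    end
  end
  headers
end
```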
In the method isItSpam(), poppy first looks for definite signs of spam, based on known spam that had arrived in my mailbox. If any of these signs appear, the method returns true, indicating that the item is spam.
The next set of conditionals acts as a whitelist, checking whether the e-mail is coming from a friend ("remmy", in this case). If it is, the method returns false to flag the message as ham.
The next portion of the code checks the size of the item: if the sender isn't on my whitelist, I ignore any e-mail larger than 40,000 bytes.
If execution reaches the end of the method, I give the e-mail the benefit of the doubt and return false, allowing it to pass.
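The order of the checks in isItSpam() can be sketched as follows. The spam markers are hypothetical placeholders, because the article doesn't list the real ones. "remmy" and the 40,000-byte limit come from the text. Two further assumptions: the whitelist is matched against the From and Reply-To values, and "ignore" for oversized mail means the message is flagged as spam. Returning false there would be indistinguishable from falling through to the default.

```ruby
# Hypothetical spam markers; poppy's real list came from the author's mailbox.
SPAM_MARKERS = ["viagra", "casino", "you have won"].freeze
# Whitelisted friends, matched against the from and reply-to values (assumed).
FRIENDS = ["remmy"].freeze
MAX_UNKNOWN_SIZE = 40_000 # bytes

# Returns true for spam, false for ham. `headers` is a Hash of lower-cased
# values keyed by "to", "from", "reply-to" and "subject".
def looks_like_spam?(headers, size)
  text = headers.values_at("to", "from", "reply-to", "subject").compact.join(" ")
  # 1. Definite signs of spam win, even over the whitelist.
  return true if SPAM_MARKERS.any? { |marker| text.include?(marker) }
  # 2. Whitelist: mail from a friend is ham.
  sender = headers.values_at("from", "reply-to").compact.join(" ")
  return false if FRIENDS.any? { |friend| sender.include?(friend) }
  # 3. Oversized mail from anyone else is flagged (assumed meaning of "ignore").
  return true if size > MAX_UNKNOWN_SIZE
  # 4. Benefit of the doubt.
  false
end
```

Because the blacklist runs first, a friend's message that happens to contain a marker is still flagged. That follows the order the paragraphs above describe.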
See poppy.rb below:
You'll need to make a few changes to the code for it to work. The very last line of the code is a call to removeSpam(); alter it to pass in your own POP3 server name, port, user ID, and password.
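Instead of hard-coding the account details on that last line, you could read them from environment variables. This is a sketch only. The POPPY_* variable names and the pop3_settings helper are hypothetical, and port 110 is simply the standard port for unencrypted POP3.

```ruby
# Collect the POP3 connection settings from a Hash (ENV works) in the order
# server name, port, user ID, password. Variable names are hypothetical.
def pop3_settings(env = ENV)
  host     = env.fetch("POPPY_HOST")                 # KeyError if unset
  port     = Integer(env.fetch("POPPY_PORT", "110")) # 110 = standard POP3 port
  user     = env.fetch("POPPY_USER")
  password = env.fetch("POPPY_PASSWORD")
  [host, port, user, password]
end
```

If removeSpam() takes its four arguments in the order the paragraph lists them, the last line could then become removeSpam(*pop3_settings).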
To actually delete the items from the POP3 server, you'll need to uncomment three lines:
You might wish to leave them commented out until you fine-tune your spam/ham criteria.
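A dry-run switch gives the same effect as leaving the deletion lines commented out, without editing the code each time. The sketch below is hypothetical: the delete_message name and dry_run parameter are not poppy's. Under RFC 1939, DELE only marks a message. The server removes marked messages when the session ends with QUIT, and RSET before that unmarks them.

```ruby
# Mark message `msgnum` for deletion, or only report it when dry_run is true.
# `io` is any object with #write and #gets (e.g. a TCPSocket). Returns true if
# a DELE was sent and accepted, false on a dry run.
def delete_message(io, msgnum, dry_run: true)
  if dry_run
    puts "dry run: would delete message #{msgnum}"
    return false
  end
  io.write("DELE #{msgnum}\r\n")
  reply = io.gets.to_s
  raise "DELE #{msgnum} failed: #{reply.chomp}" unless reply.start_with?("+OK")
  true
end
```

Defaulting dry_run to true means the filter only deletes mail when explicitly told to, which suits the fine-tuning period.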
As a failsafe, the code displays the headers of each item it deletes. If your version of the code accidentally deletes an e-mail you wanted to keep, you can use the From, Reply-To, or Subject data to contact the original sender and request another copy of the message.
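The failsafe report could be built as a single line per deleted message. The exact format poppy prints isn't shown here, so deletion_report and its layout are assumptions.

```ruby
# Build the line printed before a message is deleted, so the sender can be
# identified and asked to resend if the filter got it wrong.
def deletion_report(msgnum, headers, size)
  fields = %w[from reply-to subject].map do |name|
    "#{name}: #{headers.fetch(name, '(none)')}"
  end
  "deleting ##{msgnum} (#{size} bytes) #{fields.join(' | ')}"
end
```

For example, deletion_report(3, {"from" => "a@b.com", "subject" => "hi"}, 512) produces "deleting #3 (512 bytes) from: a@b.com | reply-to: (none) | subject: hi".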
Learning the basics of Ruby to solve this problem was a very pleasant experience. I've shown the code to a number of friends who don't code in Ruby (some don't code any more), and most of them have remarked on how clean and easy to read it is. I agree!
Unless otherwise noted, all code and text entries are Copyright ©2009 by James K. Lawless
Views expressed in this blog are those of the author and do not necessarily reflect those of the author's employer. Views expressed in the comments are those of the responding individual.
