Jim Lawless' Blog


E-mail cleansing

Originally published on: Thu, 23 Apr 2009 00:52:32 +0000

I had been receiving an excess of spam in one of my POP3 accounts. I don't generally use filters of any kind, Bayesian or otherwise, to sift the ham from the spam; I usually do it all manually.

The number of junk items I had been receiving was just too great. I needed to automate the removal of the unwanted e-mail from my POP3 inbox.

At the same time, I was looking for a programming task to implement in Ruby as a way to try out the language, so I decided to write my POP3 filter, poppy, in Ruby. My experience with Ruby at that point had been minimal.

I first looked at the standard Ruby POP3 class to see if it had the functionality I needed. It had most of it (at the time), but not all: I needed to be able to issue the POP3 TOP command for each message. The TOP command asks the POP3 server to send back the header lines of a specified message, followed by as many lines of the message body as requested. I intended to issue a TOP msgnum 0 command to ask the server for only the headers and zero lines of the body.
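To make the TOP arguments concrete, here is a small sketch (not part of poppy; the helper names top_command and split_top_response are hypothetical). It builds the command string and splits the lines of a TOP reply into headers and body, relying on the rule that the first empty line separates the two. It assumes the status line and the final "." terminator have already been stripped off.

```ruby
# Build a TOP command for message `msg_num`: its headers plus the first
# `body_lines` lines of its body (0 asks for the headers only).
def top_command(msg_num, body_lines = 0)
  "TOP #{msg_num} #{body_lines}\r\n"
end

# Split the lines of a TOP reply (status line and "." terminator already
# removed) into the header block and the body lines. The first empty
# line separates headers from body.
def split_top_response(lines)
  blank = lines.index("") || lines.length
  headers = lines[0...blank]
  body = lines[(blank + 1)..-1] || []
  [headers, body]
end
```

With a count of 0, as poppy uses, the body part of the reply is simply empty.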

Instead of using the POP3 class, I wrote the full interaction with the POP3 server myself in Ruby, using nothing more than a socket connection.

I first issued USER and PASS commands to log in, then a STAT command to see how many messages were waiting. After the STAT command, I issued a LIST msgnum command for each message to obtain its size.
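The login and STAT/LIST steps can be sketched as a few helpers. This is an illustration under stated assumptions, not poppy's own code: pop3_command, parse_stat and parse_list are hypothetical names, and the I/O object is assumed to respond to write and gets (a TCPSocket does). RFC 1939 requires servers to begin every reply with "+OK" or "-ERR" in upper case, so the sketch treats anything other than "+OK" as a failure.

```ruby
# Send one command and return the single-line reply, raising if the
# server does not answer "+OK". `io` is any object with #write and #gets.
def pop3_command(io, line)
  io.write(line + "\r\n")
  reply = io.gets.to_s.chomp
  unless reply.start_with?("+OK")
    # Report only the command keyword so a password is never printed.
    raise "POP3 #{line.split(' ').first} failed: #{reply}"
  end
  reply
end

# Reply to STAT, e.g. "+OK 3 12345": message count and mailbox size in octets.
def parse_stat(reply)
  _status, count, octets = reply.split(" ")
  [count.to_i, octets.to_i]
end

# Reply to "LIST n", e.g. "+OK 2 4096": the size of message n in octets.
def parse_list(reply)
  reply.split(" ")[2].to_i
end

# Typical sequence (not executed here):
#   require 'socket'
#   sock = TCPSocket.new("pop3-server-name", 110)
#   sock.gets                                   # server greeting
#   pop3_command(sock, "USER your-pop3-id")
#   pop3_command(sock, "PASS your-pop3-password")
#   count, _size = parse_stat(pop3_command(sock, "STAT"))
#   (1..count).each { |n| puts parse_list(pop3_command(sock, "LIST #{n}")) }
```

Raising an exception instead of returning false means a failed login stops the run immediately rather than being silently ignored.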

I then iterated through each item and performed the TOP msgnum 0 operation.
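The reply to TOP is a multi-line response. RFC 1939 ends such a response with a line holding a single ".", and any line that itself begins with "." is sent with an extra "." prepended ("byte-stuffing"). poppy reads with recv(), which returns whatever bytes happen to have arrived. A line-oriented alternative, sketched below with the hypothetical helper read_multiline, reads with gets until the terminator. One caution: gets uses Ruby's internal read buffer while recv bypasses it, so a real client should pick one style for the whole session rather than mixing them on the same socket.

```ruby
# Read a multi-line POP3 response after its "+OK" status line has been
# consumed. Stops at the lone "." terminator and undoes byte-stuffing.
def read_multiline(io)
  lines = []
  while (line = io.gets)
    line = line.chomp            # drop the trailing CRLF
    break if line == "."         # end of the multi-line response
    line = line[1..-1] if line.start_with?(".")  # "..x" was sent for ".x"
    lines << line
  end
  lines
end
```

Because the terminator is consumed here, the next command's reply starts cleanly at the beginning of a line.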

After receiving the data from the TOP command, I sifted out the headers pertinent to me (to, from, reply-to, and subject), forced them to lower-case, and passed them, along with the size of the message, to a method that determines whether or not the message is spam.
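poppy matches each header line by its first word. One case that approach misses is a folded header: RFC 5322 allows a long header to continue on following lines that begin with a space or tab. The sketch below (extract_headers and WANTED_HEADERS are hypothetical names) unfolds those lines first, then keeps just the four headers of interest, lower-cased and without the header name. Missing headers come back as empty strings, much like poppy's "" defaults.

```ruby
# Headers used by the spam checks, in lower case.
WANTED_HEADERS = %w[to from reply-to subject]

# Return a Hash of the wanted headers (lower-cased values) from raw
# header lines. Continuation lines are joined to the header above them.
def extract_headers(header_lines)
  unfolded = []
  header_lines.each do |line|
    if line =~ /\A[ \t]/ && !unfolded.empty?
      unfolded[-1] = unfolded[-1] + " " + line.strip
    else
      unfolded << line
    end
  end

  found = Hash.new("")            # absent headers read as ""
  unfolded.each do |line|
    name, value = line.split(":", 2)
    next if value.nil?            # not a "Name: value" line
    name = name.strip.downcase
    found[name] = value.strip.downcase if WANTED_HEADERS.include?(name)
  end
  found
end
```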

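In the method isItSpam(), poppy first looks for definite signs of spam, based on the spam I had already seen in my mailbox: a telltale word in the subject, a recipient other than me in the to header, or a known spammer's address. If any of these signs appear, the method returns true, indicating that the item is spam.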

The next conditional acts as a whitelist, checking whether the e-mail's from or reply-to header names a friend ("remmy", in this case). If it does, the method returns false to flag the message as ham.

The next portion of code checks the size of the item: any message larger than 40,000 bytes from a sender who is not on my whitelist is treated as spam.

If execution reaches the end of the method, I give the e-mail the benefit of the doubt and return false, allowing it to pass.
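The same decision order (blacklist first, then whitelist, then the size cap, then a default of "keep") can also be written as data plus a short method, which makes adding new rules a one-line change. This is a sketch, not poppy's code: spam?, BLACKLIST, WHITELIST and MAX_UNKNOWN_SIZE are hypothetical names, and the patterns are the placeholders from the listing. All patterns must be lower-case because the header values are lower-cased before the check.

```ruby
# [field, lower-case pattern] pairs; a match marks the message as spam.
BLACKLIST = [
  [:subject, "drugs"],
  [:to, "someone_else"],
  [:sender, "somespammer"]
]
WHITELIST = ["remmy"]          # friends, matched against :sender
MAX_UNKNOWN_SIZE = 40_000      # bytes, for senders not on the whitelist

# fields: lower-cased header values; :sender is reply-to and from combined.
def spam?(fields, size)
  return true if BLACKLIST.any? { |field, pat| fields.fetch(field, "").include?(pat) }
  return false if WHITELIST.any? { |friend| fields.fetch(:sender, "").include?(friend) }
  return true if size > MAX_UNKNOWN_SIZE
  false
end
```

Because the blacklist is checked before the whitelist, a friend's address cannot rescue a message whose subject matches a blacklisted word, which mirrors the order in isItSpam().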

See poppy.rb below:


# License: MIT / X11
# Copyright (c) 2009 by James K. Lawless
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.

   require 'socket'

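   def removeSpam(server,port,user,password)
      sock=TCPSocket.new(server,port)

      # The empty first entry sends nothing; it just reads the server's
      # greeting. QUIT must be sent for any DELE commands to take effect.
      ["","USER " + user, "PASS " + password, "STAT",
      "QUIT" ].each do |msg|
         if msg.length != 0
            sock.send(msg+"\r\n",0)
         end
         # newer Rubies return nil from recv on a closed connection
         s=sock.recv(4000).to_s
         if s.downcase.split(" ")[0] != "+ok"
            sock.close
            return false
         end
         if msg=="STAT"
            lookThroughEmails(sock,s)
         end
      end
      sock.close
   end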

   def lookThroughEmails(sock,status)
      numOfEmails = status.split(" ")[1]
      for i in 1..numOfEmails.to_i
         sock.send("LIST " + i.to_s + "\r\n",0)
         s=sock.recv(4000)
         sz=s.split(" ")[2].to_i
         if checkForSpam(sock,i,sz)
            # to actually delete the entry,
            # uncomment the three lines below
            # sock.send("dele " + i.to_s + " \r\n",0)
            # s=sock.recv(4000)
            # puts s
            puts "(deleted)"
         else
            puts "(kept)"
         end
      end
   end

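   def checkForSpam(sock,i,sz)
      sock.send("TOP " + i.to_s + " 0\r\n",0)
      # A multi-line POP3 reply ends with a line holding a single ".".
      # Read until that terminator arrives (or the server reports an
      # error) so it is not left in the socket to garble the next reply.
      buf = ""
      until buf.end_with?("\r\n.\r\n") || buf.start_with?("-ERR")
         chunk = sock.recv(8000).to_s
         break if chunk.empty?   # connection closed
         buf += chunk
      end
      subject=from=to=replyTo=""
      buf.split("\r\n").each do |lin|
         # a blank line marks the end of the headers
         break if lin == ""
         word = lin.downcase.split(" ")[0]
         if word == "subject:"
            subject=lin.downcase
         elsif word == "from:"
            from=lin.downcase
         elsif word == "to:"
            to=lin.downcase
         elsif word == "reply-to:"
            replyTo=lin.downcase
         end
      end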
      puts "------------"
      puts "# " + i.to_s
      puts to
      puts from
      puts replyTo
      puts subject
      puts "(" + sz.to_s + ")"

      return isItSpam(sock,i,sz,to,from,replyTo,subject)
   end


   def isItSpam(sock,i,sz,to,from,replyTo,subject)
      # all values have been forced to lower-case, so every
      # pattern below must be lower-case too

      # merge the original from and the reply-to
      myFrom=replyTo + " " + from

      # known problem-children. Get rid of them now!
      if subject.index("drugs") != nil then
         return true
      end
      if to.index("someone_else") != nil then
         return true
      end
      if myFrom.index("somespammer") != nil then
         return true
      end

      # whitelist my friends
      if myFrom.index("remmy") != nil then
         return false
      end

      # if someone not on my whitelist sent me something
      # bigger than 40,000 bytes, nuke it.
      if sz > 40000 then
         return true
      end

      # they've made it this far.
      # Let's give them the benefit of the doubt
      return false
   end


   removeSpam("pop3-server-name",110,"your-pop3-id",
      "your-pop3-password")

You'll need to make a few changes to the code for it to work. The last statement in the listing is a call to removeSpam(); alter it to pass in your own POP3 server name, port, user ID, and password.
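If you'd rather not keep a password in the source file, one option is to read the settings from environment variables. The sketch below assumes hypothetical variable names (POP3_SERVER, POP3_PORT, POP3_USER, POP3_PASSWORD) and a hypothetical helper, pop3_settings; neither is a convention of poppy or of Ruby. Port 110 is used as the default, matching the listing.

```ruby
# Read POP3 connection settings from the environment (or any Hash-like
# object, which makes the method easy to test). Missing required
# variables raise KeyError.
def pop3_settings(env = ENV)
  {
    server:   env.fetch("POP3_SERVER"),
    port:     Integer(env.fetch("POP3_PORT", "110")),
    user:     env.fetch("POP3_USER"),
    password: env.fetch("POP3_PASSWORD")
  }
end

# Usage with poppy's entry point:
#   s = pop3_settings
#   removeSpam(s[:server], s[:port], s[:user], s[:password])
```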

To actually delete items from the POP3 server, uncomment these three lines:


            # to actually delete the entry,
            # uncomment the three lines below
            # sock.send("dele " + i.to_s + " \r\n",0)
            # s=sock.recv(4000)
            # puts s

You might wish to leave them commented out until you fine-tune your spam/ham criteria.
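An alternative to editing comments is a dry-run switch. The sketch below uses a hypothetical helper, delete_message, which only reports what it would do unless dry_run is false. Per RFC 1939, DELE merely marks a message; the server removes marked messages only after the client sends QUIT, and if the session ends any other way (or RSET is sent), nothing is deleted. poppy's command loop ends with QUIT, so marked messages are removed at that point.

```ruby
# Mark message `msg_num` for deletion, or just report it when dry_run is
# true. `io` is any object with #write and #gets (a TCPSocket in practice).
def delete_message(io, msg_num, dry_run: true)
  if dry_run
    puts "(would delete ##{msg_num})"
    return :skipped
  end
  io.write("DELE #{msg_num}\r\n")
  reply = io.gets.to_s.chomp
  raise "DELE #{msg_num} failed: #{reply}" unless reply.start_with?("+OK")
  :marked   # removed by the server only after QUIT
end
```

Once the spam/ham criteria look right in the printed output, switching to dry_run: false enables deletion without touching the rest of the code.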

As a failsafe, the code displays the headers of every item it examines, whether kept or deleted. If your version of the code accidentally deletes an e-mail that you wanted to keep, you can use the from, reply-to, or subject data to contact the original sender and request another copy of the message.

Learning the basics of Ruby to solve the above problem was a very pleasant experience. I've shown the above code to a number of friends who don't code in Ruby (some don't code anymore). Most of them have remarked on how clean and easy to read the code is. I agree!

Unless otherwise noted, all code and text entries are Copyright ©2009 by James K. Lawless


