Why am I getting so much spam?

I posted an answer to this question on Monday on the Webmail blog.  However, something is broken with the FeedBurner RSS feed on our site.

Anti-spam has been one of my areas of focus for several years, and something that I have enjoyed.  Recently though, spam fighting has gotten pushed to the side with all of the other things that I am responsible for.  It is painful to watch spam slip by the filters and I am definitely excited to have Mike T working on this project.

The first thing Mike is going to do is take our existing whitelist/blacklist functionality and push it up to the Postfix level, rather than the amavisd + spamassassin level.  This will ensure that system-wide rules don’t override individual user or domain rules.  The second thing he is going to do is give our customers the ability to whitelist/blacklist IP addresses, in addition to just sender email addresses.  Next, Mike will be building a system that makes "whitelist/blacklist/greylist/unknown" decisions based on aggregate whitelist/blacklist data, sending history, mail volumes and third-party sender reputation databases.  This will be some really powerful stuff.  After that, he will do some work on the content filters behind this sender reputation system, possibly incorporating DSPAM, which learns from the spam that each individual receives.
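To make the sender reputation idea a bit more concrete, here is a rough sketch of how such a decision could be structured.  This is not Mike's design: the field names, the thresholds and the order of the checks are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    WHITELIST = "whitelist"
    BLACKLIST = "blacklist"
    GREYLIST = "greylist"
    UNKNOWN = "unknown"


@dataclass
class SenderStats:
    # Hypothetical inputs; the real system's data model is not described here.
    whitelist_votes: int            # customers who whitelisted this sender
    blacklist_votes: int            # customers who blacklisted this sender
    days_seen: int                  # how long we have been receiving mail from it
    messages_last_day: int          # recent mail volume
    on_third_party_blocklist: bool  # listed in an external reputation database


def classify(stats: SenderStats) -> Verdict:
    """Combine the signals into one verdict (all thresholds are assumptions)."""
    votes = stats.whitelist_votes + stats.blacklist_votes

    # An external listing or a strong blacklist consensus blocks the sender.
    if stats.on_third_party_blocklist:
        return Verdict.BLACKLIST
    if votes >= 10 and stats.blacklist_votes / votes >= 0.8:
        return Verdict.BLACKLIST

    # A strong whitelist consensus plus some sending history earns trust.
    if votes >= 10 and stats.whitelist_votes / votes >= 0.8 and stats.days_seen >= 30:
        return Verdict.WHITELIST

    # A brand new sender with a sudden burst of mail gets greylisted.
    if stats.days_seen < 7 and stats.messages_last_day > 1000:
        return Verdict.GREYLIST

    # Not enough evidence either way; leave it to the content filters.
    return Verdict.UNKNOWN
```

For example, classify(SenderStats(0, 0, 1, 5000, False)) returns Verdict.GREYLIST, because the sender is new and suddenly sending a lot of mail.  Anything that comes back UNKNOWN would fall through to the content filters.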

Wedding Photos

Finally, here are our wedding photos and honeymoon pictures.  You will need to create a snapfish.com account to view them.  (Don’t worry, they’ve never spammed my email address.)

There are three albums:

Wedding-pro – professional photos
Wedding-am – amateur photos (from the cameras on the tables)
Honeymoon – Mazatlan, Mexico

Beth and I were very happy with our photographer, Roman Grinev.  I highly recommend him if you are getting married in Northern Virginia.

Check out my gut in the main photo on the login page.  I guess that’s what the first couple of days of married life will do to you.

Watch what you put on Red Hat ES4

We put some additional spam filtering servers online yesterday, and we figured we’d bump up to Red Hat ES4 (Linux 2.6.9-patched).  We had been running Red Hat ES3 (Linux 2.4.21-patched) for all of our prior spam filtering servers.  But the performance on these new ES4 servers sucked under a high load.

At first I noticed from vmstat that it was swapping a bunch of application memory when it didn’t need to, but I fixed that by:   echo 0 > /proc/sys/vm/swappiness
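If you want to watch for the same thing, the columns to look at in vmstat are si (swap-in) and so (swap-out).  Here is a small sketch that pulls those two columns out of captured vmstat text.  The sample output is made up for illustration and is not from our servers, and the function name is my own.

```python
from typing import List, Tuple

# Made-up vmstat output for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0  10240 512000  20000 900000    0    0    12    30  400   800 10  5 85  0
 5  1  20480 498000  20000 900000  120  340   600    90  900  2100 40 20 30 10
"""


def swap_activity(vmstat_output: str) -> List[Tuple[int, int]]:
    """Return one (si, so) pair per sample row of vmstat text output."""
    lines = [ln for ln in vmstat_output.splitlines() if ln.strip()]

    # Locate the row of column names rather than assuming fixed positions.
    header_idx = next(
        i for i, ln in enumerate(lines)
        if "si" in ln.split() and "so" in ln.split()
    )
    cols = lines[header_idx].split()
    si, so = cols.index("si"), cols.index("so")

    samples = []
    for ln in lines[header_idx + 1:]:
        fields = ln.split()
        # Skip repeated headers or anything that is not a numeric sample row.
        if len(fields) == len(cols) and fields[0].isdigit():
            samples.append((int(fields[si]), int(fields[so])))
    return samples


if __name__ == "__main__":
    for swapped_in, swapped_out in swap_activity(SAMPLE):
        print(swapped_in, swapped_out)
```

Non-zero si/so values while there is plenty of free memory are the kind of unnecessary swapping I was seeing.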

It also seemed that it was swapping our tmpfs partitions when there was plenty of memory, which ES3 didn’t do.  But even with swap turned off entirely and less than 50% of our RAM in use, once there were over 100 Postfix smtpd processes receiving mail, the server’s load average skyrocketed and the server became unresponsive.  Our ES3 servers handle the same load all day long with no problem.

I also noticed from iostat that the ES4 servers did a lot of reads from our root partition, which our ES3 servers do not do.  I don’t know why it would do that, because we use the exact same drive, config and software versions on both machines.  Besides, Postfix and amavisd/spamassassin are supposed to keep all their config data in memory.

The strangeness got stranger when I stopped either amavisd or Postfix.  With Postfix accepting incoming mail on port 25 and amavisd stopped, the reads on the root partition vanish.  With Postfix’s port 25 blocked (no incoming mail) and amavisd crunching on queued mail, the reads on the root partition vanish.  However, with both running, there are heavy reads on the root partition… WTF!

Anyway, after spending more than a day on it, I just wanted to let you know that ES4 sucks.  We have rebuilt these machines using our trusty ES3 image, and they are now operating great.

Moral

Another quote on my wall

Just added…

"A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added.

An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance."

– Werner Vogels, CTO – Amazon.com

Here is Werner’s full post:
http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html

Courier –> Dovecot

I mentioned back in February that we are switching our POP3/IMAP proxy software from Perdition to Dovecot.  These proxy servers are still in beta, because there is a bug with how Dovecot handles SSL connections.  Timo (Dovecot’s author) attempted a fix a few weeks ago, but the fix introduced new problems, so we reverted it.  I am hoping to get the final bugs resolved within the next couple of weeks so that we can release this out of beta.

In the meantime, we have been hard at work upgrading our backend IMAP software, also to Dovecot.  Currently we run Courier-IMAP, but Courier does not handle large mail folders efficiently.  Webmail, unlike desktop clients, does not have its own cache, so it relies on the IMAP server to obtain header listings and to perform sorts and searches.  Courier lacks indexes that would make these operations fast.  Instead, it must open every message file and parse out the header information in order to return a sorted list of emails to webmail.  Dovecot, on the other hand, makes heavy use of indexes.  The indexes allow a folder with 10,000 messages to be sorted in less than a second, whereas Courier would take 30-60 seconds or even longer, and usually cause a timeout.  The speed difference is amazing.
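Here is a toy illustration of why the index matters.  It is not how Dovecot's index files actually work; it only contrasts parsing every message on each sort request with parsing each message once and sorting from cached header data.  The messages, the DateIndex class and the function names are all made up for this example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Made-up messages keyed by UID; a real maildir would hold one file per message.
MESSAGES = {
    1: "Date: Wed, 12 Apr 2006 09:00:00 -0400\nSubject: second\n\nbody",
    2: "Date: Mon, 10 Apr 2006 09:00:00 -0400\nSubject: first\n\nbody",
    3: "Date: Fri, 14 Apr 2006 09:00:00 -0400\nSubject: third\n\nbody",
}


def parse_date(raw):
    """Parse a full message just to read its Date header (the slow part)."""
    return parsedate_to_datetime(message_from_string(raw)["Date"])


def sort_without_index(messages):
    """No index: every sort request parses every message again."""
    return sorted(messages, key=lambda uid: parse_date(messages[uid]))


class DateIndex:
    """Hypothetical index: remember each message's date after parsing it once."""

    def __init__(self):
        self.dates = {}
        self.parses = 0  # how many messages we actually had to parse

    def update(self, messages):
        # Only messages that are not in the index yet need to be parsed.
        for uid, raw in messages.items():
            if uid not in self.dates:
                self.dates[uid] = parse_date(raw)
                self.parses += 1

    def sorted_uids(self):
        # Sorting touches only the cached dates, never the message bodies.
        return sorted(self.dates, key=self.dates.get)


if __name__ == "__main__":
    index = DateIndex()
    index.update(MESSAGES)
    print(sort_without_index(MESSAGES), index.sorted_uids(), index.parses)
```

With the index, repeated sorts and repeated update calls on an unchanged folder parse nothing new, which is where the time savings on a 10,000 message folder would come from.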

In order to make the switch seamless, we have configured Dovecot to run in parallel with Courier on the existing mailbox servers.  We patched Dovecot to utilize Courier’s folder subscription and message UID lists, so that both systems can utilize the same maildirs.  If you are interested in these patches, shoot me an email and I will send them to you.

Webmail will be the first application to switch to Dovecot.  We began rolling this out server-by-server on Wednesday.  We are taking our time with this rollout since it is such a big change, just in case any unforeseen problems arise.  So far the issues have been minor, and easily corrected.

Once the beta proxy is bug-free, we will start migrating the front-end POP3 and IMAP systems to Dovecot.  Needless to say, we are making a big commitment to this Dovecot thing.  It will be at the core of what’s to come.

An excuse to tailgate

Finally, four and a half months after the end of a very successful 2005 tailgate season – VT football is back.  This Saturday (the 15th) Virginia Tech will take on themselves in the annual Maroon-White spring football game.  The game itself does not mean much, but it gives us fans an opportunity to get in valuable tailgating practice in preparation for the season of eight home games ahead.

We will be doing it up in style, starting at 8:00 AM and going until they make us leave.  Come join us and bring a friend.

We will be located in Lot 4 for this game, in the first spaces on the right as you enter the lot.  Lot 4 is at the intersection of Southgate Drive and Spring Road.  Check out this parking map.  If you get there early, you may be able to park with us… if not, park in Lot 5 or Lot 6 and walk up.  Look for our flags.

We’ll have plenty of beer and wine… if you want something else, bring it.

We’ll be grilling breakfast and lunch (and dinner?)… let me know if you plan to come so we can purchase enough food.

Go Hokies!