Tag Archives: webmail

Switching Upgrade

Early Saturday morning we will be making a change to our switch configuration at Rackspace.  Currently we have four racks of servers at that data center – 62 machines and counting.  Our uplinks connect into a firewall/load-balancer on rack #1 and another on rack #2, both of which are then connected to our backend private network via interconnected switches on each of the four racks.

Racks #1 and #2 each have a 24-port gigabit switch (Cisco 2970), and racks #3 and #4 each have a 24-port 10/100 switch with 2 gigabit uplink ports (Cisco 2950).  Racks #2, #3 and #4 each connect to rack #1’s 2970 via their gigabit uplinks.

Now here is the problem we are solving this weekend…  Every time we add a new rack of servers we have to pull one server out of either rack #1 or rack #2 and move it to the new rack, so that we can free up a gigabit port on the Cisco 2970 for the new rack’s switch to plug into.  That’s just a pain.  And at the rate we’re growing, rack #1 and rack #2 will eventually become completely empty 🙂

So, we are moving the gigabit switches to a layer above all of the racks, and each rack will now plug into these external switches – creating a pyramid layout that will scale to 48 racks (and beyond with more gigabit switches).  After the maintenance, all of the rack switches will be 10/100 and the gigabit switches will be dedicated strictly to rack aggregation and hosting the firewall/load-balancer ports.

We are planning on just a few minutes of downtime for this upgrade, plus some latency while we verify connectivity and fail traffic over from the secondary to the primary firewall.  This will happen between 1:00am and 5:00am Saturday, January 7 as reported on our system status RSS feed.

How we Reject Mail using Blacklists (RBLs)

Today after I posted links to our new Spam Filtering Troubleshooting Tools, I received an email raising concern about our use of the controversial SPEWS blacklist.  Here was my response:

> I agree with you. SPEWS is a very unreliable RBL to use to block mail.
> We don’t use SPEWS that way. We only use it as part of the
> weighting system. It takes 6 points for an email to be tagged as spam
> (or 8 points if you set your filter level to low). The SPEWS scores are
> very low in comparison to the rest of the RBLs that we use. There are
> two SPEWS lists and we use them as follows…
>
> RCVD_IN_SPEWS1 Received via a relay in l1.spews.dnsbl.sorbs.net 0.701
> RCVD_IN_SPEWS2 Received via a relay in l2.spews.dnsbl.sorbs.net 0.301
>
> SPEWS works very well as part of a weighting system. It is a good
> indicator of exactly what it says its purpose is – “spam early
> warning”.

A great feature of our system is that we never reject SMTP connections based solely on any single RBL.  IPs must be listed in multiple RBLs or have additional spam checks fail before we will reject the SMTP connection.  And the SPEWS RBL is not part of that equation.
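
To make the weighting concrete, here is a minimal sketch in Python. Only the two SPEWS scores and the 6 and 8 point thresholds come from the reply above. The rule name EXAMPLE_RBL_A, its score, and the function names are hypothetical, and the rejection function is just one reading of the policy (two or more non-SPEWS listings, or one listing plus another failed check), not our actual implementation.

```python
# Scores for the two SPEWS lists are taken from the reply above.
# EXAMPLE_RBL_A is a hypothetical placeholder, not a real rule or score.
SCORES = {
    "RCVD_IN_SPEWS1": 0.701,
    "RCVD_IN_SPEWS2": 0.301,
    "EXAMPLE_RBL_A": 3.0,
}

# Points needed before a message is tagged as spam, per filter level.
THRESHOLDS = {"normal": 6.0, "low": 8.0}


def total_score(rules_hit):
    """Add up the scores of every rule the message triggered."""
    return sum(SCORES[rule] for rule in rules_hit)


def is_spam(rules_hit, filter_level="normal"):
    """Tag the message as spam once the total reaches the threshold."""
    return total_score(rules_hit) >= THRESHOLDS[filter_level]


def should_reject_connection(rbl_hits, other_checks_failed):
    """Assumed reading of the SMTP-level policy: SPEWS never counts, and a
    single listing is never enough on its own."""
    counted = [rbl for rbl in rbl_hits if "SPEWS" not in rbl]
    if len(counted) >= 2:
        return True
    return len(counted) == 1 and other_checks_failed
```

With these numbers, a message listed on both SPEWS lists scores about 1.002 points, far short of the 6 needed to be tagged, and SPEWS listings alone never cause a rejected connection.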

Spam Filtering Troubleshooting Tools

Here is a Christmas present from Korey.  I hope you find it useful:

http://www.webmail.us/misc/blagr.php

Provides a window into our dnscache + rbldnsd system.  You can see which third-party spam blacklists we use (we rsync them locally), and you can do lookups on IP addresses and domains that you think might be listed.

http://www.webmail.us/misc/spamrules.php

Provides a complete list of our spam filtering rules, along with description and score.  This list is auto-updated daily.
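
If you are curious what an IP lookup against a blacklist looks like, here is a small Python sketch of the usual DNSBL convention: reverse the octets of the IPv4 address and append the blacklist zone. The function name is hypothetical, and this only builds the query name; it is not the code behind the tool above.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name queried to check an IPv4 address against a DNSBL."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.10 is a documentation-only address, used here purely as an example.
print(dnsbl_query_name("192.0.2.10", "l1.spews.dnsbl.sorbs.net"))
```

Looking up the A record for that name tells you whether the address is listed: an answer generally means listed, and NXDOMAIN means not listed.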

1.6% virus – Is that all?

The virus stats on our home page show that 1.6% of all emails contain a virus.  I know that number isn’t right because we’re only counting emails that actually make it to our virus scanners, and we have Postfix rules that reject a ton of viruses before they ever make it that far.  The real virus numbers are much higher.

Postini shows a ridiculous spike in virus traffic over the past month.  Check out the “Viruses > Last Six Months” tab on their stats page.  I’d like to say that I agree with Postini’s numbers based on the rejects I’ve seen at the SMTP-level since Sober.U and Sober.Y hit the Internet.  But since our stats don’t take SMTP rejects into account, it is difficult to say exactly what real virus traffic we’re seeing.

I bet we can fix that 🙂

The Right Way to use SPF

Pat forwarded me a suggestion from a customer today and I figured I’d discuss it here, since this is a common misconception about SPF…

> I would like to see SPF-enhanced white listing.  If the e-mail
> passes the SPF check, white list the e-mail message and
> continue delivery. If the e-mail message fails the SPF check,
> pass it through the normal SPAM checks and process the e-mail
> message like any other e-mail received.

Many spammers publish SPF records for their domains too, and they send their spam from the mail servers listed in their SPF.  By whitelisting all mail that passes SPF checks, we would be allowing a lot of spam in.  I just did a search of my spam folder and 91 spam emails passed the SPF checks, meaning the spam domain has published SPF records and the spam email was sent from that domain’s legitimate servers.

SPF is not designed for whitelisting. 

Rather, SPF is designed to prevent phishing and other forgeries.  If an email is sent from a server that is not listed in a domain’s SPF record, we can assume it to be a forgery and either tag it as spam or discard it.  This is really the only safe way to use SPF, and this is how we use it currently.

Here are the SPF scores we are using:

score SPF_PASS          -0.001
score SPF_HELO_PASS     -0.001
score SPF_FAIL          8.001
score SPF_SOFTFAIL      4.001
score SPF_HELO_FAIL     5.001
score SPF_HELO_SOFTFAIL 4.001

We tag messages as spam at a score of 6.000 or higher.  FYI, the -0.001 scores exist only so that we can see in the message headers whether an email passed the SPF checks.
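
Here is a rough Python sketch of how those scores play out against the 6.000 threshold. The scores and the threshold are the ones listed above. The function name and the other_score parameter (points from all the non-SPF rules) are hypothetical, for illustration only.

```python
# SPF scores and the spam threshold as listed above.
SPF_SCORES = {
    "SPF_PASS": -0.001,
    "SPF_HELO_PASS": -0.001,
    "SPF_FAIL": 8.001,
    "SPF_SOFTFAIL": 4.001,
    "SPF_HELO_FAIL": 5.001,
    "SPF_HELO_SOFTFAIL": 4.001,
}
SPAM_THRESHOLD = 6.0


def spf_verdict(spf_rules, other_score=0.0):
    """Return (total score, tagged as spam?) for the SPF rules that fired.

    other_score stands in for the points from every non-SPF rule.
    """
    total = other_score + sum(SPF_SCORES[rule] for rule in spf_rules)
    return total, total >= SPAM_THRESHOLD
```

A hard SPF failure (8.001) is enough to tag a message by itself, a soft failure (4.001) needs about 2 more points from other rules, and a pass changes the total by only a thousandth of a point, so it cannot whitelist anything.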

42,641 emails in one folder

Kevin uploaded a really cool webmail fix last night… It used to be that if I had more than a few thousand messages in a folder, I’d get an ugly IMAP error when I tried to open that folder, due to the size of the IMAP operations it was doing.  The fix made this process more efficient.

I just loaded one of my IMAP folders that has 42,641 emails in it and it came up in 4 seconds!  Way to go Kevin and Steve!

This fix is only in the Neptune version of webmail, which hasn’t been released to everyone yet – but if you don’t have it, it’s coming.

From BIND to dnscache

After several years of running BIND9 on our DNS caching servers we
have finally ditched it and switched to D. J. Bernstein’s dnscache.  On an
average day our SMTP and Spam Filtering servers send 1400 queries per
second to each of our DNS servers during peak hours.  We made the switch
because as we’ve grown we have seen more reliability, performance and
general weirdness issues with BIND.

Most notably, when the BIND cache would reach about 250 MB, its
performance deteriorated noticeably.  It would respond slowly and even
drop queries.  I have heard this is caused by BIND’s internal data
structures not efficiently getting rid of old cache records.  Instead
BIND tries to cache every record until it expires and when it does
reach some internally calculated limit, BIND starts to discard new
cache records instead of old records.  This causes the server’s
performance to take a nose dive, and causes our pagers to go off…. Time to
run "service named stop; sleep 2; service named start" again!
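
To illustrate why that matters, here is a toy Python model of a full cache under the two behaviors: refusing new records versus evicting the oldest ones. It is only a sketch of the idea as I have heard it described. The class and policy names are made up, it ignores TTLs entirely, and it is not how BIND or dnscache are actually implemented.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy fixed-size cache with two hypothetical full-cache policies."""

    def __init__(self, capacity, policy):
        if policy not in ("drop_new", "drop_old"):
            raise ValueError("policy must be 'drop_new' or 'drop_old'")
        self.capacity = capacity
        self.policy = policy
        self.entries = OrderedDict()  # insertion order doubles as record age

    def put(self, name, value):
        """Store a record; return False if it was discarded instead."""
        if name in self.entries:
            self.entries[name] = value
            return True
        if len(self.entries) >= self.capacity:
            if self.policy == "drop_new":
                return False  # full: the fresh record is thrown away
            self.entries.popitem(last=False)  # full: evict the oldest record
        self.entries[name] = value
        return True

    def get(self, name):
        """Return the cached value, or None on a cache miss."""
        return self.entries.get(name)
```

Under "drop_new", once the cache fills up every lookup for a name that is not already cached misses and has to be resolved again, which matches the kind of nose dive described above. Under "drop_old", recent names stay cached.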

Also, BIND didn’t efficiently cache records from our rbldnsd servers
behind it.  We could never really figure out why so many requests were
reaching rbldnsd and not hitting the BIND cache.  Now with dnscache, we
have a good view of exactly what it is doing and have fine tuned the
SOAs in rbldnsd so that dnscache caches our spam DB lookups exactly how
we want it.  No more weirdness going on behind the scenes.

Mr. Bernstein has a lot of nasty things to say about BIND.  Don’t believe
all of his hype, but do trust the fact that DJB’s code is much simpler,
more reliable and possibly more powerful than BIND.  BIND is overkill for
almost every use.  It tries to be all things for all systems, whereas DJB
keeps things simple and provides a different server for each purpose.  I
like simplicity.  The install process was a little bit awkward on a Linux
system though, with daemontools and the stupid errno patch.

FYI our dnscache servers are AMD Athlon 3200s with 1GB RAM, and they
each are handling 1400 queries per second using only 15% CPU.  Currently
we have a 100 MB cache size, but we are still tuning that.