We put some additional spam filtering servers online yesterday, and we figured we’d bump up to Red Hat ES4 (Linux 2.6.9-patched). We had been running Red Hat ES3 (Linux 2.4.21-patched) for all of our prior spam filtering servers. But the performance on these new ES4 servers sucked under a high load.
At first I noticed from vmstat that the ES4 servers were swapping out a bunch of application memory when they didn’t need to, but I fixed that with: echo 0 > /proc/sys/vm/swappiness
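For anyone who wants to check for the same symptom, here is a rough sketch of how the swap columns in vmstat output can be totalled. The function name swap_activity is made up for this example, and it assumes the usual vmstat layout with a header row containing si and so. It is not the exact procedure I used, just one way to put a number on it.

```shell
# Hypothetical helper: total the si (swap-in) and so (swap-out) columns
# from vmstat output read on stdin.
swap_activity() {
    awk '
        # Header rows start with a non-numeric field; use them to find
        # where the si and so columns are.
        $1 !~ /^[0-9]+$/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data rows: add up swap-in and swap-out once the columns are known.
        si && so { swapped_in += $si; swapped_out += $so }
        END { printf "si=%d so=%d\n", swapped_in, swapped_out }
    '
}

# Example use (the first vmstat sample is an average since boot, so it is
# included in the totals):
#   vmstat 5 6 | swap_activity
#
# To keep the swappiness setting across reboots, a line such as
#   vm.swappiness = 0
# can go in /etc/sysctl.conf.
```

Non-zero totals while there is still plenty of free memory would point to the same unnecessary swapping described above.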
It also seemed to be swapping out our tmpfs partitions when there was plenty of free memory, which ES3 didn’t do. But even with swap turned off entirely (swapoff) and less than 50% of our RAM in use, once there were over 100 Postfix smtpd processes receiving mail, the server’s load average skyrocketed and the server became unresponsive. Our ES3 servers handle the same load all day long with no problem.
I also noticed from iostat that the ES4 servers did a lot of reads from the root partition, which our ES3 servers do not do. I don’t know why they would do that, because we use the exact same drives, config, and software versions on both sets of machines. On top of that, Postfix and amavisd/spamassassin are supposed to keep all their config data in memory.
The strangeness got stranger when I stopped either amavisd or Postfix. With Postfix accepting incoming mail on port 25 and amavisd stopped, the reads on the root partition vanished. With Postfix’s port 25 blocked (no incoming mail) and amavisd crunching on queued mail, the reads on the root partition vanished too. However, with both running, there were heavy reads on the root partition… WTF!
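If you want to repeat this kind of comparison without eyeballing iostat, one option is to sample /proc/diskstats before and after each test and subtract. The sketch below is only an assumption about how that could be done, not what I ran. The names sectors_read and read_delta are made up, and sda1 stands in for whatever your root partition is. It relies on the 2.6 diskstats layout, where whole disks report 11 counters and partitions on older 2.6 kernels report only 4.

```shell
# Hypothetical helper: print the cumulative sectors-read counter for one
# device, given /proc/diskstats-format text on stdin.
sectors_read() {
    awk -v dev="$1" '
        $3 == dev {
            # Whole disks: major minor name + 11 counters, sectors read is field 6.
            # Partitions on kernels before 2.6.25: major minor name + 4 counters,
            # sectors read is field 5.
            print (NF == 7 ? $5 : $6)
        }
    '
}

# Hypothetical helper: sectors read from a device over an interval in seconds.
# Assumes the device name actually appears in /proc/diskstats.
read_delta() {
    dev=$1
    secs=$2
    before=$(sectors_read "$dev" < /proc/diskstats)
    sleep "$secs"
    after=$(sectors_read "$dev" < /proc/diskstats)
    echo $((after - before))
}

# Example use, once per test case (Postfix only, amavisd only, both):
#   read_delta sda1 60
```

Running it once for each of the three cases gives one number per case, which makes the "only when both are running" pattern easy to confirm.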
Anyway, after spending more than a day on it, I just wanted to let you know that ES4 sucks. We have rebuilt these machines using our trusty ES3 image, and they are now operating great.