Load Balancing vs Failover

This week we added a pair of load balancers to our email hosting system to balance internal traffic such as DNS, LDAP, MySQL, and a few other services.  Previously we had used load balancers only for external customer traffic (SMTP, POP3, IMAP, Webmail).

Until now, our internal services were hosted on many independent pairs of servers with master-master failover via heartbeat.  When one server in a pair has a problem, heartbeat tells its partner to take over all traffic for the pair until the problem is resolved.  Master-master pairs can be tricky, however, because it is difficult to catch and handle every possible failure condition.  And when one member of a pair dies, fixing it is extremely time-sensitive: the service will be down if the other member dies before redundancy is restored.
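To make that failure window concrete, here is a minimal Python sketch of a two-node pair.  It only models which nodes are up; it is not how heartbeat itself works, and the class name and node names are made up for illustration.

```python
class FailoverPair:
    """Toy model of a two-node failover pair (hypothetical, not heartbeat's API)."""

    def __init__(self, node_a, node_b):
        # Both members start out healthy.
        self.alive = {node_a: True, node_b: True}

    def fail(self, node):
        if node not in self.alive:
            raise KeyError(node)
        self.alive[node] = False

    def restore(self, node):
        if node not in self.alive:
            raise KeyError(node)
        self.alive[node] = True

    def serving(self):
        # Nodes that can currently take traffic for the pair.
        return [n for n, up in self.alive.items() if up]

    def status(self):
        up = len(self.serving())
        if up == 2:
            return "redundant"
        if up == 1:
            return "degraded"  # one more failure takes the service down
        return "down"


pair = FailoverPair("db1", "db2")
pair.fail("db1")
print(pair.status(), pair.serving())
```

The "degraded" state is the time-sensitive one: the service still answers, but only until the surviving member has a problem of its own.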

It is also messy to scale services that are configured for master-master failover, because each application server has to be configured to query a particular set of database servers, DNS servers, and so on.  It would be a lot cleaner if the applications could simply be told to "query DNS" and something external to the application server routed the query to the right place.  That is where load balancing comes in.

Now all of our application servers have identical configurations.  They all query the same DNS server, the same LDAP server, and the same MySQL server.  However, the "server" they are configured to query is actually a pair of load balancers, which forward each query to one of the real servers behind them.  When there is a problem with one of the real servers, the load balancers simply stop sending queries to it.  And to scale, we simply add a new real server and tell the load balancers about it.
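The idea can be sketched in a few lines of Python.  This is a toy round-robin pool, not our actual setup: the class, its method names, and the server names are hypothetical, and round-robin is just one possible scheduling choice.

```python
class Pool:
    """Toy round-robin pool of real servers behind one virtual address."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def add(self, server):
        # Scaling out: register a new real server with the balancer.
        if server not in self.servers:
            self.servers.append(server)
        self.healthy.add(server)

    def mark_down(self, server):
        # A failed health check removes the server from rotation.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Walk the list at most once, skipping unhealthy servers.
        n = len(self.servers)
        for _ in range(n):
            server = self.servers[self._next % n]
            self._next = (self._next + 1) % n
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy real servers")


pool = Pool(["dns1", "dns2"])
pool.mark_down("dns2")   # dns2 fails; clients are not reconfigured
pool.add("dns3")         # scale out by adding a real server
print([pool.pick() for _ in range(4)])
```

The application servers never see any of this.  They keep querying the one address, and the pool decides which real server answers.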

Our load balancers are custom-built AMD Opteron Dual Core servers with gigabit NICs.  They run keepalived, which uses IPVS for load balancing and VRRP for making the load balancers themselves redundant.
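For readers new to VRRP, the core idea is that the live router with the highest priority becomes master and holds the virtual IP, and a backup takes over when the master stops advertising.  A heavily simplified Python sketch of just that election step follows.  It ignores advertisement timers and preemption settings, the tie-break by name is an arbitrary choice for this sketch, and the function name, router names, and priorities are invented for illustration.

```python
def elect_master(routers):
    """Pick the master from a dict of name -> (priority, alive).

    Simplified: the live router with the highest priority wins.
    Ties are broken by name here, which is an arbitrary choice for this sketch.
    """
    candidates = [(prio, name) for name, (prio, alive) in routers.items() if alive]
    if not candidates:
        return None  # nobody is left to hold the virtual IP
    return max(candidates)[1]


routers = {"lb1": (150, True), "lb2": (100, True)}
print(elect_master(routers))      # lb1 has the higher priority

routers["lb1"] = (150, False)     # lb1 dies
print(elect_master(routers))      # lb2 takes over the virtual IP
```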

Out of the box, though, keepalived only supports load balancing TCP traffic, so we had to hack it to handle UDP for DNS.  I’ll save that for another post.
