Webmail is using Amazon Web Services

In my previous post regarding Amazon Web Services, I asked “where shall we start?”  That is, where would it make sense for Webmail.us to start using this sort of new infrastructure service?

That was a rhetorical question.  We have been building a replacement for our tape backup system using Amazon S3, SQS and EC2 for several months now, and today we announced its release.  It is fully deployed and working wonderfully.

Below are some links to the details, but hit me up with questions if you want more:

Technical Details from our website

Amazon S3 Success Story

Amazon Web Services Blog Post

Webmail.us Blog Post

sold GOOG, bought AMZN

I sold half of my Google stock this morning and bought Amazon.  I really believe in the direction that Amazon is heading with their infrastructure services – S3, SQS, EC2 and whatever is next.  I don’t know much about the retail side of their business, but they have a bright future ahead selling web services to a whole new market of customers.  Customers like me.

And they don’t have to invest hundreds of millions of dollars to build data centers and infrastructure to support these new services.  The retail side of their business already did this for them.  They are just renting out the spare capacity on their existing infrastructure.  The financial risks appear to be low and the upside potential is huge.

Google, Sun and MS will surely follow with similar services, but I doubt they can put much pricing pressure on Amazon.  Amazon was smart and priced their services low from the start, as if competing services already existed.

Dan Ciruli was dead on in his post on Friday – Believe it or not: it’s even more important than YouTube

Companies like Webmail.us will buy web services from a mix of these sorts of new infrastructure vendors.  These services will supplement systems running in our own data centers wherever it makes sense to do so.  Hmm… where shall we start?

Data mirroring with DRBD

A Rackspace engineer asked me for some info on DRBD the other day.  We are heavy DRBD users.  Below is a summary of what I told him.  If you find this information useful, comment below…

We use DRBD to mirror our mail data between pairs of servers.  Since DRBD replicates at the block level, you can run any file system on top of it.  We chose ReiserFS because it handles large numbers of small files well, which is exactly what our maildir directory structures contain.  Any given server of ours has between 1 million and 2 million files.

On each pair of servers we have two separate DRBD mirrors going in opposite directions.  Only one server can mount a DRBD partition at a time; the secondary cannot even mount it read-only while the primary has it mounted.  On the “a” server, /dev/sda is the primary and is mirrored to /dev/sda on the “b” server.  On the “b” server, /dev/sdb is the primary and is mirrored to /dev/sdb on the “a” server.
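
To make the layout concrete, here is a rough sketch of what the two cross-mirrored resources look like in drbd.conf (0.7-style syntax).  The hostnames, IP addresses, partitions and devices below are placeholders for illustration, not our actual configuration:

    # /etc/drbd.conf -- one resource per direction
    resource mail-a {                # primary on server-a, mirrored to server-b
      protocol C;                    # synchronous replication
      on server-a {
        device    /dev/drbd0;        # the device the file system is actually created on
        disk      /dev/sda3;         # local backing partition
        address   10.0.0.1:7788;
        # internal meta-data; see the lessons below for why we later moved to a
        # dedicated meta-disk partition
        meta-disk internal;
      }
      on server-b {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }

    resource mail-b {                # primary on server-b, mirrored to server-a
      protocol C;
      on server-a {
        device    /dev/drbd1;
        disk      /dev/sdb3;
        address   10.0.0.1:7789;
        meta-disk internal;
      }
      on server-b {
        device    /dev/drbd1;
        disk      /dev/sdb3;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }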

We use heartbeat to manage two virtual IP addresses, one owned by “a” and one owned by “b”, and also to manage the DRBD primary/secondary status and mounts.  When heartbeat detects a failure, the remaining good server takes over the dead server’s virtual IP address and DRBD mount, and can serve mail access for the users whose data was on the failed server.
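
With heartbeat v1, the resource groups live in haresources; something like the sketch below expresses the setup described above.  The node names, virtual IPs, mount points and resource names are placeholders, while IPaddr, drbddisk and Filesystem are the standard heartbeat resource scripts that take over the address, promote the DRBD resource to primary, and mount it:

    # /etc/ha.d/haresources -- each line names the preferred node, then its resource group
    server-a  IPaddr::10.0.0.11/24  drbddisk::mail-a  Filesystem::/dev/drbd0::/mail/a::reiserfs
    server-b  IPaddr::10.0.0.12/24  drbddisk::mail-b  Filesystem::/dev/drbd1::/mail/b::reiserfs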

DRBD lessons learned:

– Create a dedicated partition for DRBD’s “meta-disk” and put it on a separate drive from the DRBD partitions themselves (we allocate 1 GB).  This should improve performance in theory, and at a minimum it ensures that space is always dedicated to the meta-disk.  At first we did not allocate any space to meta-disks and DRBD defaulted to using the last 128 MB of its disk.  We believe that a full disk led to data corruption in at least one instance when running that configuration.  (A drbd.conf excerpt illustrating this and the next item follows the list.)

– Change the incon-degr-cmd setting to: "echo '!DRBD! pri on incon-degr' | wall ; exit 1".  By default DRBD halts your system when it thinks it is in a degraded state.  Since our servers are at Rackspace and we have no console access, a halt is a major annoyance.

– When a server goes offline and then recovers, DRBD attempts to automatically reconnect the pair and resync the data.  We have seen several cases where DRBD makes an incorrect decision about which server is primary, and the sync runs in the wrong direction, losing new data.  The DRBD authors have fixed several bugs related to this, but even with version 0.7.21 we have still seen it occur.  To work around this, we have configured heartbeat to handle failover but not failback.  Reconnecting the two servers and getting them syncing again requires manual intervention; as long as the engineer knows what they are doing, they can get it syncing in the right direction.  (A rough sketch of that procedure follows the list.)

– DRBD is complicated software.  Try to keep everything else around it simple so you can troubleshoot problems quickly.  For example, we used to run a local RAID 0 underneath DRBD for an I/O boost.  Don’t.  It is better to run another instance of DRBD on the additional disk and partition your data between the independent DRBD mirrors.

– Again, DRBD is complicated.  If there are simpler alternatives, I would recommend exploring them.  For instance, a simple rsync script will do great in many situations (a minimal example follows the list), and csync2 is a good choice for multi-server synchronization of a relatively small number of files.  Both are easy to troubleshoot when things break because they run on top of a normal file system, whereas DRBD runs underneath the file system.  It is difficult to troubleshoot and fix problems with software that runs underneath the file system.
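
Putting the first two lessons into context, here is a drbd.conf excerpt in the spirit of the cross-mirror sketch above.  Again, the hostnames and devices are illustrative only:

    resource mail-a {
      protocol C;

      # Lesson: do not halt the box when the resource comes up degraded with
      # possibly inconsistent data -- just complain loudly over wall and bail out.
      incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; exit 1";

      on server-a {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.0.0.1:7788;
        # Lesson: ~1 GB meta-data partition on a separate drive, instead of the
        # default of using the last 128 MB of the backing disk.
        meta-disk /dev/sdc1[0];
      }
      on server-b {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.0.0.2:7788;
        meta-disk /dev/sdc1[0];
      }
    }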
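
For reference, the manual reconnect after a failover looks roughly like the sketch below on a 0.7-era setup (the “failover but not failback” part is typically just heartbeat’s auto_failback / nice_failback option turned off).  This is only an outline, assuming the resource is named mail-a and that the surviving server holds the good data; always check cat /proc/drbd on both nodes and be certain which copy is current before forcing a sync direction:

    # Run on the recovered server, i.e. the one whose data should be discarded:
    drbdadm secondary mail-a     # make sure this node is not primary
    drbdadm invalidate mail-a    # mark the local copy inconsistent so the full
                                 # resync runs *from* the peer, not toward it
    drbdadm connect mail-a       # reconnect to the peer and start the resync

    # Watch progress on either node:
    cat /proc/drbd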
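
And as a trivial example of the simpler approach, a cron-driven rsync covers a surprising number of cases (the paths and hostname here are just placeholders):

    #!/bin/sh
    # Push a directory tree to a warm standby; run from cron every few minutes.
    #   -a        preserve permissions, ownership, timestamps and symlinks
    #   --delete  remove files on the standby that were deleted on the source
    rsync -a --delete /var/spool/mail/ standby-server:/var/spool/mail/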