A Rackspace engineer asked me for some info on DRBD the other day. We are heavy DRBD users. Below is a summary of what I told him. If you find this information useful, comment below…
We use DRBD to mirror our mail data between pairs of servers. You can mirror any file system on top of DRBD. We chose ReiserFS because it is optimal for handling large numbers of small files, which is what our maildir directory structures contain. Any given server of ours has between 1 million and 2 million files.
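For anyone who has not set DRBD up before, putting a file system on top of it looks roughly like this. This is just a sketch assuming a DRBD 0.7-era setup and a resource named r0; the mount point is made up:

    drbdadm up r0                           # attach the backing disk and connect to the peer (run on both nodes)
    drbdadm primary r0                      # on the node that will be primary; the very first sync on 0.7 may need "drbdadm -- --do-what-I-say primary r0"
    mkreiserfs /dev/drbd0                   # create the file system on the DRBD device, not on the raw partition
    mount /dev/drbd0 /var/spool/maildirs    # the primary node then serves mail from the mounted DRBD device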
On each pair of servers we have two separate DRBD mirrors going in opposite directions. Only one server can mount a DRBD partition at a time; the secondary server cannot even mount the DRBD partition read-only while the primary has it mounted. On “a” servers /dev/sda is mounted as primary and mirrored to /dev/sda on the “b” server. On “b” servers /dev/sdb is mounted as primary and mirrored to /dev/sdb on the “a” server.
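Here is roughly what that layout looks like in drbd.conf, written in the DRBD 0.7 style. This is only a sketch; the hostnames, IP addresses, devices and resource names are made up, not our real config:

    resource mail-a {                # primary on the "a" server, mirrored to the "b" server
        protocol C;
        on server-a {
            device    /dev/drbd0;
            disk      /dev/sda3;
            address   10.0.0.1:7788;
            meta-disk internal;
        }
        on server-b {
            device    /dev/drbd0;
            disk      /dev/sda3;
            address   10.0.0.2:7788;
            meta-disk internal;
        }
    }

    resource mail-b {                # primary on the "b" server, mirrored to the "a" server
        protocol C;
        on server-a {
            device    /dev/drbd1;
            disk      /dev/sdb3;
            address   10.0.0.1:7789;
            meta-disk internal;
        }
        on server-b {
            device    /dev/drbd1;
            disk      /dev/sdb3;
            address   10.0.0.2:7789;
            meta-disk internal;
        }
    }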
We use heartbeat to manage two virtual IP addresses, one owned by “a” and one owned by “b”, and also to manage the DRBD primary/secondary status and mounts. When heartbeat detects a failure, the remaining good server takes over the dead server’s virtual IP address and DRBD mount, and is able to serve mail for the users whose data was on the failed server.
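For reference, this kind of setup can be expressed with heartbeat’s v1-style haresources entries plus the drbddisk resource script that ships with DRBD. Again just a sketch with made-up names, matching the example resources above, not our actual config:

    server-a  IPaddr::10.0.0.11  drbddisk::mail-a  Filesystem::/dev/drbd0::/mail/a::reiserfs
    server-b  IPaddr::10.0.0.12  drbddisk::mail-b  Filesystem::/dev/drbd1::/mail/b::reiserfs

(Turning auto_failback off in ha.cf is one way to get the failover-without-automatic-failback behavior described in the lessons below.)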
DRBD lessons learned:
– Create a dedicated partition for DRBD’s “meta-disk” and put it on a separate drive from the DRBD partitions themselves (we allocate 1 GB). In theory this improves performance, and at a minimum it ensures that space is always dedicated to the meta-disk. At first we did not allocate any space for meta-disks and DRBD defaulted to using the last 128 MB of its disk. We believe that a full disk led to data corruption when running this configuration in at least one instance. (There is a short config snippet after this list showing this and the incon-degr-cmd change below.)
– Change the incon-degr-cmd setting to “echo ‘!DRBD! pri on incon-degr’ | wall ; exit 1”. By default DRBD halts your system when it thinks it is in a degraded state. Since our servers are at Rackspace and we have no console access, halting is a major annoyance.
– When a server goes offline and then recovers, DRBD attempts to automatically reconnect the two nodes and resync the data. We have seen several cases where DRBD makes an incorrect decision about which server should be primary, and the data sync occurs in the wrong direction, losing new data. The DRBD authors have fixed several bugs related to this, but even with version 0.7.21 we have still seen it occur. To work around this, we have configured heartbeat to handle failover, but not failback. It requires manual intervention to reconnect the two servers and get them syncing. As long as the engineer knows what they are doing, they can get it syncing in the correct direction.
– DRBD is complicated software. Try to keep everything else around it simple so that you can troubleshoot problems quickly. For example, we used to run a local RAID 0 underneath DRBD to gain an I/O boost. Don’t. It is better to run another instance of DRBD on the additional disk and partition your data between the independent DRBD mirrors.
– Again, DRBD is complicated. If there are simpler alternatives, I recommend exploring them. For instance, a simple rsync script works great in many situations, and csync2 is a good choice for multi-server synchronization of a relatively small number of files. Both are easy to troubleshoot when things break because they run on top of a normal file system, whereas DRBD runs underneath the file system. It is difficult to troubleshoot and fix problems with software that runs underneath the file system.
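To make the first two lessons concrete, here is roughly how they translate into a DRBD 0.7 style config. This is just a sketch, and the meta-disk partition name is made up:

    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; exit 1";   # resource level: warn via wall and exit instead of halting the box
    meta-disk /dev/sdc1[0];   # inside each on { } host section: a dedicated partition (we allocate 1 GB) on a separate drive; [0] is the meta-data index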
Thanks for this info, we are considering running drbd
Hi Bill,
Thank you for the informative post.
I am a FreeBSD fan, so DRBD isn’t for me, but I found geom ggate[cd], which can do the same.
But I wonder why you went with network block device replication instead of using a SAN. Any specific advantages?
A SAN was an option that we looked at. However, we found that we could get the cost per GB lower by using lots of commodity SATA drives, provided we could come up with a way to use this failure-prone hardware in a way that makes the overall system reliable. DRBD + heartbeat helped us achieve that, along with other software that we wrote to manage lots of pairs of DRBD boxes. So the answer is cost.
Hi,
I am actually using DRBD for my cluster replication, but DRBD only syncs when I start it and then stays idle. Nothing happens, even though I have created loads of files in the partitions.
The following is my drbd.conf.
Please advise
global {
    # we want to be able to use up to 2 drbd devices
    minor-count 2;
    dialog-refresh 1;       # redraw the startup dialog every 1 second
}

resource r0 {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 2 ; halt -f";

    on drbd1 {
        device    /dev/drbd0;
        disk      /dev/sda2;
        address   192.168.1.69:7788;
        meta-disk internal;
    }

    on drbd2 {
        device    /dev/drbd0;
        disk      /dev/sda2;
        address   192.168.1.73:7788;
        meta-disk internal;
    }

    disk {
        on-io-error detach;
    }

    net {
        max-buffers   2048;
        ko-count      4;
        on-disconnect reconnect;
    }

    syncer {
        rate       10M;
        group      1;
        al-extents 257;     # must be a prime number
    }

    startup {
        wfc-timeout      0;     # on a clean boot, wait indefinitely for the peer
        degr-wfc-timeout 10;    # if the cluster was degraded, wait 10 seconds
    }
}
For troubleshooting DRBD, your best bet is to pose this question on the DRBD mailing list. There are lots of smart people there willing to help… http://lists.linbit.com/listinfo/drbd-user
How do you manage backups? To tape? Using snapshots?
We are currently using drbd in a similar way for a high-volume dating site, works very well!
We built our backup system using Amazon S3, which allows us to do very customized things such as build a GUI for our customers to do their own restores, and decide which files to back up based on something other than file path:
http://billboebel.typepad.com/blog/2007/05/my_amazon_s3_sl.html
Hi Bill
Do you mind posting your drbd config file? If it’s a security risk or whatever I understand, but it would be helpful to me.
I can’t do that, but somebody on the DRBD mailing list might be able to. In fact I bet there are several posted to the list archives.
I’m new to DRBD. I’m wondering if it can be used for synchronous mirroring at the directory level, e.g. building an HA NFS config where the active node can fail over to one of multiple passive nodes. Specifically, is the performance such that it allows synchronous mirroring, and is directory-level mirroring granularity allowed?
Saqib: you can replicate different directories to different servers by setting up a partition for each and creating a dedicated DRBD resource instance for each.
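Roughly, you end up with one mounted DRBD device per replicated directory, something like this (the device names and mount points are made up):

    mount /dev/drbd0 /export/home        # resource r0, backed by its own partition on both nodes
    mount /dev/drbd1 /export/projects    # resource r1, backed by a second partition on both nodes

Each resource can be made primary, and therefore mounted, on a different server, so the granularity is really per partition rather than per arbitrary directory.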