HA cluster with DRBD and Heartbeat

From OpenVZ Virtuozzo Containers Wiki
Revision as of 10:20, 22 September 2006 by Wfischer (talk | contribs) (fixed two typos)

This article shows how to set up an OpenVZ high availability (HA) cluster using the data replication software DRBD and the cluster manager Heartbeat. In this example, the two machines that form the cluster run CentOS 4.3. The article also shows how to do kernel updates in the cluster, including necessary steps such as rebuilding the DRBD userspace tools. For this purpose, kernel 2.6.8-022stab078.10 (containing DRBD module 0.7.17) is used as the initial kernel version, and kernel 2.6.8-022stab078.14 (containing DRBD module 0.7.20) as the updated kernel version.

Additional information about clustering of virtual machines can be found in the following paper: http://www.linuxtag.org/2006/fileadmin/linuxtag/dvd/12080-paper.pdf

Further information can be found in the documentation of the Thomas-Krenn.AG cluster (the author of this howto works on cluster development there, which is why he was able to write it). The full documentation, with illustrations, is currently only available in German: http://my.thomas-krenn.com/service_support/index.php/page.242

Prerequisites

The OpenVZ kernel already includes the DRBD module. The DRBD userspace tools and the cluster manager Heartbeat must be provided separately. As the API version of the DRBD userspace tools must exactly match the API version of the module, compile them yourself. Also compile Heartbeat yourself, as at the time of this writing the CentOS extras repository only contained an old CVS version of Heartbeat.

A hardware node for production use should not run any application that is not strictly needed for running OpenVZ (for security reasons, anything else should run in a VE). Therefore, compile DRBD and Heartbeat on another machine running CentOS 4.3 (in this example I used a virtual machine on a VMware Server).

Compiling Heartbeat

Heartbeat version 1.2.* has been used successfully in many two-node clusters around the world. As the codebase of version 1.2.* has been in production use for many years, the code is very stable. At the time of writing, Heartbeat 1.2.4 is the current version of the 1.2.* branch.

Get the tar.gz of the current version of the 1.2.* branch from http://linux-ha.org/download/index.html (at the time of this writing, this is http://linux-ha.org/download/heartbeat-1.2.4.tar.gz). Use rpmbuild to build the packages:

rpmbuild -ta heartbeat-1.2.4.tar.gz

After that, you will find four rpm packages in /usr/src/redhat/RPMS/i386 (heartbeat-1.2.4-1.i386.rpm, heartbeat-ldirectord-1.2.4-1.i386.rpm, heartbeat-pils-1.2.4-1.i386.rpm, heartbeat-stonith-1.2.4-1.i386.rpm). In this example only heartbeat-1.2.4-1.i386.rpm, heartbeat-pils-1.2.4-1.i386.rpm, and heartbeat-stonith-1.2.4-1.i386.rpm are needed.

Compiling DRBD userspace tools

When compiling the DRBD userspace tools, make sure to use the version that matches the DRBD module included in the OpenVZ kernel you want to use. If you are unsure about the version, run the following steps on a test machine booted into that OpenVZ kernel (I used another virtual machine on a VMware Server for this):

[root@testmachine ~]# cat /proc/version
Linux version 2.6.8-022stab078.10 (root@rhel4-32) (gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)) #1 Wed Jun 21 12:01:20 MSD 2006
[root@testmachine ~]# modprobe drbd
[root@testmachine ~]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by phil@mescal, 2006-03-06 15:04:12
 0: cs:Unconfigured
 1: cs:Unconfigured
[root@testmachine ~]# rmmod drbd
[root@testmachine ~]#

Here the version of the DRBD module is 0.7.17, so the userspace tools for 0.7.17 are necessary.
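
The version check above can also be scripted, for example on a test machine where the module is loaded. The following is a minimal sketch and not part of the original setup: the function name drbd_versions and its optional file argument are assumptions made for illustration, and it relies only on the "version:" line format shown in the /proc/drbd output above.

```shell
# Print "<module version> <api version>" parsed from a /proc/drbd-style file.
# The file argument defaults to /proc/drbd; passing another path is only
# meant for trying the function on saved output (hypothetical convenience).
drbd_versions() {
    sed -n 's|^version: \([0-9.]*\) (api:\([0-9]*\)/proto:[0-9]*).*|\1 \2|p' \
        "${1:-/proc/drbd}"
}
```

For the output shown above, drbd_versions would print the module version 0.7.17 followed by the API version 77.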

Back on the buildmachine, do the following to create the rpm:

[root@buildmachine ~]# yum install kernel-devel gcc bison flex
Setting up Install Process
Setting up repositories
Reading repository metadata in from local files
Parsing package install arguments
Nothing to do
[root@buildmachine ~]# tar xfz drbd-0.7.17.tar.gz
[root@buildmachine ~]# cd drbd-0.7.17
[root@buildmachine drbd-0.7.17]# make rpm
[...]
You have now:
-rw-r--r--  1 root root 288728 Jul 30 10:40 dist/RPMS/i386/drbd-0.7.17-1.i386.rpm
-rw-r--r--  1 root root 518369 Jul 30 10:40 dist/RPMS/i386/drbd-km-2.6.9_34.0.2.EL-0.7.17-1.i386.rpm
[root@buildmachine drbd-0.7.17]#

Note that this build uses the kernel-devel package from CentOS. This does not matter, as the resulting drbd-km rpm will not be used (the DRBD kernel module is already included in the OpenVZ kernel). If the kernel-devel package does not match the version of the currently running kernel, you can execute 'make rpm KDIR=/usr/src/kernels/2.6.9-34.0.2.EL-i686/' to point directly to the kernel sources.

Installing the two nodes

Install the two machines in the same way as you would for a normal OpenVZ installation, but do not create a filesystem for /vz. This filesystem will be created later on top of DRBD.

Example installation configuration

Parameter              node1                                           node2
hostname               ovz-node1                                       ovz-node2
/ filesystem           hda1, 10 GB                                     hda1, 10 GB
swap space             hda2, 2048 MB                                   hda2, 2048 MB
public LAN             eth0, 192.168.1.201                             eth0, 192.168.1.202
private LAN            eth1, 192.168.255.1 (Gbit Ethernet)             eth1, 192.168.255.2 (Gbit Ethernet)
other install options  no firewall, no SELinux                         no firewall, no SELinux
package groups         deactivated everything, only kept vim-enhanced  deactivated everything, only kept vim-enhanced

Installing OpenVZ

Get the OpenVZ kernel and utilities and install them on both nodes, as described in the quick installation guide. Update the grub configuration to boot the OpenVZ kernel by default. Disable starting of OpenVZ on system boot on both nodes (OpenVZ will be started and stopped by Heartbeat):

[root@ovz-node1 ~]# chkconfig --del vz
[root@ovz-node1 ~]# 

Then reboot both machines.

Setting up DRBD

On each of the two nodes, create a partition that acts as the underlying DRBD device. The partitions should have exactly the same size (for this example I created a 10 GB partition hda3 on each node using fdisk). Note that it might be necessary to reboot the machines so that the partition table is re-read.
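
To confirm that both partitions really have the same size, the block count can be read from /proc/partitions on each node and compared. This is a small sketch under assumptions: the helper name part_blocks is made up for illustration, and it only assumes the usual four-column layout of /proc/partitions (major, minor, #blocks, name).

```shell
# Print the size in 1 KB blocks of the named partition.
# $1: partition name as listed in /proc/partitions (e.g. hda3)
# $2: optional alternative file, only meant for trying the function out
part_blocks() {
    awk -v dev="$1" '$4 == dev { print $3 }' "${2:-/proc/partitions}"
}
```

Running part_blocks hda3 on ovz-node1 and on ovz-node2 should then print the same number on both nodes.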

Install the rpm of the DRBD userspace tools on both nodes:

[root@ovz-node1 ~]# rpm -ihv drbd-0.7.17-1.i386.rpm
Preparing...                ########################################### [100%]
   1:drbd                   ########################################### [100%]
[root@ovz-node1 ~]#

Then create the drbd.conf configuration file and copy it to /etc/drbd.conf on both nodes. Below is the example configuration file that is used in this article:

resource r0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    degr-wfc-timeout 120;
  }

  net {
    on-disconnect reconnect;
  }

  disk {
    on-io-error   detach;
  }

  syncer {
    rate 30M;
    group 1;
    al-extents 257;
  }

  on ovz-node1 {
    device     /dev/drbd0;
    disk       /dev/hda3;
    address    192.168.255.1:7788;
    meta-disk  internal;
  }

  on ovz-node2 {
    device     /dev/drbd0;
    disk       /dev/hda3;
    address    192.168.255.2:7788;
    meta-disk  internal;
  }

}

Start DRBD on both nodes:

[root@ovz-node1 ~]# /etc/init.d/drbd start
Starting DRBD resources:    [ d0 s0 n0 ].
[root@ovz-node1 ~]# 

Then check the status of /proc/drbd:

[root@ovz-node1 ~]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by phil@mescal, 2006-03-06 15:04:12
 0: cs:Connected st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
[root@ovz-node1 ~]#

Both nodes are now Secondary and Inconsistent. The latter is because the underlying storage is not yet in-sync, and DRBD has no way to know whether you want the initial sync from ovz-node1 to ovz-node2, or ovz-node2 to ovz-node1. As there is no data below it yet, it does not matter.

To start the sync from ovz-node1 to ovz-node2, do the following on ovz-node1:

[root@ovz-node1 ~]# drbdadm -- --do-what-I-say primary all
[root@ovz-node1 ~]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by phil@mescal, 2006-03-06 15:04:12
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:627252 nr:0 dw:0 dr:629812 al:0 bm:38 lo:640 pe:0 ua:640 ap:0
        [=>..................] sync'ed:  6.6% (8805/9418)M
        finish: 0:04:51 speed: 30,888 (27,268) K/sec
[root@ovz-node1 ~]#

As you can see, DRBD syncs at about 30 MB per second, which is the rate configured in /etc/drbd.conf. On the SyncSource (ovz-node1 in this case) the DRBD device is already usable, although it is still syncing in the background.

So you can immediately create the filesystem:

[root@ovz-node1 ~]# mkfs.ext3 /dev/drbd0
[...]
[root@ovz-node1 ~]# 
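
The initial sync keeps running in the background, so it can be handy to estimate how long it will take and to wait for it in a script. The following is only a sketch under assumptions: the function names are invented for illustration, and "finished" is taken to mean that resource 0 shows cs:Connected together with ld:Consistent, as in the /proc/drbd listings in this article.

```shell
# Estimate the remaining sync time in seconds.
# $1: remaining data in MB, $2: current speed in KB/sec
# (both as shown in /proc/drbd, with the thousands separator removed).
sync_eta_seconds() {
    echo $(( $1 * 1024 / $2 ))
}

# Succeed once resource 0 is connected and consistent, i.e. no longer syncing.
# $1: optional alternative file, only meant for trying the function out.
drbd_in_sync() {
    grep -q '^ *0: cs:Connected .*ld:Consistent' "${1:-/proc/drbd}"
}

# Poll /proc/drbd until the sync has finished.
# $1: polling interval in seconds (defaults to 10).
wait_for_sync() {
    until drbd_in_sync; do
        sleep "${1:-10}"
    done
}
```

For the transcript above (8805 MB remaining at 30,888 K/sec), sync_eta_seconds 8805 30888 gives 291, that is 4 minutes and 51 seconds, which agrees with the "finish" estimate reported by DRBD.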

Copy necessary OpenVZ files to DRBD device

Move the original /vz directory to /vz.orig and recreate the /vz directory to have it as a mount point (do this on both nodes):

[root@ovz-node1 ~]# mv /vz /vz.orig
[root@ovz-node1 ~]# mkdir /vz
[root@ovz-node1 ~]#

Afterwards move the necessary OpenVZ directories (/etc/vz, /etc/sysconfig/vz-scripts, /var/vzquota) and replace them with symbolic links (do this on both nodes):

[root@ovz-node1 ~]# mv /etc/vz /etc/vz.orig
[root@ovz-node1 ~]# mv /etc/sysconfig/vz-scripts /etc/sysconfig/vz-scripts.orig
[root@ovz-node1 ~]# mv /var/vzquota /var/vzquota.orig
[root@ovz-node1 ~]# ln -s /vz/cluster/etc/vz /etc/vz
[root@ovz-node1 ~]# ln -s /vz/cluster/etc/sysconfig/vz-scripts /etc/sysconfig/vz-scripts
[root@ovz-node1 ~]# ln -s /vz/cluster/var/vzquota /var/vzquota
[root@ovz-node1 ~]#
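
Since these steps have to be repeated on both nodes, a quick check that all three paths are now symbolic links can catch a forgotten step. This is a minimal sketch; the function name check_vz_links and its optional root prefix argument are assumptions for illustration. Note that the links are expected to dangle while /dev/drbd0 is not mounted, so the check only tests that they are links.

```shell
# Report which of the three OpenVZ paths are not symbolic links.
# $1: optional root prefix, only useful for dry runs in a scratch tree.
# Returns 0 if all three paths are symlinks, 1 otherwise.
check_vz_links() {
    root="${1:-}"
    rc=0
    for p in /etc/vz /etc/sysconfig/vz-scripts /var/vzquota; do
        if [ ! -L "$root$p" ]; then
            echo "not a symlink: $root$p"
            rc=1
        fi
    done
    return "$rc"
}
```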

Currently, ovz-node1 is still Primary of /dev/drbd0. You can now mount it and copy the necessary files to it (only on ovz-node1!):

[root@ovz-node1 ~]# mount /dev/drbd0 /vz
[root@ovz-node1 ~]# cp -a /vz.orig/* /vz/
[root@ovz-node1 ~]# mkdir -p /vz/cluster/etc
[root@ovz-node1 ~]# mkdir -p /vz/cluster/etc/sysconfig
[root@ovz-node1 ~]# mkdir -p /vz/cluster/var
[root@ovz-node1 ~]# cp -a /etc/vz.orig /vz/cluster/etc/vz
[root@ovz-node1 ~]# cp -a /etc/sysconfig/vz-scripts.orig /vz/cluster/etc/sysconfig/vz-scripts
[root@ovz-node1 ~]# cp -a /var/vzquota.orig /vz/cluster/var/vzquota
[root@ovz-node1 ~]# umount /dev/drbd0
[root@ovz-node1 ~]#

Setting up Heartbeat

Install the necessary Heartbeat rpms on both nodes:

[root@ovz-node1 ~]# rpm -ihv heartbeat-1.2.4-1.i386.rpm heartbeat-pils-1.2.4-1.i386.rpm heartbeat-stonith-1.2.4-1.i386.rpm
Preparing...                ########################################### [100%]
   1:heartbeat-pils         ########################################### [ 33%]
   2:heartbeat-stonith      ########################################### [ 67%]
   3:heartbeat              ########################################### [100%]
[root@ovz-node1 ~]#

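Create the Heartbeat configuration file ha.cf and copy it to /etc/ha.d/ha.cf on both nodes. Details about this file can be found at http://www.linux-ha.org/ha.cf. Below is an example configuration which uses the two network connections and also a serial connection for heartbeat packets. Adjust the ipfail path in the last line if your packages install Heartbeat's helper programs under /usr/lib/heartbeat instead of /usr/lib64/heartbeat: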

# Heartbeat logging configuration
logfacility daemon

# Heartbeat cluster members
node ovz-node1
node ovz-node2

# Heartbeat communication timing
keepalive 1
warntime 10
deadtime 30
initdead 120

# Heartbeat communication paths
udpport 694
ucast eth1 192.168.255.1
ucast eth1 192.168.255.2
ucast eth0 192.168.1.201
ucast eth0 192.168.1.202
baud 19200
serial /dev/ttyS0

# Don't fail back automatically
auto_failback off

# Monitoring of network connection to default gateway
ping 192.168.1.1
respawn hacluster /usr/lib64/heartbeat/ipfail

Create the Heartbeat configuration file authkeys and copy it to /etc/ha.d/authkeys on both nodes. Set the permissions of this file to 600. Details about this file can be found at http://www.linux-ha.org/authkeys. Below is an example:

auth 1
1 sha1 PutYourSuperSecretKeyHere
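
Rather than typing a key by hand, a random one can be generated. The following sketch is one possible way and not the method used in the original setup: the function name make_authkeys is made up for illustration, and the key is simply the SHA-1 digest of 512 random bytes. The file is written with mode 600, as required above.

```shell
# Write an authkeys file containing a random sha1 key.
# $1: target path (defaults to /etc/ha.d/authkeys)
make_authkeys() {
    target="${1:-/etc/ha.d/authkeys}"
    # Hash 512 random bytes to obtain a 40-character hexadecimal key.
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d' ' -f1)
    # Create the file inside a subshell so that the restrictive umask
    # does not leak into the calling shell.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}
```

Generate the file once on one node and copy it to the other, since both nodes must use the same key.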

Create the Heartbeat configuration file haresources and copy it to /etc/ha.d/haresources on both nodes. Details about this file can be found at http://www.linux-ha.org/haresources. Below is an example:

ovz-node1 datadisk::r0 Filesystem::/dev/drbd0::/vz::ext3 vz MailTo::youremail@yourdomain.tld

Finally, start Heartbeat on both nodes:

[root@ovz-node1 ~]# /etc/init.d/heartbeat start
Starting High-Availability services:
                                                           [  OK  ]
[root@ovz-node1 ~]#

How to do OpenVZ kernel updates when it contains a new DRBD version

As mentioned above, the API version of the DRBD userspace tools must match the API version of the DRBD module included in the OpenVZ kernel. This also applies when a new OpenVZ kernel ships a new DRBD version. The API versions of the individual releases can be found at http://svn.drbd.org/drbd/branches/drbd-0.7/ChangeLog. The simplest approach is to always use the release of the DRBD userspace tools that has the same version as the DRBD module in the OpenVZ kernel.

In this example the initial cluster installation contained OpenVZ kernel 2.6.8-022stab078.10, which contains the DRBD module 0.7.17. The steps below show the update procedure to OpenVZ kernel 2.6.8-022stab078.14, which contains the DRBD module 0.7.20. In the first step, build the DRBD userspace tools version 0.7.20 on your buildmachine. Then stop Heartbeat and DRBD on the passive node (hint: 'cat /proc/drbd' shows which node is active and which one is passive; in the st: field the role of the local node is listed first):

[root@ovz-node2 ~]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by phil@mescal, 2006-03-06 15:04:12
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:60 nr:136 dw:196 dr:97 al:3 bm:3 lo:0 pe:0 ua:0 ap:0
[root@ovz-node2 ~]# /etc/init.d/heartbeat stop
Stopping High-Availability services:
                                                           [  OK  ]
[root@ovz-node2 ~]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by phil@mescal, 2006-03-06 15:04:12
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:60 nr:136 dw:196 dr:97 al:3 bm:3 lo:0 pe:0 ua:0 ap:0
[root@ovz-node2 ~]# /etc/init.d/drbd stop
Stopping all DRBD resources.
[root@ovz-node2 ~]# cat /proc/drbd
cat: /proc/drbd: No such file or directory
[root@ovz-node2 ~]#
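
The check for the passive node shown at the start of this transcript can be wrapped in a small helper, so that an update script can refuse to run on the active node. This is a sketch under assumptions: the function name drbd_local_role is invented for illustration, and it assumes the "st:Local/Peer" layout of /proc/drbd seen in the listings in this article.

```shell
# Print the role of the local node (e.g. Primary or Secondary) for the
# first configured resource. Prints nothing if no resource is configured.
# $1: optional alternative file, only meant for trying the function out.
drbd_local_role() {
    sed -n 's|^ *[0-9]*: cs:[^ ]* st:\([^/ ]*\)/.*|\1|p' "${1:-/proc/drbd}" |
        head -n 1
}
```

On ovz-node2 in the transcript above this would print Secondary while DRBD is still running, which identifies it as the passive node.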

Then install the new kernel and the DRBD userspace tools on this node:

[root@ovz-node2 ~]# rpm -ihv ovzkernel-2.6.8-022stab078.14.i686.rpm
warning: ovzkernel-2.6.8-022stab078.14.i686.rpm: V3 DSA signature: NOKEY, key ID a7a1d4b6
Preparing...                ########################################### [100%]
   1:ovzkernel              ########################################### [100%]
[root@ovz-node2 ~]# rpm -Uhv drbd-0.7.20-1.i386.rpm
Preparing...                ########################################### [100%]
   1:drbd                   ########################################### [100%]
/sbin/service
Stopping all DRBD resources.
[root@ovz-node2 ~]#

Now set the new kernel as default kernel in /etc/grub.conf and then reboot this node.

After the reboot, the new DRBD version is visible:

[root@ovz-node2 ~]# cat /proc/drbd
version: 0.7.20 (api:79/proto:74)
SVN Revision: 2260 build by phil@mescal, 2006-07-04 15:18:57
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:0 nr:28 dw:28 dr:0 al:0 bm:2 lo:0 pe:0 ua:0 ap:0
[root@ovz-node2 ~]#

To update the other node, switch over the services so that the currently active node becomes the passive node. Execute the following on the still active node (the hb_standby command may instead be located in /usr/lib/heartbeat):

[root@ovz-node1 ~]# /usr/lib64/heartbeat/hb_standby
2006/08/03_21:09:41 Going standby [all].
[root@ovz-node1 ~]#

Now do the same steps on the new passive node to update it: stop Heartbeat and DRBD, install the new kernel and the new DRBD userspace tools, set the new kernel as default kernel in /etc/grub.conf and reboot the node.
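
Because the same sequence is run on each node in turn, it can be collected in a small script. The following is a sketch only: the function names and the DRY_RUN variable are assumptions made for illustration, while the commands themselves are the ones used in this section. Editing /etc/grub.conf and rebooting are deliberately left as manual steps.

```shell
# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Update the passive node; stops at the first failing step.
# $1: new OpenVZ kernel rpm, $2: matching DRBD userspace tools rpm
update_passive_node() {
    kernel_rpm="$1"
    drbd_rpm="$2"
    run /etc/init.d/heartbeat stop &&
    run /etc/init.d/drbd stop &&
    run rpm -ihv "$kernel_rpm" &&
    run rpm -Uhv "$drbd_rpm"
}
```

Running it first with DRY_RUN=1 prints the four commands without executing them, which makes it easy to review the sequence before touching a node.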

How to do updates of vzctl, vzctl-lib, and vzquota

Ensure after every update of OpenVZ tools that OpenVZ is not started on system boot. To disable starting of OpenVZ on system boot execute on both nodes:

[root@ovz-node1 ~]# chkconfig --del vz
[root@ovz-node1 ~]#
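
To notice when a package update has re-enabled the service, the output of 'chkconfig --list vz' can be checked after each update. This is a small sketch under assumptions: the helper name boot_enabled is made up, and it assumes the usual chkconfig listing in which enabled runlevels appear as "3:on", "5:on" and so on.

```shell
# Read 'chkconfig --list <service>' output on standard input and succeed
# if the service is switched on in at least one runlevel.
boot_enabled() {
    grep -q '[0-9]:on'
}
```

A possible use is: chkconfig --list vz 2>/dev/null | boot_enabled && echo "vz is enabled at boot, run chkconfig --del vz"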