HA cluster with DRBD and Heartbeat

This article shows how to set up an OpenVZ high availability (HA) cluster using the data replication software DRBD and the cluster manager Heartbeat. In this example the two machines building the cluster run CentOS 4.3. The article also shows how to do kernel updates in the cluster, including necessary steps such as recompiling the DRBD userspace tools. For this purpose, kernel 2.6.8-022stab078.10 (containing DRBD module 0.7.17) is used as the initial kernel version, and kernel 2.6.8-022stab078.14 (containing DRBD module 0.7.20) as the updated kernel version.
  
<b>Update:</b> this howto currently does not describe details on OpenVZ Kernel 2.6.18, which contains DRBD version 8.*. Meanwhile, some hints on using OpenVZ Kernel 2.6.18 with DRBD 8 can be found in [http://forum.openvz.org/index.php?t=msg&th=3213&start=0& this thread in the forum].
Additional information about clustering of virtual machines can be found in the following paper: [http://www.linuxtag.org/2006/fileadmin/linuxtag/dvd/12080-paper.pdf (PDF, 145K)]

 
Some other additional information can be found in the documentation of the Thomas-Krenn.AG cluster (the author of this howto works in cluster development there, which is how he was able to write this howto :-). The full documentation with interesting illustrations is currently only [http://www.thomas-krenn.com/en/service-support/knowledge-center/cluster/documentation.html available in German].


An excellent presentation and overview by Werner Fischer, Thomas-Krenn.AG, is available here: http://www.profoss.eu/index.php/main/content/download/355/3864/file/werner-fischer.pdf
 
== Prerequisites ==

The OpenVZ kernel already includes the DRBD module. The DRBD userspace tools and the cluster manager Heartbeat must be provided separately. As the API version of the DRBD userspace tools must exactly match the API version of the module, compile them yourself. Also compile Heartbeat yourself, as at the time of this writing the CentOS extras repository only contained an old CVS version of Heartbeat.
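Since an API mismatch only shows up at runtime, it helps to compare the api level reported by the loaded module (first line of <code>/proc/drbd</code>) against the one the freshly compiled tools were built for. A minimal sketch of extracting that number; the sample version line is an assumption based on DRBD 0.7's output format:

```shell
# Parse the api level out of a DRBD 0.7-style version line.
# On a real node the line would come from: head -1 /proc/drbd
line='version: 0.7.17 (api:77/proto:74)'   # sample line (assumed format)
api=$(echo "$line" | sed 's/.*api:\([0-9]*\).*/\1/')
echo "api level: $api"
```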
  
On a hardware node for production use there should not be any applications that are not really needed for running OpenVZ (anything not needed by OpenVZ should run in a VE for security reasons). As a result, compile DRBD and Heartbeat on another machine running CentOS 4.3 (in this example I used a virtual machine on a VMware Server).

 
=== Compiling Heartbeat ===
 
! other install options
| no firewall, no SELinux
|-
! package groups
| deactivated everything, only kept vim-enhanced
|}
 
Get the OpenVZ kernel and utilities and install them on both nodes, as described in [[quick installation]]. Update the grub configuration to use the OpenVZ kernel by default. Disable starting of OpenVZ on system boot on both nodes (OpenVZ will be started and stopped by Heartbeat):
<pre>
[root@ovz-node1 ~]# chkconfig vz off
[root@ovz-node1 ~]# 
</pre>
 
== Setting up DRBD ==

'''On each of the two nodes create a partition that acts as the underlying DRBD device.''' The partitions should have exactly the same size (I created a 10 GB partition hda3 using fdisk on each node for this example). Note that it might be necessary to reboot the machines to re-read the partition table.
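For reference, a DRBD 0.7-style resource definition for such a setup might look roughly like the following sketch; the host names match this example, but the IP addresses are placeholders, and your real <code>/etc/drbd.conf</code> must match your own nodes:

```
resource r0 {
  protocol C;
  on ovz-node1 {
    device    /dev/drbd0;
    disk      /dev/hda3;
    address   10.0.0.1:7788;   # crossover-link IP (placeholder)
    meta-disk internal;
  }
  on ovz-node2 {
    device    /dev/drbd0;
    disk      /dev/hda3;
    address   10.0.0.2:7788;   # placeholder
    meta-disk internal;
  }
}
```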
  
 
Install the rpm of the DRBD userspace tools on both nodes:
 
[root@ovz-node1 ~]# mkdir -p /vz/cluster/etc/sysconfig
[root@ovz-node1 ~]# mkdir -p /vz/cluster/var
[root@ovz-node1 ~]# cp -a /etc/vz.orig /vz/cluster/etc/vz/
[root@ovz-node1 ~]# cp -a /etc/sysconfig/vz-scripts.orig /vz/cluster/etc/sysconfig/vz-scripts
[root@ovz-node1 ~]# cp -a /var/vzquota.orig /vz/cluster/var/vzquota
[root@ovz-node1 ~]# umount /dev/drbd0
[root@ovz-node1 ~]# 
</pre>
Create the Heartbeat configuration file ha.cf and copy it to <code>/etc/ha.d/ha.cf</code> on both nodes. Details about this file can be found at http://www.linux-ha.org/ha.cf. Below is an example configuration which uses the two network connections and also a serial connection for heartbeat packets:
 
<pre>
# Heartbeat logging configuration
 
respawn hacluster /usr/lib64/heartbeat/ipfail
</pre>
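The full file follows the usual ha.cf layout; a minimal sketch of the kind of directives such a two-node configuration contains (the timings, interfaces and node names below are placeholders, see http://www.linux-ha.org/ha.cf for the authoritative list):

```
keepalive 1
deadtime 10
serial /dev/ttyS0
bcast eth0 eth1
auto_failback off
node ovz-node1
node ovz-node2
```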
Create the Heartbeat configuration file authkeys and copy it to <code>/etc/ha.d/authkeys</code> on both nodes. Set the permissions of this file to 600. Details about this file can be found at http://www.linux-ha.org/authkeys. Below is an example:
 
<pre>
auth 1
1 sha1 PutYourSuperSecretKeyHere
</pre>
Create the Heartbeat configuration file haresources and copy it to <code>/etc/ha.d/haresources</code> on both nodes. Details about this file can be found at http://www.linux-ha.org/haresources. Note that it is not necessary to configure IPs for gratuitous arp here. The gratuitous arp is done by OpenVZ itself, through <code>/etc/sysconfig/network-scripts/ifup-venet</code> and <code>/usr/lib/vzctl/scripts/vps-functions</code>. Below is an example for the haresources file:
 
<pre>
ovz-node1 drbddisk::r0 Filesystem::/dev/drbd0::/vz::ext3 vz MailTo::youremail@yourdomain.tld
</pre>
 
Finally, you can now start Heartbeat on both nodes:
<pre>
[root@ovz-node1 ~]# service heartbeat start
[root@ovz-node1 ~]# 
</pre>
  
== Before going in production: testing, testing, testing, and ...hm... testing! ==
 
 
The installation of the cluster is finished at this point. Before putting the cluster in production it is very important to test the cluster. Because of all the possible different kinds of hardware that you may have, you may encounter problems when a failover is necessary. And as the cluster is about high availability, such problems must be found before the cluster is used for production.
 
 
 
Here is one example: the e1000 driver included in kernels < 2.6.12 has a problem when a cable gets unplugged while broadcast packets are still being sent out on that interface. When using broadcast communication in Heartbeat on a crossover link, this fills up the transmit ring buffer on the adapter (the buffer is full about 8 minutes after the cable got unplugged). Using unicast communication in Heartbeat avoids the problem, for example. For details see: http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=699#c22
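The switch from broadcast to unicast is a small change in ha.cf: instead of a <code>bcast</code> directive for the crossover interface, you list the peer's address explicitly. A sketch (interface name and IP are placeholders for your crossover link):

```
# instead of: bcast eth1
ucast eth1 10.0.1.2   # IP of the peer node on the crossover link (placeholder)
```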
 
 
 
Without testing you may not be aware of such problems and may face them when the cluster is in production and a failover would be necessary. So test your cluster carefully!
 
 
 
Possible tests can include:
 
* power outage test of active node
 
* power outage test of passive node
 
* network connection outage test of eth0 of active node
 
* network connection outage test of eth0 of passive node
 
* network connection outage test of crossover network connection
 
* ...
 
 
 
As mentioned above, some problems only arise after an outage lasts longer than a few minutes. So also run the tests with a duration of more than an hour, for example.
 
 
 
Before you start to test, build a test plan. Some valuable information on that can be found in chapter 3 "Testing a highly available Tivoli Storage Manager cluster environment" of the Redbook ''IBM Tivoli Storage Manager in a Clustered Environment'', see http://www.redbooks.ibm.com/abstracts/sg246679.html. In this chapter it is mentioned that the experience of the authoring team is that the testing phase must take at least two times the total implementation time for the cluster.
 
 
 
== Before installing kernel updates: testing again ==
 
 
 
New OpenVZ kernels often include driver updates. This kernel, for example, includes an update of the e1000 module: http://openvz.org/news/updates/kernel-022stab078.21
 
 
 
To avoid overlooking problems with new components (such as a newer kernel), it is necessary to re-do the tests mentioned above. But as the cluster is already in production, a second cluster (test cluster) with the same hardware as the main cluster is needed. Use this test cluster to test kernel updates or major OS updates for the hardware node before putting them on the production cluster.
 
 
 
I know this is not an easy task, as it is time-consuming and needs additional hardware only for testing. But when really business-critical applications are running on the cluster, it is very good to know that the cluster also works fine with new updates installed on the hardware node. In many cases a dedicated test cluster and the time effort for testing updates may cost too much. If you cannot do such tests of updates, keep in mind that over time (when you must install security updates of the OS or the kernel) you have a cluster that you have not tested in this configuration.
 
 
 
If you need a tested cluster (also with tested kernel updates), you may take a look at this Virtuozzo cluster: http://www.thomas-krenn.com/cluster
 
  
== How to do OpenVZ kernel updates when it contains a new DRBD version ==
 
  
 
As mentioned above, it is important to use the correct version of the DRBD userspace tools. When an OpenVZ kernel contains a new DRBD version, it is important that the DRBD API version of the userspace tools matches the API version of the DRBD module that is included in the OpenVZ kernel. The API versions can be found at http://svn.drbd.org/drbd/branches/drbd-0.7/ChangeLog. The best way is to always use the version of the DRBD userspace tools that matches the version of the DRBD module that is included in the OpenVZ kernel.
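A small sanity check before rebooting into the new kernel can save a failover surprise: compare the api level of the DRBD module in the new kernel with the one of the rebuilt userspace tools. The sketch below only compares two such numbers; where you obtain them (e.g. /proc/drbd and the ChangeLog linked above) is an assumption left to you:

```shell
# Assumed inputs: api level of the DRBD module in the new kernel and the
# api level the rebuilt userspace tools were compiled against.
module_api=77
tools_api=77
if [ "$module_api" -eq "$tools_api" ]; then
        echo "api levels match"
else
        echo "api mismatch: rebuild the DRBD userspace tools" >&2
        exit 1
fi
```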
 
Ensure after every update of the OpenVZ tools that OpenVZ is not started on system boot. To disable starting of OpenVZ on system boot, execute on both nodes:
<pre>
[root@ovz-node1 ~]# chkconfig vz off
[root@ovz-node1 ~]# 
</pre>
 
 
== Live-Switchover with the help of checkpointing ==
 
 
With the help of [[Checkpointing_and_live_migration|checkpointing]] it is possible to do live switchovers.
 
 
<b>Important:</b> although this HOWTO currently describes the use of DRBD 0.7, it is necessary to use DRBD 8 to be able to use this live-switchover feature reliably. Some hints on using OpenVZ Kernel 2.6.18 with DRBD 8 can be found in [http://forum.openvz.org/index.php?t=msg&th=3213&start=0& this thread in the forum].
 
 
The following scripts are written by Thomas Kappelmueller. They should be placed at /root/live-switchover/ on both nodes. To activate the scripts execute the following commands on both nodes:
 
<pre>
[root@ovz-node1 ~]# ln -s /root/live-switchover/openvz /etc/init.d/
[root@ovz-node1 ~]# ln -s /root/live-switchover/live_switchover.sh /root/bin/
[root@ovz-node1 ~]#
</pre>
 
 
It is also necessary to replace <code>vz</code> with an adjusted init script (<code>openvz</code> in this example). So /etc/ha.d/haresources has the following content on both nodes:
 
<pre>
ovz-node1 drbddisk::r0 Filesystem::/dev/drbd0::/vz::ext3 openvz MailTo::youremail@yourdomain.tld
</pre>
 
 
=== Script cluster_freeze.sh ===
 
<pre>
#!/bin/bash
#Script by Thomas Kappelmueller
#Version 1.0
LIVESWITCH_PATH='/vz/cluster/liveswitch'

if [ -f $LIVESWITCH_PATH ]
then
        rm -f $LIVESWITCH_PATH
fi

RUNNING_VE=$(vzlist -1)

for I in $RUNNING_VE
do
        BOOTLINE=$(cat /etc/sysconfig/vz-scripts/$I.conf | grep -i "^onboot")
        if [ $I != 1 -a "$BOOTLINE" = "ONBOOT=\"yes\"" ]
        then
                vzctl chkpnt $I

                if [ $? -eq 0 ]
                then
                        vzctl set $I --onboot no --save
                        echo $I >> $LIVESWITCH_PATH
                fi
        fi
done

exit 0
</pre>
 
 
=== Script cluster_unfreeze.sh ===
 
<pre>
#!/bin/bash
#Script by Thomas Kappelmueller
#Version 1.0

LIVESWITCH_PATH='/vz/cluster/liveswitch'

if [ -f $LIVESWITCH_PATH ]
then
        FROZEN_VE=$(cat $LIVESWITCH_PATH)
else
        exit 1
fi

for I in $FROZEN_VE
do
        vzctl restore $I

        if [ $? != 0 ]
        then
                vzctl start $I
        fi

        vzctl set $I --onboot yes --save
done

rm -f $LIVESWITCH_PATH

exit 0
</pre>
 
 
=== Script live_switchover.sh ===
 
<pre>
#!/bin/bash
#Script by Thomas Kappelmueller
#Version 1.0

ps -eaf | grep 'vzctl enter' | grep -v 'grep' > /dev/null
if [ $? -eq 0 ]
then
  echo 'vzctl enter is active. please finish before live switchover.'
  exit 1
fi
ps -eaf | grep 'vzctl exec' | grep -v 'grep' > /dev/null
if [ $? -eq 0 ]
then
  echo 'vzctl exec is active. please finish before live switchover.'
  exit 1
fi
echo "Freezing VEs..."
/root/live-switchover/cluster_freeze.sh
echo "Starting Switchover..."
/usr/lib64/heartbeat/hb_standby
</pre>
 
 
=== Script openvz ===
 
<pre>
#!/bin/bash
#
# openvz        Startup script for OpenVZ
#

start() {
        /etc/init.d/vz start > /dev/null 2>&1
        RETVAL=$?
        /root/live-switchover/cluster_unfreeze.sh
        return $RETVAL
}
stop() {
        /etc/init.d/vz stop > /dev/null 2>&1
        RETVAL=$?
        return $RETVAL
}
status() {
        /etc/init.d/vz status > /dev/null 2>&1
        RETVAL=$?
        return $RETVAL
}

# See how we were called.
case "$1" in
  start)
        start
        ;;
  stop)
        stop
        ;;
  status)
        status
        ;;
  *)
        echo $"Usage: openvz {start|stop|status}"
        exit 1
esac

exit $RETVAL
</pre>
  
 
[[Category: HOWTO]]
