[root@ovz-node1 ~]#
</pre>
== Before going into production: testing, testing, testing, and ...hm... testing! ==
The installation of the cluster is finished at this point. Before putting the cluster into production, it is very important to test it. Given all the different kinds of hardware you may be using, you may encounter problems when a failover becomes necessary. And since the cluster is all about high availability, such problems must be found before the cluster is used in production.
Here is one example: the e1000 driver included in kernels < 2.6.12 has a problem when a cable is unplugged while broadcast packets are still being sent out on that interface. When Heartbeat uses broadcast communication on a crossover link, this fills up the transmit ring buffer of the adapter (the buffer is full about 8 minutes after the cable was unplugged). Switching Heartbeat to unicast communication, for example, avoids the problem. For details see: http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=699#c22
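To illustrate, here is a minimal sketch of the relevant lines in Heartbeat's /etc/ha.d/ha.cf; the interface name and the peer IP address are assumptions for this example and must be adjusted to your setup:
<pre>
# /etc/ha.d/ha.cf -- Heartbeat communication over the crossover link
# broadcast communication (can trigger the e1000 issue described above):
#bcast eth1
# unicast communication to the peer node instead (IP is just an example):
ucast eth1 10.0.0.2
</pre>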
Without testing you may not become aware of such problems until the cluster is in production and a failover is necessary. So test your cluster carefully!
Possible tests can include:
* power outage test of active node
* power outage test of passive node
* network connection outage test of eth0 of active node
* network connection outage test of eth0 of passive node
* network connection outage test of crossover network connection
* ...
As mentioned above, some problems only arise after an outage has lasted longer than a few minutes. So also run the tests with a duration of more than one hour, for example.
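For outage tests that are hard to do by physically pulling a cable, a rough sketch like the following can help; note that taking an interface down in software only approximates a real link failure (the e1000 problem above, for instance, only shows up with an actually unplugged cable), and the hostnames, IP address, and log file path here are assumptions:
<pre>
# on the passive node: simulate an eth0 outage on the active node for one hour
# (run this over the crossover link, otherwise you lock yourself out)
[root@ovz-node2 ~]# ssh 10.0.0.1 "ifconfig eth0 down; sleep 3600; ifconfig eth0 up" &
# watch whether Heartbeat notices the failure and initiates a failover
[root@ovz-node2 ~]# tail -f /var/log/ha-log
</pre>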
Before you start to test, build a test plan. Some valuable information on that can be found in chapter 3, "Testing a highly available Tivoli Storage Manager cluster environment", of the Redbook ''IBM Tivoli Storage Manager in a Clustered Environment''<ref>http://www.redbooks.ibm.com/abstracts/sg246679.html</ref>. That chapter mentions that in the authoring team's experience, the testing phase must take at least twice the total implementation time of the cluster.
== Before installing kernel updates: testing again ==
New OpenVZ kernels often include driver updates. This kernel, for example, includes an update of the e1000 module: http://openvz.org/news/updates/kernel-022stab078.21
To avoid overlooking problems with new components (such as a newer kernel), it is necessary to re-do the tests mentioned above. But as the cluster is already in production, a second cluster (a test cluster) with the same hardware as the main cluster is needed. Use this test cluster to test kernel updates or major OS updates for the hardware node before rolling them out on the production cluster.
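A minimal sketch of such an update run on the test cluster could look like this; the package name is only illustrative, use the actual kernel RPM announced for the update:
<pre>
# on a test cluster node: install the new OpenVZ kernel side by side
# with the running one (the package name is just an example)
[root@ovz-test1 ~]# rpm -ihv ovzkernel-2.6.8-022stab078.21.i686.rpm
[root@ovz-test1 ~]# reboot
# after the reboot: verify the new kernel is running, then repeat
# the failover tests from the previous chapter
[root@ovz-test1 ~]# uname -r
</pre>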
I know this is not an easy task, as it is time-consuming and requires additional hardware just for testing. But when really business-critical applications are running on the cluster, it is very good to know that the cluster also works fine with new updates installed on the hardware node. In many cases a dedicated test cluster and the time effort for testing updates may cost too much. If you cannot test updates this way, keep in mind that over time (when you must install security updates of the OS or the kernel) you will have a cluster that you have not tested in this configuration.
If you need a tested cluster (including tested kernel updates), you may take a look at this Virtuozzo cluster: http://www.thomas-krenn.com/cluster
== How to do OpenVZ kernel updates when they contain a new DRBD version ==
As mentioned above, it is important to use the correct version of the DRBD userspace tools: their API version must match the API version of the DRBD module included in the OpenVZ kernel. The API versions can be found at http://svn.drbd.org/drbd/branches/drbd-0.7/ChangeLog. The safest approach is to always use the release of the DRBD userspace tools that matches the version of the DRBD module included in the OpenVZ kernel.
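Here is a quick sketch of how the two versions can be compared on a running node; the version numbers in the output are just examples, and the check of the userspace tools assumes they were installed as an RPM package named "drbd":
<pre>
# version and API of the DRBD module included in the running kernel
[root@ovz-node1 ~]# cat /proc/drbd
version: 0.7.23 (api:79/proto:74)
...
# version of the installed userspace tools
[root@ovz-node1 ~]# rpm -q drbd
drbd-0.7.23-1
</pre>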