Difference between revisions of "Hardware testing"

From OpenVZ Virtuozzo Containers Wiki
<translate>
<!--T:1-->
Sometimes when you have a kernel panic, oops, [[machine check exception]], or other fatal crash, the fault lies not with the programmers but with your hardware (a classic example is {{Bug|174}}). This article describes how to test your hardware properly to check that it is in good shape.

<!--T:2-->
Note that most of the tests described below can harm your machine if something is already wrong with it (e.g. it is overclocked or undercooled). In general, overclocking is not recommended for production servers.
  
== RAM tests == <!--T:3-->
Random Access Memory (RAM) is sometimes faulty, which leads to very strange system crashes. It is therefore highly recommended to test your system RAM. Several approaches and tools can be used.
  
=== Memtest86 and Memtest86+ === <!--T:4-->
[http://memtest86.com/ Memtest86] is a stand-alone RAM tester. It can be booted either from a CD or from your normal Linux bootloader, such as GRUB or LILO.

<!--T:5-->
[http://memtest.org/ Memtest86+] is a fork of Memtest86 with some features added.
  
==== Installation ==== <!--T:6-->
You can either download and install one of these programs from the sites above, or use the package from your Linux distribution if it already provides one.

<!--T:7-->
For Fedora Core, memtest86+ is available:
<pre>yum install memtest86+</pre>

<!--T:8-->
For Gentoo, both programs are available:
<pre>emerge memtest86
emerge memtest86+</pre>
  
==== Usage ==== <!--T:9-->
To test your system for faulty RAM, install either Memtest86 or Memtest86+ and reboot into it. Run it for at least a few hours (at least 2-3 iterations). If even a single error is reported, you have to replace your RAM modules (or, if your system is overclocked, return it to its normal clock speed).
  
=== Memtester === <!--T:10-->
[http://pyropus.ca/software/memtester/ Memtester] is a userspace utility for testing the memory subsystem for faults. Its advantage is that you can test memory without rebooting the server, and other programs can keep running alongside it. Its drawback is that not all of the memory is tested, only the part that memtester is able to allocate.
  
==== Installation ==== <!--T:11-->
For Fedora Core:
<pre>yum install memtester</pre>

<!--T:12-->
For Gentoo:
<pre>emerge memtester</pre>

<!--T:13-->
For other systems, download the sources from the [http://pyropus.ca/software/memtester/ Memtester homepage].
  
==== Usage ==== <!--T:14-->

<!--T:15-->
Invoke memtester as root, passing the amount of memory to test as an argument, e.g.:
<pre># /usr/sbin/memtester 512M</pre>

<!--T:16-->
The more memory you specify, the better.
  
== CPU cooling tests == <!--T:17-->
These tests check that your CPU works reliably under the highest possible load and temperature.
  
=== Cpuburn === <!--T:18-->
[http://pages.sbcglobal.net/redelm/ Cpuburn] is a utility that loads your CPU as heavily as possible. It tests system stability by checking how the CPU and the whole system behave at high temperatures.
  
==== Installation ==== <!--T:19-->

<!--T:20-->
For Fedora Core, you can get the RPMs from [http://dag.wieers.com/packages/cpuburn/ DAG].

<!--T:21-->
For Gentoo:
<pre>emerge cpuburn</pre>

<!--T:22-->
For other systems, download the tarball from the [http://pages.sbcglobal.net/redelm/ home page], unpack it, and run the programs.
  
==== Usage ==== <!--T:23-->
It is recommended to switch to single-user mode and remount all partitions read-only, in case the system hangs.

<!--T:24-->
Run
<pre>burnBX || echo $? &</pre>
for at least 15 minutes. If you have more than one physical CPU, repeat the command once for each CPU. If nothing happens within 15-20 minutes and your system is still responding, you can conclude that the test has passed and kill the process(es):
<pre>killall -TERM burnBX</pre>

<!--T:25-->
You can also use the burnMMX utility:
<pre>burnMMX J || echo $? &</pre>

<!--T:26-->
The cpuburn author says burnMMX is not optimal for AMD processors; use burnBX if you have an AMD CPU.
  
== Combined tests == <!--T:27-->

<!--T:28-->
It is also a good idea to run cpuburn and memtester in parallel, since the combined load is more likely to expose errors.
  
== Cerberus Test Control System == <!--T:29-->
Another publicly available hardware test suite is the '''Cerberus Test Control System''':
http://sourceforge.net/projects/va-ctcs/
 
== External links == <!--T:30-->
* [http://memtest86.com/ Memtest86]
* [http://memtest.org/ Memtest86+]
* [http://pyropus.ca/software/memtester/ Memtester]
* [http://pages.sbcglobal.net/redelm/ Cpuburn]
* [http://sourceforge.net/projects/va-ctcs/ Cerberus]
</translate>
  
 
[[Category: Troubleshooting]]
[[Category: QA]]

Latest revision as of 08:40, 26 December 2015

<translate> Sometimes when you have a kernel panic, oops, machine check exception, or other fatal crash, the fault lies not with the programmers but with your hardware (a classic example is OpenVZ Bug #174). This article describes how to test your hardware properly to check that it is in good shape.

Note that most of the tests described below can harm your machine if something is already wrong with it (e.g. it is overclocked or undercooled). In general, overclocking is not recommended for production servers.

RAM tests

Random Access Memory (RAM) is sometimes faulty, which leads to very strange system crashes. It is therefore highly recommended to test your system RAM. Several approaches and tools can be used.

Memtest86 and Memtest86+

Memtest86 is a stand-alone RAM tester. It can be booted either from a CD or from your normal Linux bootloader, such as GRUB or LILO.

Memtest86+ is a fork of Memtest86 with some features added.

Installation

You can either download and install one of these programs from the sites above, or use the package from your Linux distribution if it already provides one.

For Fedora Core, memtest86+ is available:

yum install memtest86+

For Gentoo, both programs are available:

emerge memtest86
emerge memtest86+

Usage

To test your system for faulty RAM, install either Memtest86 or Memtest86+ and reboot into it. Run it for at least a few hours (at least 2-3 iterations). If even a single error is reported, you have to replace your RAM modules (or, if your system is overclocked, return it to its normal clock speed).

Memtester

Memtester is a userspace utility for testing the memory subsystem for faults. Its advantage is that you can test memory without rebooting the server, and other programs can keep running alongside it. Its drawback is that not all of the memory is tested, only the part that memtester is able to allocate.

Installation

For Fedora Core:

yum install memtester

For Gentoo:

emerge memtester

For other systems, download the sources from the Memtester homepage.

Usage

Invoke memtester as root, passing the amount of memory to test as an argument, e.g.:

# /usr/sbin/memtester 512M

The more memory you specify, the better.
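
As a rough illustration of choosing that amount, the sketch below derives a size from the MemAvailable field of /proc/meminfo and leaves some headroom so that the rest of the system keeps running. The function name memtester_size_mb, the 512 MB default reserve, and the reliance on MemAvailable (which older kernels may not report) are assumptions made for this example; they are not part of memtester itself.

```shell
#!/bin/sh
# Hypothetical helper: print a memtester size in megabytes.
#   $1  meminfo-style file to read (default: /proc/meminfo)
#   $2  megabytes to leave free for the rest of the system (default: 512)
memtester_size_mb() {
    meminfo=${1:-/proc/meminfo}
    reserve_mb=${2:-512}

    # MemAvailable is reported in kB; older kernels may not provide it.
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")
    if [ -z "$avail_kb" ]; then
        echo "MemAvailable not found in $meminfo" >&2
        return 1
    fi

    size_mb=$(( avail_kb / 1024 - reserve_mb ))
    if [ "$size_mb" -le 0 ]; then
        echo "not enough available memory to test" >&2
        return 1
    fi
    echo "$size_mb"
}

# Example use, as root (the trailing 3 asks memtester for three loops):
#   /usr/sbin/memtester "$(memtester_size_mb)M" 3
```

The reserve is a judgment call: a smaller reserve tests more memory but raises the risk that the running system starts swapping or that memtester cannot lock the requested amount.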

CPU cooling tests

These tests check that your CPU works reliably under the highest possible load and temperature.

Cpuburn

Cpuburn is a utility that loads your CPU as heavily as possible. It tests system stability by checking how the CPU and the whole system behave at high temperatures.

Installation

For Fedora Core, you can get the RPMs from DAG.

For Gentoo:

emerge cpuburn

For other systems, download the tarball from the home page, unpack it, and run the programs.

Usage

It is recommended to switch to single-user mode and remount all partitions read-only, in case the system hangs.

Run

burnBX || echo $? &

for at least 15 minutes. If you have more than one physical CPU, repeat the command once for each CPU. If nothing happens within 15-20 minutes and your system is still responding, you can conclude that the test has passed and kill the process(es):

killall -TERM burnBX

You can also use the burnMMX utility:

burnMMX J || echo $? &

The cpuburn author says burnMMX is not optimal for AMD processors; use burnBX if you have an AMD CPU.
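
The procedure above (start one instance per CPU, wait, then kill the survivors) can be sketched as a small shell function. This is only an illustration: the name run_burn and its arguments are hypothetical, and it assumes a shell such as bash or dash that reports a process killed by SIGTERM with exit status 143.

```shell
#!/bin/sh
# Hypothetical helper: run COUNT copies of a burner for SECONDS seconds.
#   $1  command to run, e.g. "burnBX" or "burnMMX J"
#   $2  number of instances to start
#   $3  how long to let them run, in seconds
# Returns 0 if every instance was still running when the time was up.
run_burn() {
    cmd=$1
    count=$2
    secs=$3
    pids=""
    failed=0

    i=0
    while [ "$i" -lt "$count" ]; do
        # Unquoted on purpose so that "cmd arg" splits into command and argument.
        $cmd &
        pids="$pids $!"
        i=$((i + 1))
    done

    sleep "$secs"

    for pid in $pids; do
        kill -TERM "$pid" 2>/dev/null || true
        status=0
        wait "$pid" 2>/dev/null || status=$?
        # 143 = 128 + 15 (SIGTERM): the instance was alive and we stopped it.
        # Any other status means it had already exited on its own.
        if [ "$status" -ne 143 ]; then
            echo "instance $pid exited on its own with status $status"
            failed=1
        fi
    done
    return "$failed"
}

# Example use: one burnBX per online processor for 15 minutes.
#   run_burn burnBX "$(getconf _NPROCESSORS_ONLN)" 900
```

Note that the processor count from getconf includes every online logical CPU, which may be more than the number of physical CPUs the text refers to; pass a different count if that matters for your machine.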

Combined tests

It is also a good idea to run cpuburn and memtester in parallel, since the combined load is more likely to expose errors.
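
One possible way to arrange this is to keep the burner in the background for as long as the memory test runs in the foreground. The sketch below is an assumption about how that could be scripted, not a procedure from the cpuburn or memtester authors; the name run_combined is hypothetical, and as above it assumes a shell that reports death by SIGTERM as exit status 143.

```shell
#!/bin/sh
# Hypothetical helper: keep a CPU burner running while a memory test runs.
#   $1  burner command, e.g. "burnBX"
#   $2  memory test command, e.g. "/usr/sbin/memtester 512M 3"
# Returns 0 only if the memory test succeeded and the burner was still
# running when the memory test finished.
run_combined() {
    burn_cmd=$1
    mem_cmd=$2
    result=0

    # Unquoted on purpose so that "cmd arg" splits into command and arguments.
    $burn_cmd &
    burn_pid=$!

    mem_status=0
    $mem_cmd || mem_status=$?
    if [ "$mem_status" -ne 0 ]; then
        echo "memory test failed with status $mem_status"
        result=1
    fi

    kill -TERM "$burn_pid" 2>/dev/null || true
    burn_status=0
    wait "$burn_pid" 2>/dev/null || burn_status=$?
    # 143 = 128 + 15 (SIGTERM): the burner was alive until we stopped it.
    if [ "$burn_status" -ne 143 ]; then
        echo "burner exited on its own with status $burn_status"
        result=1
    fi
    return "$result"
}

# Example use, as root:
#   run_combined burnBX "/usr/sbin/memtester 512M 3"
```

Giving memtester an explicit loop count (the trailing 3 in the example) makes it terminate by itself, which is what lets the function know when to stop the burner.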

Cerberus Test Control System

Another publicly available hardware test suite is the Cerberus Test Control System: http://sourceforge.net/projects/va-ctcs/

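External links

Memtest86: http://memtest86.com/
Memtest86+: http://memtest.org/
Memtester: http://pyropus.ca/software/memtester/
Cpuburn: http://pages.sbcglobal.net/redelm/
Cerberus: http://sourceforge.net/projects/va-ctcs/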

</translate>