A kernel panic, oops, or other fatal crash is not always the programmers' fault; sometimes the hardware is to blame. This article describes how to test your hardware to check that it is in good shape.
RAM tests
Random Access Memory (RAM) is sometimes faulty, which leads to some very strange system crashes. It is therefore highly recommended to test your system RAM. Several approaches and tools can be used.
Memtest86 and Memtest86+
Memtest86 is a stand-alone RAM tester. It can either be booted from a CD, or from your normal Linux bootloader, such as GRUB or LILO.
Memtest86+ is a forked version of Memtest86 with some features added.
You can either download and install one of these programs from its website, or it may already be part of your Linux distribution.
For Fedora Core, memtest86+ is available:
yum install memtest86+
For Gentoo, both programs are available:
emerge memtest86
emerge memtest86+
To test your system for faulty RAM, install either program and reboot into it. Run it for at least a few hours (at least 2-3 complete passes). If even a single error is reported, you have to replace your RAM modules (or, if your system is overclocked, return it to its normal clock speed).
Memtester
Memtester is a userspace utility for testing the memory subsystem for faults. It is packaged by some distributions.
For Fedora Core:
yum install memtester
For Gentoo:
emerge memtester
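The good thing is that you can test your memory without rebooting the server, and other programs can keep running alongside it. The bad thing is that not all of the memory is tested: memtester can only check the memory it manages to allocate, not the memory already in use by the kernel and other processes.
Invoke memtester as root, giving the amount of memory to test as an argument, e.g.: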
# /usr/sbin/memtester 512M
The more memory you specify, the better.
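As a rough way to pick that amount, the sketch below reads MemAvailable from /proc/meminfo and subtracts a safety reserve so the running system is not starved. The function name suggest_size_mb, the 256 MB reserve, and the loop count of 3 are illustrative assumptions, not recommendations from the memtester authors; MemAvailable is only reported by reasonably recent kernels, and the function falls back to 0 when the field is missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of memtester): suggest how many megabytes
# to hand to memtester, based on a meminfo-format file.
#   $1: path to the meminfo file (normally /proc/meminfo)
#   $2: megabytes to leave free for the rest of the system (assumed value)
suggest_size_mb() {
    awk -v reserve="$2" '
        /^MemAvailable:/ { avail_kb = $2 }      # value is reported in kB
        END {
            mb = int(avail_kb / 1024) - reserve
            if (mb < 0) mb = 0                  # never suggest a negative size
            print mb
        }' "$1"
}

# Only print a suggestion; review it before running the command as root.
if [ -r /proc/meminfo ]; then
    size_mb=$(suggest_size_mb /proc/meminfo 256)
    echo "Suggested command: memtester ${size_mb}M 3"
fi
```

The script only prints a command line and does not start the test itself. The optional second argument to memtester is the number of loops to run; adjust both values to your machine before running the command.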
CPU cooling tests
FIXME cpuburn