OpenVZ Virtuozzo Containers Wiki β

KSM (kernel same-page merging)

Revision as of 11:05, 18 July 2015 by Corradofiore (talk | contribs) (Tuning)

KSM is a memory-saving de-duplication feature, contributed by Red Hat and available in the Linux kernel since version 2.6.32.

KSM replaces RAM pages of identical content with a single write-protected page. If a process later writes to that page, the kernel automatically gives it a private copy, a strategy commonly known as COW (copy-on-write). This makes the de-duplication mechanism transparent to applications.

Although KSM was originally developed for use with KVM, it can be used with OpenVZ containers as well.

Performance and overhead

KSM is more effective when the hardware node has plenty of RAM and is running many containers with the same applications (e.g. a HN running 50 containers with Apache, PHP and MySQL).

The amount of RAM that can be saved is in the 30-50% range depending on the application, with applications that have a mixed I/O and CPU footprint (e.g. MySQL and Apache) getting the best results.[1]

It is worth noting that KSM and Virtuozzo use totally different strategies for RAM deduplication: KSM provides a constantly running daemon (called ksmd) which scans the HN's memory and merges identical RAM pages gradually over time, whereas Virtuozzo merges requests from different containers to the same physical binaries on disk. In doing so, Virtuozzo incurs no overhead at all, while KSM has a 5-10% CPU overhead depending on its configuration (faster scanning of the HN RAM will require more CPU power).

Enabling KSM on the hardware node

First, you'll want to verify that KSM support is present and enabled in your OpenVZ kernel:

[root@HN ~]# grep KSM /boot/config-`uname -r`
CONFIG_KSM=y

The KSM daemon is controlled by sysfs files in /sys/kernel/mm/ksm/, readable by all but writable only by root:

[root@HN ~]# grep -H '' /sys/kernel/mm/ksm/*
/sys/kernel/mm/ksm/full_scans:0
/sys/kernel/mm/ksm/merge_across_nodes:1
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
/sys/kernel/mm/ksm/run:0
/sys/kernel/mm/ksm/sleep_millisecs:20

For the meaning of each parameter, please refer to https://www.kernel.org/doc/Documentation/vm/ksm.txt

To start ksmd, issue

[root@HN ~]# echo 1 > /sys/kernel/mm/ksm/run

You can add the same command to /etc/rc.local on the HN so that ksmd is started again at every boot.
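A slightly more defensive variant for /etc/rc.local is sketched below. The function name ksm_enable and its optional directory argument are illustrative assumptions, not part of any OpenVZ tool; the check only avoids an error on kernels that do not expose the KSM control file.

```shell
#!/bin/sh
# Hypothetical helper: start ksmd only if the KSM control file is present.
# The optional argument (sysfs directory) exists only to ease testing.
ksm_enable() {
    ksm_dir=${1:-/sys/kernel/mm/ksm}
    if [ -w "$ksm_dir/run" ]; then
        echo 1 > "$ksm_dir/run"
    else
        echo "KSM control file $ksm_dir/run not writable; skipping" >&2
        return 1
    fi
}
```

Calling ksm_enable with no arguments from /etc/rc.local acts on /sys/kernel/mm/ksm.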

Verify that ksmd is running:

[root@HN ~]# ps aux | grep ksmd
root         264  0.3  0.0      0     0 ?        SN   Jun14 176:52 [ksmd]
root      989187  0.0  0.0 103252   896 pts/0    S+   09:30   0:00 grep ksmd
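On mainline kernels the ksmd kernel thread is created at boot and may appear in the process list even while scanning is switched off, so it is worth checking the run file as well. The sketch below is a minimal example; the function name ksm_status is an assumption, and the meaning of the values (0 = stopped, 1 = running, 2 = stopped and unmerging) is taken from the kernel documentation referenced above.

```shell
#!/bin/sh
# Hypothetical helper: translate the value of the KSM "run" file into words.
# $1 = sysfs directory (defaults to /sys/kernel/mm/ksm).
ksm_status() {
    ksm_dir=${1:-/sys/kernel/mm/ksm}
    case "$(cat "$ksm_dir/run" 2>/dev/null)" in
        0) echo "stopped" ;;
        1) echo "running" ;;
        2) echo "stopped, unmerging all pages" ;;
        *) echo "unknown" ;;
    esac
}

# Report the current state; prints "unknown" where KSM is not available.
ksm_status
```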

Enabling memory deduplication libraries in containers

In order to have KSM consider a memory page as a candidate for deduplication, the application itself must mark it as mergeable:

KSM only operates on those areas of address space which an application has advised to be likely candidates for merging, by using the madvise(2) system call: int madvise(addr, length, MADV_MERGEABLE).

Note that most applications do not use madvise() at all: that's why KSM is generally used in conjunction with KVM, which takes care of marking pages as mergeable on behalf of the applications running within each virtual machine.

Luckily, we can override the default behaviour of each application using the ksm_preload package (http://vleu.net/ksm_preload/), available in the CentOS base repo.

Linux ≥ 2.6.32 features a memory-saving mechanism that works by deduplicating areas of memory that are identical in different processes (even if they were generated at runtime and after the fork() of their common ancestors).
This mechanism requires the application to opt-in using the madvise() syscall. KSM Preload enables legacy applications (about any current application) to leverage this system by calling madvise(…, MADV_MERGEABLE) on every heap-allocated pages.

Do not use the usual "yum install ksm_preload" inside your containers, as it will install an unnecessary stream of dependencies. Assuming your container is running on a recent CentOS 6.x template, issue the following commands instead:

[root@container /]# cd /usr/local/src
[root@container src]# yum install -y yum-downloadonly
    ...bunch of output...

[root@container src]# yum -y --downloadonly --downloaddir=/usr/local/src install ksm_preload
    ...bunch of output...
exiting because --downloadonly specified  // this is OK

We can now install just the ksm_preload RPM:

[root@container src]# rpm -i ksm_preload-0.10-3.el6.x86_64.rpm --nodeps

Enabling memory deduplication in applications

In order to make an application take advantage of ksm_preload and use KSM on the HN, add this line to its startup script (assuming your container is running CentOS 6.x x86_64). The variable must be exported, otherwise the processes launched by the script will not inherit it:

export LD_PRELOAD=/usr/lib64/libksm_preload.so

E.g., if you want to make Percona Server use KSM, modify its startup script as follows:

[root@container /]# nano /etc/init.d/mysql

...
PATH="/sbin:/usr/sbin:/bin:/usr/bin:$basedir/bin"
export PATH

LD_PRELOAD=/usr/lib64/libksm_preload.so
export LD_PRELOAD

mode=$1    # start or stop
...

Then (re)start your Percona Server as usual.
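To confirm that the library was actually picked up after the restart, one option is to look for it in the memory map of the running process. The sketch below assumes the server process is named mysqld and that a preloaded library shows up in /proc/PID/maps; the function name uses_ksm_preload and its second argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the maps file of a process mentions libksm_preload.
# $1 = PID; $2 = proc root (defaults to /proc, overridable for testing).
uses_ksm_preload() {
    grep -q 'libksm_preload' "${2:-/proc}/$1/maps" 2>/dev/null
}

# Example: report on every running mysqld process, if any.
for pid in $(pgrep -x mysqld 2>/dev/null); do
    if uses_ksm_preload "$pid"; then
        echo "PID $pid: libksm_preload loaded"
    else
        echo "PID $pid: libksm_preload NOT loaded"
    fi
done
```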

How to check efficiency of KSM

To check if KSM is actually reducing memory usage, issue this command on the HN:

[root@HN /]# cat /sys/kernel/mm/ksm/pages_sharing

If the value is greater than 0, you're saving memory. Refer to https://www.kernel.org/doc/Documentation/vm/ksm.txt for more details.

On this page you'll find a simple script which displays the saved RAM amount in MB.
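As a rough illustration of that calculation (not the script referred to above), the saved amount can be approximated as pages_sharing multiplied by the page size. The function name ksm_saved_mib and its arguments are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: approximate RAM saved by KSM, in MiB.
# $1 = sysfs directory (default /sys/kernel/mm/ksm); $2 = page size in bytes.
ksm_saved_mib() {
    ksm_dir=${1:-/sys/kernel/mm/ksm}
    page_size=${2:-$(getconf PAGESIZE)}
    pages_sharing=$(cat "$ksm_dir/pages_sharing") || return 1
    echo $(( pages_sharing * page_size / 1024 / 1024 ))
}

# Print the figure only where the KSM sysfs interface is available.
if [ -r /sys/kernel/mm/ksm/pages_sharing ]; then
    echo "KSM saved: $(ksm_saved_mib) MiB"
fi
```

The result is rounded down to whole MiB because the shell uses integer arithmetic.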

Tuning

On a production machine, you'll want to modify some of the default values. A saner value for /sys/kernel/mm/ksm/sleep_millisecs is usually between 50 and 250 (YMMV though):

[root@HN ~]# echo 50 > /sys/kernel/mm/ksm/sleep_millisecs
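To compare candidate settings, it can help to estimate how much memory ksmd examines per second. The sketch below assumes ksmd scans pages_to_scan pages and then sleeps for sleep_millisecs; it ignores the time spent scanning, so the result is an upper bound. The function name ksm_scan_rate_kib and the 4096-byte default page size are illustrative assumptions.

```shell
#!/bin/sh
# Upper bound on the ksmd scan rate, in KiB per second.
# $1 = pages_to_scan, $2 = sleep_millisecs, $3 = page size in bytes (default 4096).
ksm_scan_rate_kib() {
    pages_to_scan=$1
    sleep_ms=$2
    page_size=${3:-4096}
    # Batches per second = 1000 / sleep_ms; multiply first to keep integer precision.
    echo $(( pages_to_scan * 1000 * page_size / sleep_ms / 1024 ))
}

ksm_scan_rate_kib 100 20   # defaults from the listing above: prints 20000
ksm_scan_rate_kib 100 50   # sleep_millisecs raised to 50: prints 8000
```

With the default 100 pages per batch, raising sleep_millisecs from 20 to 50 lowers the ceiling from about 20 MB/s to about 8 MB/s of scanned memory.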

Caveats

The ksmd daemon will take one or two minutes to start deduplicating memory and will require several minutes to reach a stable state. During the boot phase your HN could start swapping if you have heavily overcommitted your RAM. You might want to use more aggressive settings (higher pages_to_scan, lower sleep_millisecs) at the beginning and then relax them after 10 minutes or so. Others suggest placing your swap on an SSD.
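The "aggressive first, relaxed later" approach could be scripted along these lines. This is only a sketch: the function names ksm_set and ksm_warmup are hypothetical, and every numeric value (1000 pages, 20 ms, 600 seconds, then 100 pages and 50 ms) is an illustrative assumption that should be adapted to the HN.

```shell
#!/bin/sh
# Hypothetical helper: write one KSM tunable.
# $1 = file name, $2 = value, $3 = sysfs directory (default /sys/kernel/mm/ksm).
ksm_set() {
    echo "$2" > "${3:-/sys/kernel/mm/ksm}/$1"
}

# Scan aggressively for a warm-up period, then fall back to relaxed settings.
# $1 = warm-up duration in seconds (default 600); $2 = sysfs directory.
ksm_warmup() {
    warmup_secs=${1:-600}
    ksm_dir=${2:-/sys/kernel/mm/ksm}
    # Aggressive phase: larger batches, short sleep, scanning switched on.
    ksm_set pages_to_scan 1000 "$ksm_dir" &&
    ksm_set sleep_millisecs 20 "$ksm_dir" &&
    ksm_set run 1 "$ksm_dir" || return 1
    sleep "$warmup_secs"
    # Relaxed phase: smaller batches and a longer sleep to reduce CPU overhead.
    ksm_set pages_to_scan 100 "$ksm_dir" &&
    ksm_set sleep_millisecs 50 "$ksm_dir"
}
```

If called from /etc/rc.local, running it in the background (ksm_warmup &) keeps the warm-up delay from blocking the rest of the boot sequence.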