KSM (kernel same-page merging)
Latest revision as of 04:35, 10 October 2016
KSM is a memory-saving deduplication feature developed by Red Hat. It first appeared in Linux kernel version 2.6.32.
KSM replaces RAM pages of identical content with a single write-protected page, which in turn gets automatically copied to a new one if a process later wants to update its content. This makes the de-duplication mechanism transparent to applications. This strategy is commonly known as COW (Copy On Write).
Although KSM was originally developed for use with KVM, it can be used with OpenVZ containers as well.
Performance and overhead
KSM is most effective when the hardware node (HN) has plenty of RAM and runs many containers with the same applications (e.g. an HN running 50 containers with Apache, PHP and MySQL).
The amount of RAM that can be saved is in the 30-50% range depending on the application, with applications that have a mixed I/O and CPU footprint (e.g. MySQL and Apache) getting the best results.[1]
It is worth noting that KSM and Virtuozzo use totally different strategies for RAM deduplication: KSM provides a constantly running daemon (called ksmd) which scans the HN's memory and merges identical RAM pages gradually over time, whereas Virtuozzo merges requests from different containers to the same physical binaries on disk. In doing so, Virtuozzo incurs no overhead at all, while KSM has a 5-10% CPU overhead depending on its configuration (faster scanning of the HN RAM will require more CPU power).
Enabling KSM on the hardware node
First, you'll want to verify that KSM support is present and enabled in your OpenVZ kernel:
[root@HN ~]# grep KSM /boot/config-`uname -r`
CONFIG_KSM=y
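The same check can be wrapped in a small shell function, for example for a provisioning script. This is only a sketch under assumptions: the name `ksm_kernel_supported` is hypothetical, and as a fallback it treats the presence of `/sys/kernel/mm/ksm/run` as evidence of KSM support, which helps on systems where the kernel config file is not installed under `/boot`.

```shell
# Hypothetical helper: exit status 0 if the running kernel appears to
# support KSM. Both arguments are optional; they exist mainly so the
# function can be tried against fake files.
ksm_kernel_supported() {
    config=${1:-/boot/config-$(uname -r)}
    sysfs_dir=${2:-/sys/kernel/mm/ksm}

    # Same test as the grep command above: KSM built into the kernel.
    if [ -r "$config" ] && grep -q '^CONFIG_KSM=y$' "$config"; then
        return 0
    fi
    # Fallback (assumption): a KSM-enabled kernel exposes its sysfs knobs.
    [ -e "$sysfs_dir/run" ]
}
```

Usage on the HN: `ksm_kernel_supported && echo "KSM available"`.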
The KSM daemon is controlled by sysfs files in /sys/kernel/mm/ksm/, readable by all but writable only by root.  On your hardware node, if KSM is not yet active, you'll see these default values:
[root@HN ~]# grep -H '' /sys/kernel/mm/ksm/*
/sys/kernel/mm/ksm/full_scans:0
/sys/kernel/mm/ksm/merge_across_nodes:1
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
/sys/kernel/mm/ksm/run:0
/sys/kernel/mm/ksm/sleep_millisecs:20
For the meaning of each parameter, please refer to https://www.kernel.org/doc/Documentation/vm/ksm.txt
To start ksmd, issue the following command:
[root@HN ~]# echo 1 > /sys/kernel/mm/ksm/run
To make this persistent across reboots, add the same command to /etc/rc.local on the HN.
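A slightly more defensive version of that rc.local line could check that the sysfs file exists and is writable before writing, then read the value back. This is a sketch: `ksm_start` is a hypothetical name, and the optional directory argument is only there so the function can be exercised outside `/sys`.

```shell
# Hypothetical helper for /etc/rc.local: start ksmd if the KSM sysfs
# interface is present and writable (i.e. we are root on a KSM kernel).
ksm_start() {
    sysfs_dir=${1:-/sys/kernel/mm/ksm}

    if [ ! -w "$sysfs_dir/run" ]; then
        echo "ksm_start: $sysfs_dir/run missing or not writable" >&2
        return 1
    fi
    # Writing 1 to 'run' starts ksmd, exactly as in the command above.
    echo 1 > "$sysfs_dir/run" || return 1
    # Read the value back to confirm it was accepted.
    [ "$(cat "$sysfs_dir/run")" = "1" ]
}
```

In rc.local, define the function and then call `ksm_start`; a non-zero exit status means ksmd was not started.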
Verify that ksmd is running:
[root@HN ~]# ps aux | grep ksmd
root         264  0.3  0.0      0     0 ?        SN   Jun14 176:52 [ksmd]
root      989187  0.0  0.0 103252   896 pts/0    S+   09:30   0:00 grep ksmd
Enabling memory deduplication libraries in containers
In order to have KSM consider a memory page as a candidate for deduplication, the application itself must mark it as mergeable:
- KSM only operates on those areas of address space which an application has advised to be likely candidates for merging, by using the madvise(2) system call: int madvise(addr, length, MADV_MERGEABLE).
Note that most applications do not use madvise() at all: that's why KSM is generally used in conjunction with KVM, which takes care of marking pages as mergeable on behalf of the applications running within each virtual machine.
Luckily, we can override the default behaviour of each application using the ksm_preload package (http://vleu.net/ksm_preload/), available in the CentOS base repository.
- Linux ≥ 2.6.32 features a memory-saving mechanism that works by deduplicating areas of memory that are identical in different processes (even if they were generated at runtime and after the fork() of their common ancestors).
- This mechanism requires the application to opt-in using the madvise() syscall. KSM Preload enables legacy applications (about any current application) to leverage this system by calling madvise(…, MADV_MERGEABLE) on every heap-allocated pages.
Do not use the usual # yum install ksm_preload command inside your containers, as it will install an unnecessary stream of dependencies.  Assuming your container is running on a recent CentOS 6.x template, issue the following commands instead:
[root@container /]# cd /usr/local/src
[root@container src]# yum install -y yum-downloadonly
    ...bunch of output...
(If you get an error like "No package yum-downloadonly available", just ignore it and proceed: most likely, your Yum installation already includes that plugin.)
[root@container src]# yum -y --downloadonly --downloaddir=/usr/local/src install ksm_preload
    ...bunch of output...
exiting because --downloadonly specified  // this is OK
We can now install just the ksm_preload RPM:
[root@container src]# rpm -i ksm_preload-0.10-3.el6.x86_64.rpm --nodeps
Enabling memory deduplication in applications
To make an application take advantage of ksm_preload and use KSM on the HN, add these lines to its startup script (assuming your container is running CentOS 6.x x86_64). The export is needed so that the variable is passed on to the processes the script launches:
LD_PRELOAD=/usr/lib64/libksm_preload.so
export LD_PRELOAD
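If a startup script already sets LD_PRELOAD for another library, overwriting it would silently drop that library. The sketch below adds the ksm_preload library to an existing list instead. The function names are hypothetical, it assumes the existing LD_PRELOAD entries are colon-separated, and the library path is the CentOS 6.x x86_64 one used above.

```shell
# Library path from the text above (CentOS 6.x x86_64 container).
KSM_PRELOAD_LIB=${KSM_PRELOAD_LIB:-/usr/lib64/libksm_preload.so}

# Hypothetical helper: print an LD_PRELOAD value that contains $1 while
# keeping the entries already in $2 (assumed colon-separated).
ksm_preload_value() {
    lib=$1
    current=$2
    case ":$current:" in
        *":$lib:"*) printf '%s\n' "$current" ;;      # already listed
        ::)         printf '%s\n' "$lib" ;;          # LD_PRELOAD was empty
        *)          printf '%s\n' "$current:$lib" ;; # append to the list
    esac
}

# Hypothetical wrapper: run one command with ksm_preload added to LD_PRELOAD.
run_with_ksm() {
    LD_PRELOAD=$(ksm_preload_value "$KSM_PRELOAD_LIB" "${LD_PRELOAD:-}") "$@"
}
```

In an init script, the two lines above would then become `LD_PRELOAD=$(ksm_preload_value "$KSM_PRELOAD_LIB" "${LD_PRELOAD:-}")` followed by `export LD_PRELOAD`.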
E.g., if you want to make Percona Server use KSM, modify its startup script as follows:
[root@container /]# nano /etc/init.d/mysql
...
PATH="/sbin:/usr/sbin:/bin:/usr/bin:$basedir/bin"
export PATH
LD_PRELOAD=/usr/lib64/libksm_preload.so
export LD_PRELOAD
mode=$1    # start or stop
...
Then (re)start your Percona Server as usual.
How to check the efficiency of KSM
To check if KSM is actually reducing memory usage, issue this command on the HN:
[root@HN /]# cat /sys/kernel/mm/ksm/pages_sharing
If the value is greater than 0, you're saving memory. Refer to https://www.kernel.org/doc/Documentation/vm/ksm.txt for more details.
To see all the KSM parameters, issue the following command on the HN:
grep -H '' /sys/kernel/mm/ksm/*
A simple script that displays the same information in MB is available at https://gist.github.com/wankdanker/1206923.
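If you prefer not to fetch an external script, the savings can be estimated directly. The kernel's ksm.txt describes pages_sharing as "how many more sites are sharing them i.e. how much saved", so multiplying it by the page size gives an approximate number of bytes saved. This is a sketch, not the linked gist: `ksm_saved_mb` is a hypothetical name, and the optional arguments exist only for testing.

```shell
# Hypothetical helper: estimate the RAM saved by KSM, in whole MiB.
# Optional arguments: sysfs directory and page size in bytes.
ksm_saved_mb() {
    sysfs_dir=${1:-/sys/kernel/mm/ksm}
    page_size=${2:-$(getconf PAGESIZE)}

    sharing=$(cat "$sysfs_dir/pages_sharing") || return 1
    # pages_sharing * page size = bytes saved; integer division to MiB.
    echo $(( sharing * page_size / 1024 / 1024 ))
}
```

On the HN, after defining the function, `ksm_saved_mb` prints the estimate; with 4 KiB pages, a pages_sharing value of 2560 corresponds to 10 MiB.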
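Tuning
On a production machine, you'll want to change some of the default values. A more reasonable value for /sys/kernel/mm/ksm/sleep_millisecs than the default of 20 is usually between 50 and 250 (YMMV, though): a higher value makes ksmd sleep longer between scans and therefore use less CPU.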
[root@HN ~]# echo 50 > /sys/kernel/mm/ksm/sleep_millisecs
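To set both scanning knobs mentioned on this page (sleep_millisecs and pages_to_scan) in one step, for example from /etc/rc.local, a small helper could look like the sketch below. The name `ksm_tune` is hypothetical, and the function does not enforce the 50 to 250 guideline; it only rejects values that are not plain non-negative integers.

```shell
# Hypothetical helper: set ksmd's sleep interval and pages scanned per pass.
# Usage: ksm_tune SLEEP_MILLISECS PAGES_TO_SCAN [SYSFS_DIR]
ksm_tune() {
    sleep_ms=$1
    pages=$2
    sysfs_dir=${3:-/sys/kernel/mm/ksm}

    # Refuse empty or non-numeric values before touching sysfs.
    for v in "$sleep_ms" "$pages"; do
        case $v in
            ''|*[!0-9]*)
                echo "ksm_tune: '$v' is not a non-negative integer" >&2
                return 1 ;;
        esac
    done
    echo "$sleep_ms" > "$sysfs_dir/sleep_millisecs" &&
    echo "$pages" > "$sysfs_dir/pages_to_scan"
}
```

For example, `ksm_tune 50 100` applies the sleep value above while keeping the default pages_to_scan of 100.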
Caveats
The ksmd daemon takes one or two minutes to start deduplicating memory and several minutes to reach a stable state. During this warm-up phase after boot, your HN could start swapping if you have heavily overcommitted your RAM. You might want to use more aggressive settings (higher pages_to_scan, lower sleep_millisecs) at first, trading CPU utilization for a lower chance of disk swapping, and then relax them after 10 minutes or so. Another possibility is to place your swap on an SSD.
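The aggressive-then-relaxed strategy described above could be scripted roughly as follows. All four numeric values are illustrative assumptions, not tested recommendations, and `ksm_warmup` is a hypothetical name; the default warm-up of 600 seconds mirrors the "10 minutes or so" above.

```shell
# Illustrative values only (assumptions): adjust for your HN.
KSM_FAST_PAGES=${KSM_FAST_PAGES:-1000}  # warm-up: scan more pages per pass
KSM_FAST_SLEEP=${KSM_FAST_SLEEP:-20}    # warm-up: sleep less between passes
KSM_SLOW_PAGES=${KSM_SLOW_PAGES:-100}   # steady state
KSM_SLOW_SLEEP=${KSM_SLOW_SLEEP:-100}   # steady state

# Hypothetical helper: start ksmd with aggressive settings, then relax them.
# Usage: ksm_warmup [WARMUP_SECONDS] [SYSFS_DIR]
ksm_warmup() {
    warmup=${1:-600}
    sysfs_dir=${2:-/sys/kernel/mm/ksm}

    echo "$KSM_FAST_PAGES" > "$sysfs_dir/pages_to_scan" || return 1
    echo "$KSM_FAST_SLEEP" > "$sysfs_dir/sleep_millisecs" || return 1
    echo 1 > "$sysfs_dir/run" || return 1

    sleep "$warmup"

    echo "$KSM_SLOW_PAGES" > "$sysfs_dir/pages_to_scan" &&
    echo "$KSM_SLOW_SLEEP" > "$sysfs_dir/sleep_millisecs"
}
```

When called from /etc/rc.local, run it in the background (`ksm_warmup &`) so the sleep does not delay the rest of the boot sequence.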
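References
1. An Empirical Study on Memory Sharing of Virtual Machines for Server Consolidation. http://www.researchgate.net/publication/220946080_An_Empirical_Study_on_Memory_Sharing_of_Virtual_Machines_for_Server_Consolidation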
External links
- Wikipedia: Kernel same-page merging
