User pages accounting

== Introduction ==
 
User pages are the second important resource (after [[kmemsize]]) that must be accounted.
Unlike kernel memory, which is either used or not, user pages fall into different classes. Pages may be backed by a file, locked, or unused (that is, requested with <code>mmap</code>/<code>brk</code> but not yet touched), and so on. User pages accounting is therefore trickier than kernel memory accounting.
  
 
== Ways of accounting ==
 
There are different approaches to user pages control:

; Account all the mappings on mmap/brk and reject as soon as the sum of VMA lengths reaches the barrier.
: This approach is very bad because applications ''always'' map more than they really use, very often MUCH more.
; Account only the really used memory and reject as soon as {{H:title|Resident Set Size|RSS}} reaches the limit.
: This approach is not good either, because the only place where pages appear in user space is the page fault handler, and the only way to reject there is to kill the task. Compared to the previous scenario this is much worse, as the application won't even be able to terminate gracefully.
; Account a part of memory on mmap/brk and reject there, and account the rest of the memory in page fault handlers without any rejects.
: This type of accounting is used in UBC.
; Account physical memory and behave like a standalone kernel - reclaim user memory when it runs out.
: This type of memory control is to be introduced later as an addition to the current scheme. UBC provides all the needed statistics for this (physical memory, swap pages, etc.).
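The third approach can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: the <code>Beancounter</code> class, its fields and the page counts are hypothetical names and values chosen for the example. It shows that a request can be refused only at mmap/brk time, while page faults are merely counted.

```python
class Beancounter:
    """Toy model of the third approach (hypothetical, not kernel code)."""

    def __init__(self, barrier_pages):
        self.barrier = barrier_pages  # reject threshold for mmap/brk charges
        self.charged = 0              # pages charged at mmap/brk time
        self.touched = 0              # pages actually faulted in

    def mmap(self, npages):
        """Charge a new mapping; the only place where a request can be refused."""
        if self.charged + npages > self.barrier:
            return False              # the caller gets an error and can handle it
        self.charged += npages
        return True

    def page_fault(self, npages=1):
        """Account touched pages; never rejects, so no task has to be killed."""
        self.touched += npages


ub = Beancounter(barrier_pages=100)
first = ub.mmap(80)     # fits under the barrier
second = ub.mmap(40)    # would exceed the barrier, so it is refused up front
ub.page_fault(10)       # touching pages of the first mapping always succeeds
```

Because the refusal happens in <code>mmap()</code>, the application receives an ordinary error and can terminate gracefully instead of being killed in the page fault handler.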
 
  
 
== UBC user pages accounting ==
 
 
=== Terms ===
 
The following terms are used by UBC:
* ''shmem mapping'' - a mapping of a file that belongs to <code>tmpfs</code>. UBC accounts these pages separately, and the description below doesn't take such pages into account;
* ''private mapping'' - this includes the following types of mappings:
** writable anonymous mappings
** writable private file mappings
*: Neither type is backed by a disk file, so such pages cannot simply be freed. These mappings are charged, with a possible reject, right when they are made in <code>sys_mmap()</code>/<code>sys_brk()</code>.
* ''unused pages'' - the number of pages that belong to a private mapping but have not yet been touched.
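The terms above can be read as a small classification rule. The sketch below is an illustration only: the function names and the boolean parameters are hypothetical, and treating every other kind of mapping as <code>"other"</code> is an assumption, since the text does not describe those cases.

```python
def classify_mapping(writable, anonymous, shared=False, on_tmpfs=False):
    """Return 'shmem', 'private' or 'other', following the terms above."""
    if on_tmpfs:
        return "shmem"      # tmpfs file mappings are accounted separately
    if writable and (anonymous or not shared):
        return "private"    # writable anonymous or writable private file mapping
    return "other"          # assumption: e.g. read-only or shared file mappings


def unused_pages(private_mapped, touched):
    """Pages that belong to private mappings but have not been touched yet."""
    return private_mapped - touched
```

For example, <code>classify_mapping(writable=True, anonymous=True)</code> yields <code>"private"</code>, and a task that mapped 100 private pages and touched 30 of them has <code>unused_pages(100, 30) == 70</code>.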
  
 
=== Math model ===
 
The following notations are used:
* <math>UB_{privvm}</math> is the total number of privvmpages accounted on UB. In other words, this is the value seen in <code>/proc/user_beancounters</code> in <code>privvmpages.held</code>;
* <math>UB_{unused}</math> is the total number of unused pages. This parameter is shown in the <code>/proc/user_beancounters_debug</code> file;
* <math>Frac(page, UB)</math> is a value that represents the part of the page charged to the beancounter when the page is mapped;
* <math>RSS</math> is the amount of physical memory used by processes.

The page fraction <math>Frac(page, UB)</math> would normally be <math>\frac{1}{N}</math>, where <math>N</math> is the number of UBs the page is shared between, but this is bad, since adding a new UB to the page's sharing set would require recalculating the fractions of the whole current set. In UBC, <math>Frac(page, UB) = \frac{1}{2^{UB_{shift}}}</math>, where <math>UB_{shift}</math> is a parameter which [[RSS fractions accounting|is calculated]] so that
 
 
<center>
<math>\sum _{UB : page \in UB} Frac(page, UB) = 1, \forall page</math>.
</center>
The notation <math>page \in UB</math> means <em>there exists an mm_struct to which the page <math>page</math> is mapped, and this mm_struct belongs to the <math>UB</math> beancounter</em>. This way of calculating fractions makes an O(1) fraction accounting algorithm possible.
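A small sketch may help to see why power-of-two fractions are convenient. The splitting rule used here (halve the largest existing fraction and give the other half to the newcomer) is an assumption made for illustration; the text above does not spell out how <math>UB_{shift}</math> is chosen. The names <code>add_sharer</code> and <code>frac</code> are hypothetical.

```python
from fractions import Fraction


def add_sharer(shifts, new_ub):
    """Add new_ub to a page's sharing set by splitting the largest fraction in two.

    shifts maps a beancounter name to its shift; its fraction is 1 / 2**shift.
    """
    if not shifts:
        shifts[new_ub] = 0                   # a sole owner is charged the whole page
        return
    donor = min(shifts, key=shifts.get)      # smallest shift means largest fraction
    shifts[donor] += 1                       # halve the donor's fraction...
    shifts[new_ub] = shifts[donor]           # ...and give the other half to the newcomer


def frac(shifts, ub):
    """Fraction of the page charged to ub."""
    return Fraction(1, 2 ** shifts[ub])


page = {}
for ub in ("ub1", "ub2", "ub3"):
    add_sharer(page, ub)
total = sum(frac(page, ub) for ub in page)   # stays equal to 1 after every addition
```

Only one existing fraction changes when a beancounter is added, whereas with <math>\frac{1}{N}</math> every sharer would have to be updated. The <code>min()</code> scan is used for brevity only and is not itself O(1).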
  
[[Privvmpages]] accounts the sum of unused pages and the “normalized” number of RSS pages:
  
 
<center>
<math>
UB_{privvm} = UB_{unused} + \sum _{page : page \in UB} Frac(page, UB) \,
</math>
</center>
  
If you sum the [[privvmpages]] of all beancounters in the system, you get an upper estimate of the current physical memory usage by processes:
  
 
<center>
<math>
\sum _{UB} UB_{privvm} = RSS + \sum _{UB} UB_{unused} \,
</math>
</center>
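The identity can be checked numerically on a toy system. The data below is entirely hypothetical: three resident pages with made-up shifts, and made-up unused page counts. Since the fractions of every resident page sum to 1, summing over all beancounters counts each resident page exactly once, which gives <math>RSS</math>; the unused pages are added on top.

```python
from fractions import Fraction

# Hypothetical system: each resident page lists the shift of every beancounter mapping it.
pages = [
    {"ub1": 0},                        # private to ub1: fraction 1
    {"ub1": 1, "ub2": 1},              # shared half and half
    {"ub1": 2, "ub2": 1, "ub3": 2},    # fractions 1/4, 1/2, 1/4
]
unused = {"ub1": 5, "ub2": 3, "ub3": 0}   # mapped but untouched pages per beancounter


def privvm(ub):
    """privvmpages held by ub: unused pages plus its fractions of resident pages."""
    resident = sum(Fraction(1, 2 ** p[ub]) for p in pages if ub in p)
    return unused[ub] + resident


rss = len(pages)                                  # every resident page counted once
total_privvm = sum(privvm(ub) for ub in unused)   # sum over all beancounters
```

Here <code>total_privvm</code> equals <code>rss + sum(unused.values())</code>, that is 3 + 8 = 11 pages, so the sum overestimates the physical memory in use by exactly the number of unused pages.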
  
To get the ''real'' physical memory usage, the [[physpages]] resource is used <ref>Actually, physpages also includes tmpfs resident pages.</ref>.
  
 
<center>
<math>
\sum _{UB} UB_{physpages} = RSS \,
</math>
</center>
 
----
 
 
<references/>
 
 
[[Category: Kernel internals]]
 
