On-demand accounting

From OpenVZ Virtuozzo Containers Wiki
Latest revision as of 12:55, 24 January 2008


This page describes a promising way to optimize beancounter accounting.

Current accounting model

Basically, the allocation of any kind of resource looks like this:

struct some_resource *get_the_resource(int amount)
{
        struct some_resource *ret;

        ret = find_or_allocate_the_resource(amount);
        return ret;
}

We change this behaviour to work like this:

struct some_resource *get_the_resource(int amount)
{
        struct some_resource *ret;

        if (charge_beancounter(amount) < 0)       
                return NULL;   

        ret = find_or_allocate_the_resource(amount);
        if (ret != NULL)
                return ret;

        uncharge_beancounter(amount);
        return NULL;
}

The charge_beancounter() call is responsible for checking whether the user is allowed to get the requested amount of the resource, i.e. if the resource consumption level is lower than the limit set.

Obviously, this change slows down the original code, as charge_beancounter() performs some slow operations, such as taking locks. We have an idea of how to optimize this behavior.

On-demand accounting basics

The main idea is this:
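To make the cost concrete, here is a minimal user-space sketch of what a charge/uncharge pair might look like. It is not the kernel implementation: the struct beancounter layout, the spinlock helpers and the extra bc argument are assumptions made for illustration (the snippets on this page call charge_beancounter() with the amount only).

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a beancounter; not the kernel's structure. */
struct beancounter {
        atomic_flag lock;       /* simple spinlock protecting the fields below */
        unsigned long held;     /* current consumption level */
        unsigned long limit;    /* configured limit */
};

static void bc_lock(struct beancounter *bc)
{
        while (atomic_flag_test_and_set_explicit(&bc->lock, memory_order_acquire))
                ;       /* spin until the lock is free */
}

static void bc_unlock(struct beancounter *bc)
{
        atomic_flag_clear_explicit(&bc->lock, memory_order_release);
}

/* Returns 0 on success, -1 if the charge would exceed the limit. */
static int charge_beancounter(struct beancounter *bc, unsigned long amount)
{
        int ret = -1;

        bc_lock(bc);
        /* held never exceeds limit, so the subtraction cannot wrap */
        if (amount <= bc->limit - bc->held) {
                bc->held += amount;
                ret = 0;
        }
        bc_unlock(bc);
        return ret;
}

static void uncharge_beancounter(struct beancounter *bc, unsigned long amount)
{
        bc_lock(bc);
        bc->held -= amount;
        bc_unlock(bc);
}
```

Every call takes and releases the lock, which is the overhead the rest of this page tries to keep out of hot paths.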

If the consumption level of a resource can easily be bounded from above by some value, and this upper estimate is lower than the limit, then we do not need to know the exact consumption level and can allow the resource allocation without additional checks.

Naturally, when the estimate exceeds the limit, we must switch to the slower mode, which gives us a more precise value of the consumption level and (probably) lets us allocate another portion of the resource.

Example

Let's look at an example of how this works with user memory accounting.
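The rule above can be written as a small predicate. This is only a sketch: the enum, the function name and the choice to treat an estimate equal to the limit as still acceptable are assumptions, not part of the beancounters code.

```c
/* Hypothetical accounting modes; the names are illustrative only. */
enum acct_mode { ACCT_FAST, ACCT_SLOW };

/*
 * Pick the mode from an upper estimate of consumption and the limit.
 * While the estimate fits under the limit, the exact level cannot
 * exceed the limit either, so per-allocation checks can be skipped.
 */
static enum acct_mode choose_mode(unsigned long upper_estimate,
                                  unsigned long limit)
{
        return upper_estimate <= limit ? ACCT_FAST : ACCT_SLOW;
}
```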

Currently we account for the physpages resource, that is, the number of physical pages consumed by a set of processes. The accounting hooks are placed inside the page fault handlers and thus hurt performance. The current accounting looks like this:

struct page *get_new_page(struct mm_struct *mm)
{
        struct page *pg;

        if (charge_beancounter(1) < 0)
                return NULL;

        pg = alloc_new_page(mm);
        if (pg != NULL)
                return pg;

        uncharge_beancounter(1);
        return NULL;
}

However, we have a good upper estimate of the RSS size: the total length of the mappings of the processes. Since physical pages can only be allocated within these mappings, the RSS value can never exceed the sum of their lengths. The accounting will then look like this:

struct vm_area_struct *get_new_mapping(struct mm_struct *mm,
                unsigned long pages)
{
        if (!mm->fast_accounting)
                goto allocate;

        if (charge_beancounter(pages) == 0)
                goto allocate;

        mm->fast_accounting = 0;
        recalculate_the_rss(mm);

allocate:
        return expand_mapping(mm);      /* assumed to return the expanded mapping */
}

struct page *get_new_page(struct mm_struct *mm)
{
        struct page *pg;

        if (mm->fast_accounting)
                goto fast_path;

        if (charge_beancounter(1) < 0)
                return NULL;

fast_path:
        pg = alloc_new_page(mm);
        if (pg != NULL)
                return pg;

        if (!mm->fast_accounting)
                uncharge_beancounter(1);
        return NULL;
}

In fast mode we do not call the slow charge_beancounter() function in the page fault path (get_new_page()). Instead we charge the upper estimate in get_new_mapping(), which is called rarely and thus does not affect performance.

Note that recalculate_the_rss() is called to compute the exact RSS value for the beancounter when we switch to the slow mode.

More things to do

In this model we only switch from fast accounting to slow accounting. If the upper estimate becomes lower than the limit again, we could switch back to the fast model. However, these switches are not very fast themselves, and switching too frequently can hurt performance instead of improving it. So this requires further investigation.
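The effect can be illustrated with a small user-space simulation that counts how often the slow charge path runs. All names here (struct sim_mm, sim_charge() and so on) are hypothetical; resetting held to rss stands in for recalculate_the_rss(), and switching back to fast mode is not modelled.

```c
/* User-space simulation; every name here is illustrative, not kernel code. */
struct sim_mm {
        int fast_accounting;        /* 1 while the upper estimate fits the limit */
        unsigned long mapped;       /* pages covered by mappings (upper estimate) */
        unsigned long rss;          /* pages actually faulted in */
        unsigned long held;         /* value charged to the beancounter */
        unsigned long limit;        /* beancounter limit, in pages */
        unsigned long charge_calls; /* how often the slow charge path ran */
};

static int sim_charge(struct sim_mm *mm, unsigned long pages)
{
        mm->charge_calls++;
        if (pages > mm->limit - mm->held)
                return -1;
        mm->held += pages;
        return 0;
}

/* Expanding a mapping: charge the whole length up front while in fast mode. */
static void sim_new_mapping(struct sim_mm *mm, unsigned long pages)
{
        if (mm->fast_accounting && sim_charge(mm, pages) < 0) {
                /* The estimate no longer fits: fall back to exact accounting. */
                mm->fast_accounting = 0;
                mm->held = mm->rss;     /* plays the role of recalculate_the_rss() */
        }
        mm->mapped += pages;
}

/* Page fault: returns 0 on success, -1 if the limit is hit. */
static int sim_new_page(struct sim_mm *mm)
{
        if (!mm->fast_accounting && sim_charge(mm, 1) < 0)
                return -1;
        mm->rss++;
        return 0;
}
```

With a limit of 100 pages, mapping 50 pages and faulting in 30 of them costs a single charge call. Mapping another 80 pages pushes the estimate over the limit, after which every fault is charged individually.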
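One conceivable way to limit the switching frequency is hysteresis: leave fast mode as soon as the estimate exceeds the limit, but return only once the estimate has dropped below the limit by some margin. The sketch below is purely an assumption for discussion; the struct, the margin field and the function name are hypothetical, and nothing on this page says this approach was adopted.

```c
/* Hypothetical mode switch with hysteresis; the margin is an assumption. */
struct mode_state {
        int fast;               /* 1 = fast accounting, 0 = slow */
        unsigned long limit;    /* beancounter limit */
        unsigned long margin;   /* gap required before returning to fast mode */
};

/* Update the mode for a new upper estimate; returns 1 if the mode changed. */
static int update_mode(struct mode_state *st, unsigned long upper_estimate)
{
        if (st->fast && upper_estimate > st->limit) {
                st->fast = 0;
                return 1;
        }
        if (!st->fast && st->limit >= st->margin &&
            upper_estimate <= st->limit - st->margin) {
                st->fast = 1;
                return 1;
        }
        return 0;
}
```

A larger margin means fewer switches but more time spent in the slow mode, so the value would have to be tuned by measurement.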