RSS fractions accounting

From OpenVZ Virtuozzo Containers Wiki
Jump to: navigation, search
User Beancounters
Definition
/proc/user_beancounters
/proc/bc/
General information
Units of measurement
VSwap
Parameters description
Primary parameters
numproc, numtcpsock, numothersock, vmguarpages
Secondary parameters
kmemsize, tcpsndbuf, tcprcvbuf, othersockbuf, dgramrcvbuf, oomguarpages, privvmpages
Auxiliary parameters
lockedpages, shmpages, physpages, numfile, numflock, numpty, numsiginfo, dcachesize, numiptent, swappages
Internals
User pages accounting
RSS fractions accounting
On-demand accounting
UBC consistency
Consistency formulae
System-wide configuration
vzubc(8)
Configuration examples
Basic
Derived
Intermediate configurations
Tables
List of parameters
Parameter properties
Consistency
Config examples

This article describes how fractions accounting algorithm used in privvmpages accounting works.

Introduction[edit]

The goal of the algorithm is to calculate the RSS for each beancounter. This is done by establishing so called many-to-many relation between physical pages and beancounters.

Yellowpin.svg Note: the term “page is mapped to beancounter” used in this article means “there exists an mm_struct the pages is mapped to and this mm_struct belongs to this beancounter”.

Math model[edit]

As described in the article about user pages, this algorithm calculates the “fractions” of pages that are mapped to beancounter, in a tricky manner.

Naturally, if a page is shared between beancounters, each UB must get -th part of it. This approach would be the most precise, but also the worst, as adding a page to a new beancounter would require an update on all the beancounters between which the page is already shared.

Instead of doing that, UBC charges parts of page equal to , where is calculated so that

Such fractions allow to build an O(1) algorithm, as described below.

Algorithm description[edit]

The complete description of the algorithm consists of describing an objects used in it and two procedures — adding a page to beancounter and removing it.

Objects[edit]

The following objects are used in this algorithm:

Page beancounter (PB)
A tie between page and beancounter. This is the PB which holds the value of ;
Page
A physical page. In Linux kernel each physical page is represented with struct page. What is important is that each page has its associated circular list of PBs and the head PB in this list has a special meaning;
Beancounter
A known object. To work with this algorithm beancounter has an auxiliary resource called “held pages”.

Page beancounters are stored in a large hash to hasten the lookup of a needed PB.

The next two sections describe the procedures of adding and removing of a reference.

Adding a new reference[edit]

When page is added to UB, one of the following scenarios is possible:

There's already a tie between the page an the UB
In this case just increment the reference count of the tie;
A new reference must be created
In this case the new PB is created. -s are recalculated for new PB and the page list head only so that one half of the page's fractions held by the head is moved to the new PB. After this new PB is added to the tail of the list and the second PB in list becomes the head.

In a C-like notation this would look like this:

pb = lookup_pb(page, UB);
if (pb != NULL) {
        pb.refcount++;
        return;
}       

pb = alloc_pb();
pb.refcount = 1;

head = page.pb_list.first;

head.shift++;
pb.shift = head.shift;
pb.tie = tie(page, UB);

list_add_tail(pb, page.pb_list);
page.pb_list.first = head.next;

Shift increment essentially means “take one half of the currently held fraction and give it to the newbie”.

Removing an existing reference[edit]

When the existing reference is removed, the following scenarios are possible:

This page was referenced to BC more than once (e.g. by two tasks)
In this case reference counter is decremented;
The last reference is removed
In this case the PB that ties page and UB must give its shift back to other PBs and be removed from list. This “giving shift back” procedure may involve three PBs and it is better described by the code example below.
pb = lookup_pb(page, UB);
pb.refcount--;
if (pb.refcount > 0)
        return;

list_del(pb);

tail = page.pb_list.last;
tail.shift--;
page.pb_list.first = tail;

if (pb.shift < tail.shift + 1) {
        tail = page.pb_list.last;
        tail.shift--;
        page.pb_list.first = tail
}

Decrementing the shifts like this means “take the fraction held by the PB going out and give it to another one; if the fraction held is too large — share it between two PBs”.

Yellowpin.svg Note: cases when page.pb_list is becoming empty on adding a reference, or becoming empty on removing references are skipped to simplify the code examples.