IO accounting

From OpenVZ Virtuozzo Containers Wiki

Latest revision as of 19:46, 8 July 2015

This page describes accounting for the I/O activity of CT processes at the VFS I/O level. The feature is available beginning with OpenVZ kernel version 028test008.

I/O priorities for containers are described in a separate article.

If you are looking for IO scheduler (i.e. lower level) statistics, see IO statistics instead.

New resources

The following resources are accounted:

read bytes
is the number of bytes read by tasks. Reads are always synchronous in the kernel, so this type of resource is the easiest one to account.
dirty bytes
is the number of bytes that have been dirtied since the container (VE) started. Dirty data is data that has not yet been flushed to disk. This type of resource is accumulated using page beancounters, and the context (beancounter) by which a page was dirtied is determined like this:
  • if a page is mapped, its "mapper" is used, as dirtying may then happen in any context (unmapping of a page under memory pressure);
  • if a page is not mapped, the current BC is used, as this can happen only during a usual write (writev).
written bytes
is the number of bytes flushed to disk. The beancounter charged with this is the one by which the page was dirtied.
canceled bytes
is the number of bytes that were dirty but weren't flushed to disk.
missed bytes
is the number of bytes that were dirtied but the context (beancounter) wasn't saved due to lack of memory.
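
The charging rules above can be sketched as a small toy model. This is only an illustration, not the kernel code: the class and function names (`Beancounter`, `Page`, `dirty_page`, `flush_page`, `cancel_page`) and the 4096-byte page size are assumptions made for the example, and the "missed" case is left out because the text does not say how it is charged.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class Beancounter:
    """Hypothetical per-container counters named after the resources above."""
    dirty: int = 0
    write: int = 0
    cancel: int = 0


@dataclass
class Page:
    mapper: Optional[Beancounter] = None   # beancounter that mapped the page, if any
    dirtier: Optional[Beancounter] = None  # context saved when the page was dirtied


def dirty_page(page: Page, current_bc: Beancounter) -> None:
    """Charge a dirtied page to its mapper if mapped, else to the current BC."""
    bc = page.mapper if page.mapper is not None else current_bc
    bc.dirty += PAGE_SIZE
    page.dirtier = bc


def flush_page(page: Page) -> None:
    """Flushing charges 'write' to the beancounter that dirtied the page."""
    page.dirtier.write += PAGE_SIZE
    page.dirtier = None


def cancel_page(page: Page) -> None:
    """A dirty page dropped without being flushed is charged as 'cancel'."""
    page.dirtier.cancel += PAGE_SIZE
    page.dirtier = None


ct = Beancounter()
other = Beancounter()

mapped = Page(mapper=ct)
dirty_page(mapped, current_bc=other)    # charged to the mapper, not to 'other'
unmapped = Page()
dirty_page(unmapped, current_bc=other)  # charged to the current beancounter

flush_page(mapped)
cancel_page(unmapped)
```

In this model every dirtied page ends up counted exactly once as either written or canceled, which is why `dirty` only ever grows.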

Proc interface

General information

As described in another article, each beancounter has its own /proc/bc/$BCID directory where subsystems add their entries. I/O accounting adds an ioacct entry to show I/O information. This entry contains the following information:

# cat /proc/bc/101/ioacct 
        read                              24330240
        write                               598016
        dirty                               622592
        cancel                               24576
        missed                                   0
        ...

Note that dirty is not the amount of data that is dirty at the moment, but the total amount of dirty data seen so far.
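
For scripting, the entry can be read as simple "name value" pairs. The following is a minimal sketch, assuming every counter line has exactly two whitespace-separated fields as in the listing above; `parse_ioacct` is a hypothetical helper name, and the sample text is copied from that listing rather than read from a live system.

```python
def parse_ioacct(text):
    """Parse 'name value' lines into a dict, skipping anything else (e.g. '...')."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


sample = """\
        read                              24330240
        write                               598016
        dirty                               622592
        cancel                               24576
        missed                                   0
        ...
"""

stats = parse_ioacct(sample)
print(stats["read"])  # prints 24330240
# In this sample the cumulative dirty counter equals write + cancel.
print(stats["write"] + stats["cancel"] == stats["dirty"])  # prints True
```

On a real node the text would come from the proc file instead, for example `open("/proc/bc/101/ioacct").read()`, where 101 is the beancounter ID used in the listing.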

Debugging information

When CONFIG_UBC_DEBUG_IO is on, /proc/bc/ioacct_debug is added. This entry contains a snapshot of the current dirty pages together with their beancounters. For example:

# cat /proc/bc/ioacct_debug 
Races: io 0 anon 0 clean 0 missed 0
pb         page     flg       cnt     mcnt pb_list  page_pb    mapping  ub      
f7a4a520 e c17cfc68 Dawl        2        0 00000000 f7a4a521   c3870168 0
f7a15ce0 e c17d034c Dawl        2        0 00000000 f7a15ce1   c3870168 0
f72e4680 e c1083364 Dawl        2        0 00000000 f72e4681   c3870168 0
f72e4800 e c1083388 Dawl        2        0 00000000 f72e4801   c3870168 0
f7a15dc0 e c17d0010 Dawl        2        0 00000000 f7a15dc1   c3870168 0

Auxiliary information

Along with VFS I/O activity, the following information is gathered:

sync counts
The number of sync(2), fsync(2), fdatasync(2) and sync_file_range(2) calls.
# cat /proc/bc/101/ioacct 
        ...
        syncs_total                              0
        fsyncs_total                             0
        fdatasyncs_total                        10
        range_syncs_total                        0
        syncs_active                             0
        fsyncs_active                            0
        fdatasyncs_active                        0
        range_syncs_active                       0

The _active suffix refers to the number of operations in progress.

write/read calls counts
The number of read(2), readv(2), write(2), writev(2), etc. calls and the number of bytes passed.
# cat /proc/bc/101/ioacct 
        ...
        vfs_reads                            24491
        vfs_read_chars                     2616512
        vfs_writes                             380
        vfs_write_chars                30064899102
number of page beancounters pinned by I/O
This is the number of page beancounters that save information about the page dirtier. This is actually the number of dirty pages within the beancounter at the moment.
# cat /proc/bc/0/ioacct
        ...
        write                               598016
        dirty                               622592
        ...
        io_pbs                                   0
# dd if=/dev/zero of=tmp bs=512 count=40
# cat /proc/bc/0/ioacct 
        ...
        write                               598016
        dirty                               643072
        ...
        io_pbs                                   5
# sync
# cat /proc/bc/0/ioacct
        ...
        write                               618496
        dirty                               643072
        ...
        io_pbs                                   0
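
The numbers in the session above can be checked with a little arithmetic. This is a sketch that assumes 4096-byte pages (the listing does not state the page size); the three dictionaries simply restate the counters shown before dd, after dd, and after sync.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

before = {"write": 598016, "dirty": 622592, "io_pbs": 0}
after_dd = {"write": 598016, "dirty": 643072, "io_pbs": 5}
after_sync = {"write": 618496, "dirty": 643072, "io_pbs": 0}

written_by_dd = 512 * 40  # dd bs=512 count=40

# dd dirties the data: 'dirty' grows by the amount written, and one page
# beancounter is pinned per dirty page.
assert after_dd["dirty"] - before["dirty"] == written_by_dd
assert after_dd["io_pbs"] * PAGE_SIZE == written_by_dd

# sync flushes it: 'write' grows by the same amount, 'dirty' stays put
# (it is cumulative), and no page beancounters remain pinned.
assert after_sync["write"] - after_dd["write"] == written_by_dd
assert after_sync["dirty"] == after_dd["dirty"]
assert after_sync["io_pbs"] == 0

print(written_by_dd, written_by_dd // PAGE_SIZE)  # prints 20480 5
```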

See also