Difference between revisions of "Disk quota, df and stat weird behaviour"

From OpenVZ Virtuozzo Containers Wiki
m (categorized)
(VE -> container, some rewording and reformatting, categorization)
Line 1: Line 1:
The aim of this article is to understand where the numbers that are shown by <code>stat</code>/<code>df</code> utils in [[VE]] come from.  
+
The aim of this article is to understand where the numbers that are shown by <code>stat</code>/<code>df</code> utils in [[container]] come from.
  
== Conventions and Notations ==
+
== Conventions and notations ==
Consider typical OpenVZ setup, where <code>ext2</code> separate filesystem is mounted on <code>/vz</code>. <code>ext2</code> is called ''underlying filesystem'' in such situation.
 
  
Linux VFS design allows every filesystem to export to userspace the following information concerning disk space (here and further I use subscript to specify paricular filesystem type):
+
Consider typical OpenVZ setup, where <code>ext2</code> separate file system is mounted on <code>/vz</code>. <code>ext2</code> is called ''underlying file system'' in such situation.
 +
 
 +
Linux VFS design allows every file system to export to user space the following information concerning disk space (here and further <math>{subscript}</math> is used to specify particular file system type):
  
 
* <math>total_{ext2}</math> - total amount of disk space that potentially can be acquired (e.g. HDD capacity)
 
* <math>total_{ext2}</math> - total amount of disk space that potentially can be acquired (e.g. HDD capacity)
Line 10: Line 11:
 
* <math>avail_{ext2}</math> - amount of disk space that is still available for non-root users
 
* <math>avail_{ext2}</math> - amount of disk space that is still available for non-root users
  
Note, that not all free blocks can be used by non-root users: some amount of disk space is reserved for root. For example on <code>ext2</code> filesystem only root can use last free 5 percent (by default)  of disk space. This is the difference between <math>avail_{ext2}</math> and <math>free_{ext2}</math>. Also mark, that the following inequality is always true:
+
Note that not all free blocks can be used by non-root users: some amount of disk space is reserved for root. For example on <code>ext2</code> file system only root can use last free 5 percent (by default)  of disk space. This is the difference between <math>avail_{ext2}</math> and <math>free_{ext2}</math>. Also note, that the following inequality is always true:
 
: <math>avail_{ext2} \le free_{ext2}</math> (1)
 
: <math>avail_{ext2} \le free_{ext2}</math> (1)
  
Inside [[VE]] special filesystem type is used: <code>simfs</code>. This filesystem allows to isolate particular [[VE]] from other [[VE]]s. Hence, when <code>df</code> or <code>stat</code> utils are invoked they get information from <code>simfs</code>, which exports to them the following values (by analogy with <code>ext2</code>):
+
Inside a [[container]], a special file system type is used, called <code>simfs</code>. This file system allows to isolate a particular [[CT]] from other CTs. Hence, when <code>df</code> or <code>stat</code> utilities are invoked, they get information from <code>simfs</code>, which exports the following values (by analogy with <code>ext2</code>):
  
 
* <math>total_{simfs}</math>
 
* <math>total_{simfs}</math>
Line 19: Line 20:
 
* <math>avail_{simfs}</math>
 
* <math>avail_{simfs}</math>
  
This article is in fact devoted to how simfs filesystem calculates the values above.  
+
This article is in fact devoted to how simfs file system calculates the values above.  
  
To produce any calculations input data are required. What are input data for <code>simfs</code>? Except already discussed information from underlying filesystem (<math>total_{ext2}</math>, <math>total_{ext2}</math>, <math>total_{ext2}</math>) one more element comes into force in OpenVZ environment. It is OpenVZ ''per-VE disk quotas''. The values that provide this element are:
+
To produce any calculations, input data are required. What are input data for <code>simfs</code>? Aside from already mentioned information from underlying file system (<math>total_{ext2}</math>, <math>free_{ext2}</math>, <math>avail_{ext2}</math>), one more element comes into force in OpenVZ environment. It is OpenVZ ''per-container disk quotas''. The values that provide this element are:
  
* <math>quota_{used}</math> - the number of blocks currently used by [[VE]]
+
* <math>quota_{used}</math> - the number of blocks currently used by a [[CT]]
* <math>quota_{barrier}</math> - the number of blocks this [[VE]] potentially can obtain
+
* <math>quota_{barrier}</math> - the number of blocks this CT can potentially obtain
  
OpenVZ disk quota counts the number of blocks currently used by VE and prevents this number to be greater than the limit/barrier set.
+
OpenVZ disk quota counts the number of blocks currently used by a CT and prevents this number to be greater than the limit/barrier set.
  
<!-- TODO: Uncoment after adding examples.
+
<!-- TODO: Uncomment after adding examples.
 
First let's use
 
First let's use
 
<pre>
 
<pre>
Line 37: Line 38:
  
 
== Cases ==
 
== Cases ==
Consider three basic scenarios, that are possible.
+
 
=== '''Quota is off for VE''' ===
+
Consider three basic possible scenarios.
: If quota is off for [[VE]] (DISK_QUOTA=no), the total amount of space, that [[VE]] potentially can acquire, equals amount of total space on partition. Certainly some space can be used by other [[VE]]s, but potentially [[VE]] can have all the space on device. Number of free blocks for [[VE]] equals number of free blocks on partition. Note, that it implies that [[VE]] root user, can fill all the space, including the space, that is reserved for root user of [[HN]]. This is why, you shouldn't reside [[VE]]s private areas on root filesystem of your [[HN]]. Amount of available disk space for [[VE]] equals the number of available blocks for underlying filsystem. Thus, we have the following relationships:
+
 
 +
=== Quota is off for CT ===
 +
: If quota is off for a CT (<code>DISK_QUOTA=no</code>), the total amount of space that this CT potentially can acquire equals the amount of total space on a partition. Certainly some space can be used by other CTs, but potentially a CT can have all the space on device. So, the number of free blocks for the CT equals the number of free blocks on partition. Note it implies that a CT root user can fill in all the space, including the space reserved for root user of the [[host system]]. This is why one should not reside CTs private areas on root file system of your host system. The amount of available disk space for CT equals the number of available blocks for the underlying file system. Thus, we have the following relationships:
 +
 
 
:: <math>total_{simfs}</math> = <math>total_{ext2}</math>
 
:: <math>total_{simfs}</math> = <math>total_{ext2}</math>
 
:: <math>free_{simfs}</math> = <math>free_{ext2}</math>
 
:: <math>free_{simfs}</math> = <math>free_{ext2}</math>
 
:: <math>avail_{simfs} = avail_{ext2}</math>
 
:: <math>avail_{simfs} = avail_{ext2}</math>
: Rather valuable disadvantage of swithching OpenVZ quota off (besides having unlimited [[VE]]s!) is that you will not be able to get information about how much disk space is used by [[VE]] (without doing possibly long term <code>du</code> command) using <code>df</code>/<code>stat</code>. I mean, that
 
:: <math>df_{usage} = total_{simfs} - free_{simfs} = total_{ext2} - free_{ext2}</math>
 
: thus in [[VE]] you obtain information about disk usage of partition, but not disk usage of [[VE]].
 
  
=== '''Quota is on for VE and there is enough space on partition''' ( [[:Image:Vzquota1.png|illustration 1]]) ===
+
: Rather valuable disadvantage of switching off OpenVZ quota (besides having unlimited CTs) is that you will not be able to get information about how much disk space is used by a CT (without doing possibly long term <code>du</code> command) using <code>df</code>/<code>stat</code>. I mean, that
: If quota is on, amount of disk space that [[VE]] potentially can acquire should be equal quota barrier:  
+
:: <math>df_{usage} = total_{simfs} - free_{simfs} = total_{ext2} - free_{ext2},</math>
 +
: thus in the CT you obtain information about disk usage of partition, but not disk usage of the CT.
 +
 
 +
=== Quota is on for CT, and there is enough space on partition ===
 +
: [[Image:Vzquota1.png|thumb|left]]
 +
: If disk quota is on, the amount of disk space that a CT can potentially acquire should be equal to the quota barrier:  
 
:: <math>total_{simfs} = quota_{barrier}</math>
 
:: <math>total_{simfs} = quota_{barrier}</math>
: Amount of free space in this case should logically be the following:
+
: The amount of free space in this case should logically be:
 
:: <math>free_{simfs} = quota_{barrier} - quota_{used}</math>
 
:: <math>free_{simfs} = quota_{barrier} - quota_{used}</math>
: However here is a pitfall. Suppose that the amount of free disk space actually on underlying filesystem is less than it is estimated from quota using the formule above, i.e.:
+
: However here is a pitfall. Suppose that the amount of free disk space on the underlying filesystem is less than it is estimated from quota using the formula above, i.e.:
 
:: <math> free_{ext2} < quota_{barrier} - quota_{used} </math>
 
:: <math> free_{ext2} < quota_{barrier} - quota_{used} </math>
:  Then, definitely, amount of free disk space reported by <code>simfs</code> should be other!  This situation will be considered in the next point and in this point we assume that there is enough space on partition, i.e
+
:  Then, definitely, the amount of free disk space reported by <code>simfs</code> should be different. This situation will be examined later; here we assume that there is enough space on partition, i.e
 
:: <math> free_{ext2} \ge quota_{barrier} - quota_{used} </math> (2)
 
:: <math> free_{ext2} \ge quota_{barrier} - quota_{used} </math> (2)
: As concerns amount of disk space available for non-root users, if there is enough disk space:
+
: As for amount of disk space available for non-root users, if there is enough disk space:
 
:: <math>avail_{ext2} \ge quota_{barrier} - quota_{used}</math>
 
:: <math>avail_{ext2} \ge quota_{barrier} - quota_{used}</math>
: then amount of disk space available for non-root users in [[VE]] equals free space estimated from quota:
+
: then amount of disk space available for non-root users in a CT equals the free space estimated from quota:
 
:: <math>free_{simfs} = quota_{barrier} - quota_{used}</math>
 
:: <math>free_{simfs} = quota_{barrier} - quota_{used}</math>
 +
{{Clear}}
  
=== '''Quota is on for VE and there is NOT enough space on partition''' ([[:Image:Vzquota2.png|illustration 2]], [[:Image:Vzquota3.png|illustration 3]], [[:Image:Vzquota4.png|illustration 4]], [[:Image:Vzquota5.png|illustration 5]]) ===
+
=== Quota is on for CT and there is NOT enough space on partition ===
: This is the most interesting and difficult to explain case. Nevertheless I tried to do it. So, our assumption is that:
+
{|
:: <math>quota_{barrier} - quota_{used} > free_{ext2}</math>
+
[[Image:Vzquota2.png|thumb|left]]
: What should be reported as free space in such case? Of course, <math>free_{ext2}</math>! This is the actual amount of space that can be used by [[VE]]. Hence:
+
[[Image:Vzquota3.png|thumb|left]]
:: <math>free_{simfs} = free_{ext2}</math>
+
[[Image:Vzquota4.png|thumb|left]]
: And now consider the following situation. There is two [[VE]]s. One of [[VE]]s writes nothing to disk. Second [[VE]] writes to disc some information. Administrator of [[VE]] #1 looks at <code>df</code> output. He observes the "Usage" column. What does she see?
+
[[Image:Vzquota5.png|thumb|left]]
 +
||
 +
This is the most interesting and difficult to explain case. Nevertheless I tried to do it. So, our assumption is that:
 +
: <math>quota_{barrier} - quota_{used} > free_{ext2}</math>
 +
What should be reported as free space in such case? Of course, <math>free_{ext2}</math>! This is the actual amount of space that can be used by a CT. Hence:
 +
: <math>free_{simfs} = free_{ext2}</math>
 +
Now consider the following situation. There are two containers. First CT writes nothing to disk. Second CT writes something to the disk. An administrator of CT #1 looks at <code>df</code> output, noting the "Usage" column. What does she see?
 
:: <math>df_{usage} = total_{simfs} - free_{simfs} = total_{simfs} - free_{ext2}</math> (3)
 
:: <math>df_{usage} = total_{simfs} - free_{simfs} = total_{simfs} - free_{ext2}</math> (3)
: <math>free_{ext2}</math> decreases because [[VE]] #2 writes to disc, consequently <math>df_{usage}</math> increases! "What the hell is going on?!" - thinks the administrator - "Nobody writes on the disk in my [[VE]], but the usage increases!" To avoid such situation the following approach is used in OpenVZ: decrease <math>total_{simfs}</math> so, that <math>df_{usage}</math> remains the same, i.e.:
+
<math>free_{ext2}</math> decreases because CT #2 writes to disk, consequently <math>df_{usage}</math> increases! “What the hell is going on?!”, — thinks the administrator: “Nobody writes to the disk [in my container], but the usage increases”! To avoid such a situation, the following approach is used in OpenVZ: decrease <math>total_{simfs}</math>, so that <math>df_{usage}</math> remains the same, i.e.:
:: <math>total_{simfs} = quota_{usage} + free_{ext2}</math> (4)
+
: <math>total_{simfs} = quota_{used} + free_{ext2}</math> (4)
: Substituting (4) in (3) obtain:
+
By substituting (4) to (3), we get:
:: <math>df_{usage} = total_{simfs} - free_{simfs} = quota_{usage} + free_{ext2} - free_{ext2} = quota_{usage} = const</math>
+
: <math>df_{usage} = total_{simfs} - free_{simfs} = quota_{used} + free_{ext2} - free_{ext2} = quota_{used} = const</math>
In this case, administrator of [[VE]] #1 sees that total amount of space decreases, but usage however is constant.
+
In this case, administrator of CT #1 sees that total amount of space decreases, but usage however is constant.
: The same reasoning as with <math>free_{simfs}</math> suits for calculating <math>avail_simfs</math>. Two cases are possible. If
+
 
:: <math>avail_{ext2} \ge quota_{barrier} - quota_{used}</math>
+
The same reasoning as with <math>free_{simfs}</math> applies to calculating <math>avail_{simfs}</math>. Two cases are possible. If
: then  
+
: <math>avail_{ext2} \ge quota_{barrier} - quota_{used}</math>
:: <math>avail_{simfs} = free_{ext2}</math>
+
then  
: and if
+
: <math>avail_{simfs} = free_{ext2}</math>
:: <math>avail_{ext2} < quota_{barrier} - quota_{used}</math>
+
and if
: then
+
: <math>avail_{ext2} < quota_{barrier} - quota_{used}</math>
:: <math>avail_{simfs} = quota_{barrier} - quota_{used}</math>
+
then
 +
: <math>avail_{simfs} = quota_{barrier} - quota_{used}</math>
  
 
The table below summarizes all possible cases.
 
The table below summarizes all possible cases.
 +
|}
 +
{{Clear}}
  
 
==  Cases Conclusion ==
 
==  Cases Conclusion ==
So we have three basic variants. Variant number one is not good, because [[VE]] administrator can't get information about [[VE]] disk usage and [[HN]] administrator can't limit [[VE]] disk usage. Variant three is not good 'cause we have some weird (but logical) values in <code>df</code>/<code>stat</code> output in [[VE]], e.g. total disk space can decrease.  Variant two is perfect. How can we provide this varaint always take place? Here is the simple rule:
+
So there are three basic variants. Variant number one is not good, because a container's administrator can not get information about CT disk usage and the [[host system]] administrator can't limit CT disk usage. Variant three is not good because we have some weird (but logical) values in <code>df</code>/<code>stat</code> output in CT, e.g. total disk space can decrease.  Variant two is perfect. How can we make sure that this variant always take place? Here is the simple rule:
  
{{Out|Do not set random disk quota barrier/limit!}}
+
{{Warning|Do not set random disk quota barrier/limit!}}
 +
 
 +
Even if you want a container to be unlimited, consider reasonable values. Use the following formula:
  
Even if you want [[VE]] to be unlimited, consider reasonable values. Use the following formula:
 
 
:: <math>\sum_{i=1}^Nq_i \le S - s</math> (5)
 
:: <math>\sum_{i=1}^Nq_i \le S - s</math> (5)
<math>q_i</math> - quota barrier for [[VE]] <math>i</math>
 
 
<math>S</math> - total amount of space on underlying filesystem
 
  
<math>s</math> - amount of space used by not [[VE]]s private area: templates, locks, etc.
+
Here <math>q_i</math> is quota barrier for CT<math>i</math>,<br/>
 +
<math>S</math> — total amount of space on underlying file system<br/>
 +
<math>s</math> is the amount of space used by everything other than CT private areas: templates, locks, etc.
  
Note, that if you install template - you decrease <math>s</math>. This is bad, because, ideally, after each template
+
Note that if you install a template, you increase <math>s</math>. This is bad because, ideally, after each template
installation you have to check inequality (5). To avoid this I suggest to mount separate partion on /vz/private, rather than
+
installation you have to check inequality (5). To avoid this I suggest to mount separate partition on <code>/vz/private</code>, rather than on <code>/vz</code>. In such case <math>s</math> always equals <math>0</math>.
on /vz/. In such case <math>s</math> always equals <math>0</math>.
 
  
 
== Cases Summarizing Table ==
 
== Cases Summarizing Table ==
{| border="1" cellpadding="5" cellspacing="0" align="center"
+
{| class="wikitable" align="center"
 
| colspan="2" | Quota off
 
| colspan="2" | Quota off
 
| <math>total_{simfs} = total_{ext2}</math>
 
| <math>total_{simfs} = total_{ext2}</math>
Line 137: Line 152:
  
 
== Other reasons of strange numbers ==
 
== Other reasons of strange numbers ==
At the moment I see only two more reasons, why numbers in <code>df</code>/<code>stat</code> output can confuse you.
+
At the moment I see only two more reasons why numbers in <code>df</code>/<code>stat</code> output can confuse you.
* The quota is inconsistent. This can happen if you turned quota off for some time, if you wrote directly to private area (<code>/vz/private</code>), but not through <code>simfs</code>, etc. When you have doubts whether your quota is consistent or not, just drop quota (<code>vzquota drop <veid></code>, where <code><veid></code> is the id of stopped [[VE]]). While starting [[VE]] <code>vzctl</code> will automatically initalize quota.
+
# The quota is inconsistent. This can happen if you turned quota off for some time, if you wrote directly to private area (<code>/vz/private</code>), but not through <code>simfs</code>, etc. When you have doubts whether your quota is consistent or not, just drop quota (<code>vzquota drop <ctid></code>, where <code><ctid></code> is the id of a stopped [[CT]]). While starting [[CT]], <code>vzctl</code> will automatically initialize quota.
* Unsupported underlying filesystem. Currently OpenVZ quota only supports <code>ext2</code> and <code>ext3</code>. With other filesystem types you can have unpredictable results. Praemonitus praemunitus!
+
# Unsupported underlying filesystem. Currently OpenVZ quota only supports <code>ext2</code> and <code>ext3</code>. With other file system types you can have unpredictable results. Praemonitus praemunitus!
  
 
== TODO ==
 
== TODO ==
Line 146: Line 161:
 
[[Category: Troubleshooting]]
 
[[Category: Troubleshooting]]
 
[[Category: Resource management]]
 
[[Category: Resource management]]
 +
[[Category: Disk quota]]

Revision as of 08:53, 24 January 2008

The aim of this article is to explain where the numbers shown by the stat and df utilities inside a container come from.

Conventions and notations

Consider a typical OpenVZ setup in which a separate ext2 file system is mounted on /vz. In this situation, ext2 is called the underlying file system.

Linux VFS design allows every file system to export to user space the following information about disk space (here and below, a subscript such as <math>{ext2}</math> specifies the particular file system type):

  • <math>total_{ext2}</math> - total amount of disk space that can potentially be acquired (e.g. HDD capacity)
  • <math>free_{ext2}</math> - amount of disk space that is still free
  • <math>avail_{ext2}</math> - amount of disk space that is still available for non-root users

Note that not all free blocks can be used by non-root users: some disk space is reserved for root. For example, on an ext2 file system only root can use the last free 5 percent (by default) of disk space. This is the difference between <math>avail_{ext2}</math> and <math>free_{ext2}</math>. Also note that the following inequality always holds:

: <math>avail_{ext2} \le free_{ext2}</math> (1)
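
The three values described above can be inspected directly from user space. The following is a minimal Python sketch, assuming a POSIX system where the standard os.statvfs call is available; the helper name disk_space is made up for this illustration and is not part of OpenVZ. It reads the same kind of numbers that df reports and checks inequality (1).

```python
import os

def disk_space(path="/"):
    """Return (total, free, avail) in bytes for the file system holding path."""
    st = os.statvfs(path)
    block = st.f_frsize            # fundamental block size in bytes
    total = st.f_blocks * block    # all blocks on the file system
    free = st.f_bfree * block      # free blocks, including those reserved for root
    avail = st.f_bavail * block    # free blocks usable by non-root users
    return total, free, avail

if __name__ == "__main__":
    total, free, avail = disk_space("/")
    print(f"total={total} free={free} avail={avail}")
    print("inequality (1) holds:", avail <= free)
```

Run inside a container, the same call returns the simfs values discussed in the rest of the article rather than those of the underlying file system.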

Inside a container, a special file system type called simfs is used. This file system isolates a particular CT from other CTs. Hence, when the df or stat utilities are invoked, they get their information from simfs, which exports the following values (by analogy with ext2):

  • <math>total_{simfs}</math>
  • <math>free_{simfs}</math>
  • <math>avail_{simfs}</math>

The rest of this article explains how simfs calculates these values.

To calculate anything, simfs needs input data. Besides the information from the underlying file system mentioned above (<math>total_{ext2}</math>, <math>free_{ext2}</math>, <math>avail_{ext2}</math>), one more element comes into play in an OpenVZ environment: OpenVZ per-container disk quotas. They provide the following values:

  • <math>quota_{used}</math> - the number of blocks currently used by a CT
  • <math>quota_{barrier}</math> - the number of blocks this CT can potentially obtain

OpenVZ disk quota counts the number of blocks currently used by a CT and prevents this number from exceeding the configured limit/barrier.


Cases

Consider three basic possible scenarios.

Quota is off for CT

If quota is off for a CT (DISK_QUOTA=no), the total amount of space that this CT can potentially acquire equals the total space on the partition. Some of that space may of course be used by other CTs, but potentially one CT can take all the space on the device. So the number of free blocks for the CT equals the number of free blocks on the partition. Note that this implies that the CT root user can fill all the space, including the space reserved for the root user of the host system. This is why you should not place CT private areas on the root file system of your host system. The amount of disk space available in the CT equals the number of available blocks on the underlying file system. Thus, we have the following relationships:
:: <math>total_{simfs} = total_{ext2}</math>
:: <math>free_{simfs} = free_{ext2}</math>
:: <math>avail_{simfs} = avail_{ext2}</math>
A significant disadvantage of switching OpenVZ quota off (besides having unlimited CTs) is that df/stat can no longer tell you how much disk space a CT uses; finding that out requires a possibly long-running du command. That is,
:: <math>df_{usage} = total_{simfs} - free_{simfs} = total_{ext2} - free_{ext2},</math>
so inside the CT you see the disk usage of the whole partition, not the disk usage of the CT.

Quota is on for CT, and there is enough space on partition

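[[Image:Vzquota1.png|thumb|left]]
If disk quota is on, the amount of disk space that a CT can potentially acquire should be equal to the quota barrier:
:: <math>total_{simfs} = quota_{barrier}</math>
The amount of free space in this case should logically be:
:: <math>free_{simfs} = quota_{barrier} - quota_{used}</math>
However, there is a pitfall. Suppose that the free disk space actually left on the underlying file system is less than the estimate obtained from quota with the formula above, i.e.:
:: <math>free_{ext2} < quota_{barrier} - quota_{used}</math>
Then the amount of free disk space reported by simfs clearly has to be different. This situation is examined in the next section; here we assume that there is enough space on the partition, i.e.
:: <math>free_{ext2} \ge quota_{barrier} - quota_{used}</math> (2)
As for the amount of disk space available for non-root users, if there is enough disk space:
:: <math>avail_{ext2} \ge quota_{barrier} - quota_{used}</math>
then the amount of disk space available for non-root users in a CT equals the free space estimated from quota:
:: <math>avail_{simfs} = quota_{barrier} - quota_{used}</math>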

Quota is on for CT and there is NOT enough space on partition

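[[Image:Vzquota2.png|thumb|left]]
[[Image:Vzquota3.png|thumb|left]]
[[Image:Vzquota4.png|thumb|left]]
[[Image:Vzquota5.png|thumb|left]]

This is the most interesting case and the hardest to explain. The assumption is that:

: <math>quota_{barrier} - quota_{used} > free_{ext2}</math>

What should be reported as free space in such a case? Of course, <math>free_{ext2}</math>! This is the actual amount of space that can be used by a CT. Hence:

: <math>free_{simfs} = free_{ext2}</math>

Now consider the following situation. There are two containers. CT #1 writes nothing to disk. CT #2 writes something to the disk. The administrator of CT #1 looks at the df output, noting the "Usage" column. What does she see?

:: <math>df_{usage} = total_{simfs} - free_{simfs} = total_{simfs} - free_{ext2}</math> (3)

<math>free_{ext2}</math> decreases because CT #2 writes to disk; consequently <math>df_{usage}</math> increases! "What the hell is going on?!" thinks the administrator. "Nobody writes to the disk in my container, but the usage increases!" To avoid such a situation, OpenVZ uses the following approach: decrease <math>total_{simfs}</math> so that <math>df_{usage}</math> remains the same, i.e.:

: <math>total_{simfs} = quota_{used} + free_{ext2}</math> (4)

By substituting (4) into (3), we get:

: <math>df_{usage} = total_{simfs} - free_{simfs} = quota_{used} + free_{ext2} - free_{ext2} = quota_{used} = const</math>

In this case, the administrator of CT #1 sees that the total amount of space decreases while the usage stays constant.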
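
The rules for the total and free values in the three cases can be put together in a few lines. What follows is only an illustrative Python sketch of the formulas in this article, not the simfs kernel code; the names FsStat, simfs_stat and df_usage are invented for the example, and the avail value is deliberately left out.

```python
from collections import namedtuple

FsStat = namedtuple("FsStat", "total free")

def simfs_stat(ext2, quota_barrier=None, quota_used=0):
    """Total/free blocks a container would see, following the three cases in the text.

    ext2 is the FsStat of the underlying file system; quota_barrier=None
    models DISK_QUOTA=no.
    """
    if quota_barrier is None:
        # Case 1: quota off, the container sees the partition's own numbers.
        return FsStat(ext2.total, ext2.free)
    quota_free = quota_barrier - quota_used
    if ext2.free >= quota_free:
        # Case 2: enough space on the partition, inequality (2) holds.
        return FsStat(quota_barrier, quota_free)
    # Case 3: not enough space; total shrinks as in (4) so usage stays constant.
    return FsStat(quota_used + ext2.free, ext2.free)

def df_usage(stat):
    """Usage as df derives it: total minus free."""
    return stat.total - stat.free

if __name__ == "__main__":
    # CT #1 has a barrier of 1000 blocks and uses 300; CT #2 keeps filling the partition.
    for free_ext2 in (900, 700, 500, 200):
        s = simfs_stat(FsStat(total=2000, free=free_ext2),
                       quota_barrier=1000, quota_used=300)
        print(free_ext2, s.total, s.free, df_usage(s))
```

With these made-up numbers the reported total drops from 1000 to 800 and then to 500 blocks as the partition fills up, while the usage derived by df stays at 300 blocks throughout.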

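The same reasoning as for <math>free_{simfs}</math> applies to calculating <math>avail_{simfs}</math>. Two cases are possible. If

: <math>avail_{ext2} \ge quota_{barrier} - quota_{used}</math>

then

: <math>avail_{simfs} = free_{ext2}</math>

and if

: <math>avail_{ext2} < quota_{barrier} - quota_{used}</math>

then

: <math>avail_{simfs} = quota_{barrier} - quota_{used}</math>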

The table below summarizes all possible cases.

Cases Conclusion

So there are three basic variants. Variant one is not good, because the container's administrator cannot get information about CT disk usage and the host system administrator cannot limit CT disk usage. Variant three is not good because df/stat output in the CT shows some weird (but logical) values, e.g. total disk space can decrease. Variant two is perfect. How can we make sure that this variant always takes place? Here is the simple rule:

{{Warning|Do not set random disk quota barrier/limit!}}

Even if you want a container to be unlimited, consider reasonable values. Use the following formula:

:: <math>\sum_{i=1}^N q_i \le S - s</math> (5)

Here <math>q_i</math> is the quota barrier for CT <math>i</math> (one term for each of the <math>N</math> containers),
<math>S</math> is the total amount of space on the underlying file system,
<math>s</math> is the amount of space used by everything other than CT private areas: templates, locks, etc.

Note that if you install a template, you increase <math>s</math>. This is bad because, ideally, after each template installation you have to check inequality (5) again. To avoid this, I suggest mounting a separate partition on /vz/private rather than on /vz. In that case <math>s</math> always equals <math>0</math>.
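
Inequality (5) is easy to check before changing any barrier. Here is a small Python sketch; the function name barriers_fit and the sample numbers are hypothetical and only serve to illustrate the arithmetic.

```python
def barriers_fit(barriers, total_space, other_used=0):
    """Check inequality (5): the sum of all quota barriers must fit into S - s."""
    return sum(barriers) <= total_space - other_used

if __name__ == "__main__":
    # Three containers on a 100 GB partition where 15 GB is taken by
    # templates and other non-container data (all values in GB).
    print(barriers_fit([10, 20, 30], total_space=100, other_used=15))  # 60 <= 85
    print(barriers_fit([40, 40, 30], total_space=100, other_used=15))  # 110 > 85
```

Any consistent unit works, as long as the barriers, S and s are all expressed in it.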

Cases Summarizing Table

Quota off

Quota on

Other reasons of strange numbers

At the moment I see only two more reasons why numbers in df/stat output can confuse you.

  1. The quota is inconsistent. This can happen if you turned quota off for some time, if you wrote directly to the private area (/vz/private) rather than through simfs, etc. If you doubt whether your quota is consistent, just drop it (vzquota drop <ctid>, where <ctid> is the ID of a stopped CT). When the CT is started again, vzctl will automatically initialize the quota.
  2. Unsupported underlying file system. Currently OpenVZ quota only supports ext2 and ext3. With other file system types you can get unpredictable results. Praemonitus praemunitus (forewarned is forearmed)!

TODO

TODO: Add examples with stat/df