Disk quota, df and stat weird behaviour

From OpenVZ Virtuozzo Containers Wiki
Revision as of 12:08, 13 November 2006 by Vass (talk | contribs) (Initial edition)

This article explains where the numbers shown by the stat and df utilities inside a VE come from.

Consider a typical OpenVZ setup in which an ext2 filesystem is mounted on /vz, so the underlying filesystem is ext2.

The Linux VFS design allows simfs (the root filesystem type of a VE) to obtain the following disk space information from the underlying filesystem:

  • ext2_total - total amount of disk space that potentially can be acquired
  • ext2_free - amount of disk space that is still free
  • ext2_avail - amount of disk space that is still available for non-root users

Note that not all free blocks can be used by non-root users. By default ext2 reserves 5% of the total disk space for the superuser; this reserve accounts for the difference between ext2_free and ext2_avail. Consequently the following inequality always holds:

ext2_avail ≤ ext2_free (1)
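
On Linux these three values correspond to the f_blocks, f_bfree and f_bavail fields returned by statvfs(2). The following minimal Python sketch reads them for a mount point and checks inequality (1). The helper name disk_space and the path "/" are illustrative choices, not part of OpenVZ.

```python
import os


def disk_space(path="/"):
    """Return (total, free, avail) in bytes for the filesystem holding path."""
    st = os.statvfs(path)
    unit = st.f_frsize            # unit in which the block counts are expressed
    total = st.f_blocks * unit    # plays the role of ext2_total
    free = st.f_bfree * unit      # plays the role of ext2_free
    avail = st.f_bavail * unit    # plays the role of ext2_avail
    return total, free, avail


if __name__ == "__main__":
    total, free, avail = disk_space("/")
    print("total:", total, "free:", free, "avail:", avail)
    print("inequality (1) holds:", avail <= free)
```

Run inside a VE, the same call returns the simfs values discussed below rather than the values of the underlying filesystem.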

simfs has to export exactly the same set of values to user space whenever it is asked for them (e.g. when stat or df is invoked):

  • simfs_total
  • simfs_free
  • simfs_avail

In an OpenVZ environment one more element matters here: the OpenVZ disk quota. The first level of the OpenVZ disk quota counts the number of blocks currently used by the VE (q_used) and prevents this number from exceeding the configured barrier/limit (q_barrier).

Three basic scenarios are possible:

  • Quota is off for VE
If quota is off for the VE (DISK_QUOTA=no), the total amount of space that the VE can potentially acquire equals the total space on the partition. Some of that space may of course be used by other VEs, but potentially the VE can have all the space on the device. The number of free blocks for the VE equals the number of free blocks on the partition. Note that this means the VE root user can fill all the space, including the space reserved for the root user of the HN. This is why you should not place VE private areas on the root filesystem of your HN. The amount of disk space available to the VE likewise equals the number of available blocks on the underlying filesystem. Thus we have the following relationships:
simfs_total = ext2_total
simfs_free = ext2_free
simfs_avail = ext2_avail
A significant disadvantage of switching the OpenVZ quota off is that you cannot find out how much disk space the VE itself uses (without running a potentially long du command). That is,
df_usage = simfs_total - simfs_free = ext2_total - ext2_free
so inside the VE you see the disk usage of the whole partition, not of the VE. This is exactly the number that df displays in its "Used" column.
  • Quota is on for VE and there is enough space on partition
If quota is on, the amount of disk space that the VE can potentially acquire equals the quota barrier:
simfs_total = q_barrier
The amount of free space should then logically be:
simfs_free = q_barrier - q_used
There is a pitfall here, however. Suppose the amount of free space actually left on the underlying filesystem is less than the quota-based estimate from the formula above, i.e.:
q_barrier - q_used > ext2_free
Then simfs clearly has to report a different amount of free space. That situation is covered in the next scenario; in this one we assume that there is enough space on the partition, i.e.
q_barrier - q_used ≤ ext2_free (2)
As for the amount of disk space available to non-root users, assume that the quota-based estimate also fits into the space available on the partition:
q_barrier - q_used ≤ ext2_avail
Because ext2_avail ≤ ext2_free by (1), this condition is stronger than assumption (2) and has to hold in addition to it. Then the amount of disk space available to non-root users in the VE equals the free space estimated from the quota:
simfs_avail = q_barrier - q_used
  • Quota is on for VE and there is NOT enough space on partition
This is the most interesting case and the hardest to explain. Our assumption is that:
q_barrier - q_used > ext2_free
What should be reported as free space in this case? ext2_free, of course: this is the actual amount of space that the VE can use. Hence:
simfs_free = ext2_free
Now consider the following situation. There are two VEs. VE #1 writes nothing to disk, while VE #2 writes some data. The administrator of VE #1 looks at the df output and watches the "Used" column. What does he see?
df_usage = simfs_total - simfs_free = simfs_total - ext2_free (3)
ext2_free decreases because VE #2 writes to disk, so with a fixed simfs_total df_usage would increase! "What the hell is going on?!" thinks the administrator. "Nobody writes to the disk in my VE, but the usage increases!" To avoid this, OpenVZ disk quota decreases simfs_total by the amount by which the quota-based estimate exceeds the real free space, so that df_usage stays the same, i.e.:
simfs_total = q_barrier - ((q_barrier - q_used) - ext2_free) = q_used + ext2_free (4)
Substituting (4) into (3) we obtain:
df_usage = (q_used + ext2_free) - ext2_free = q_used = const
In this case the administrator of VE #1 sees the total amount of space decrease, but the usage stays constant, which is the desired behaviour.
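The same reasoning as for simfs_free applies to simfs_avail whenever quota is on. Two cases are possible. If
q_barrier - q_used ≤ ext2_avail
then
simfs_avail = q_barrier - q_used
and if
q_barrier - q_used > ext2_avail
then
simfs_avail = ext2_avail
Under the assumption of this scenario only the second case can occur, because ext2_avail ≤ ext2_free by (1).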
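
The three scenarios can be condensed into one small function. The Python sketch below is a model of the reasoning in this article, not the kernel code. The function name simfs_statfs and its arguments are hypothetical, all sizes are in arbitrary block units, and q_used ≤ q_barrier is assumed. In the shortage case it chooses simfs_total so that simfs_total - simfs_free stays equal to q_used, which is one way to meet the stated goal of keeping df_usage constant.

```python
def simfs_statfs(ext2_total, ext2_free, ext2_avail,
                 quota_on=True, q_barrier=0, q_used=0):
    """Return (simfs_total, simfs_free, simfs_avail) for one VE.

    Assumes q_used <= q_barrier and ext2_avail <= ext2_free.
    """
    if not quota_on:
        # Scenario 1: the values of the underlying filesystem pass through.
        return ext2_total, ext2_free, ext2_avail

    quota_free = q_barrier - q_used
    # Neither free nor available space may exceed what the partition really has.
    simfs_free = min(quota_free, ext2_free)
    simfs_avail = min(quota_free, ext2_avail)

    if quota_free <= ext2_free:
        # Scenario 2: enough space on the partition, the quota decides.
        simfs_total = q_barrier
    else:
        # Scenario 3: shrink the total so that total - free remains q_used.
        simfs_total = q_used + simfs_free
    return simfs_total, simfs_free, simfs_avail


if __name__ == "__main__":
    # VE #1 is idle (q_used stays 100) while another VE fills the partition.
    for ext2_free, ext2_avail in [(500, 450), (200, 150), (50, 0)]:
        total, free, avail = simfs_statfs(1000, ext2_free, ext2_avail,
                                          q_barrier=400, q_used=100)
        print(total, free, avail, "df_usage =", total - free)
```

In this example the reported total shrinks from 400 to 300 to 150 as the partition fills up, while df_usage stays at 100 in every step.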

The table below summarizes all possible cases.

We have three variants. Variant one is not good because the VE administrator cannot get information about the VE's disk usage. Variant three is not good because df/stat inside the VE show weird (though logical) values, e.g. the total disk space can decrease. Variant two is perfect. How can we make sure that variant two always takes place? Here is the rule:

  • Do not set a random disk quota barrier/limit! Even if you want a VE to be unlimited, choose reasonable values. Use the following formula:

q_barrier_1 + q_barrier_2 + ... + q_barrier_N ≤ ext2_total - non_ve_used

where

q_barrier_i - quota barrier for VE i

ext2_total - total amount of space on the underlying filesystem

non_ve_used - amount of space used by data that is not VE private data: templates, locks, etc.
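
Assuming the rule is read as "the quota barriers of all VEs together must not exceed the total space minus the space taken by non-VE data", a pre-flight check could look like the following Python sketch. The function name barriers_fit and its arguments are hypothetical, and all sizes must be given in the same unit.

```python
def barriers_fit(q_barriers, ext2_total, non_ve_used):
    """True if the sum of all VE quota barriers fits on the partition.

    q_barriers  -- iterable with the quota barrier of every VE
    ext2_total  -- total space on the underlying filesystem
    non_ve_used -- space used by templates, locks and other non-VE data
    """
    return sum(q_barriers) <= ext2_total - non_ve_used


if __name__ == "__main__":
    print(barriers_fit([300, 300, 300], 1000, 50))  # 900 <= 950
    print(barriers_fit([400, 400, 300], 1000, 50))  # 1100 > 950
```

When the check passes, every VE satisfies assumption (2), because each VE uses at most its own barrier and the space left on the partition is therefore at least q_barrier - q_used for each of them.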



TODO: Add Roma's images

TODO: Add table

TODO: Add examples with stat/df