OpenVZ Virtuozzo Containers Wiki β

Features

498 bytes added, 15:32, 24 January 2008
VE -> container, some rewording, added info about I/O scheduler
{{stub}}
The architecture of OpenVZ differs from that of traditional virtual machines because a [[container]] always runs the same OS kernel as the host system (while still allowing multiple Linux distributions in individual [[container]]s). This single-kernel implementation makes it possible to run [[container]]s with near-zero overhead. Thus, OpenVZ offers an order of magnitude higher efficiency and manageability than traditional virtualization technologies.
== OS Virtualization ==
From the point of view of applications and [[container]] users, each container is an independent system. This independence is provided by a virtualization layer in the kernel of the host OS. Note that only a negligible part of the CPU resources is spent on virtualization (around 1-2%). The main features of the virtualization layer implemented in OpenVZ are the following:
* A [[container]] looks and behaves like a regular Linux system. It has standard startup scripts; software from vendors can run inside a container without OpenVZ-specific modifications or adjustment;
* A user can change any configuration file and install additional software;
* [[Container]]s are completely isolated from each other (file system, processes, inter-process communication (IPC), sysctl variables);
* Processes belonging to a container are scheduled for execution on all available CPUs. Consequently, [[CT]]s are not bound to a single CPU and can use all available CPU power.
== Network virtualization ==
The OpenVZ network virtualization layer is designed to isolate [[CT]]s from each other and from the physical network:
* Each [[CT]] has its own IP address; multiple IP addresses per CT are allowed;
* Network traffic of a CT is isolated from the other CTs. In other words, containers are protected from each other in a way that makes traffic snooping impossible;
* Firewalling may be used inside a CT: the user can create rules limiting access to some services using the canonical iptables tool from inside the CT;
* Routing table manipulations and advanced routing features are supported for individual containers, for example setting different maximum transmission units (MTUs) for different destinations, specifying different source addresses for different destinations, and so on.
== Resource Management ==
OpenVZ [[resource management]] controls the amount of resources available to containers. The controlled resources include CPU power, disk space, a set of memory-related parameters, etc. Resource management allows OpenVZ to:
* Effectively share available [[host system]] resources among CTs
* Guarantee Quality-of-Service (QoS)
* Provide performance and resource isolation and protect from denial-of-service attacks
Resource management is much more important for OpenVZ than for a standalone computer, since computer resource utilization in an OpenVZ-based system is considerably higher than in a typical system.

As all the CTs use the same kernel, resource management is of paramount importance. Each CT should stay within its boundaries and not affect other CTs in any way, and this is what resource management ensures.

OpenVZ resource management consists of four main components: two-level disk quota, fair CPU scheduler, disk I/O scheduler, and user beancounters. Note that all those resources can be changed during CT runtime; there is no need to reboot. For example, if you want to give your CT less memory, you just change the appropriate parameters on the fly. This is either very hard or not possible at all with other virtualization approaches such as virtual machines or hypervisors.
=== Two-Level Disk Quota ===
The [[host system]] administrator ([[HW]] root) can set up per-container [[disk quota]]s, in terms of disk blocks and inodes (roughly, the number of files). This is the first level of disk quota. In addition, a container administrator ([[CT]] root) can use the usual quota tools inside their own CT to set standard UNIX per-user and per-group [[disk quota]]s.

If you want to give a CT more disk space, you just increase its disk quota. There is no need to resize disk partitions.
=== Fair CPU scheduler ===
The CPU scheduler in OpenVZ is a two-level implementation of the [[fair-share scheduling]] strategy.
On the first level, the scheduler decides which CT to give the CPU time slice to, based on per-CT <code>cpuunits</code> values. On the second level, the standard Linux scheduler decides which process to run in that container, using standard Linux process priorities.

The OpenVZ administrator can set different values of <code>cpuunits</code> for different containers, and CPU time will be given to them proportionally.

There is also a way to limit CPU time, e.g. to restrict a container to 10% of the available CPU time.

=== I/O scheduler ===
Similar to the fair CPU scheduler described above, the I/O scheduler in OpenVZ is also two-level, utilizing Jens Axboe's CFQ I/O scheduler on its second level.
Each container is assigned an I/O priority, and the I/O scheduler distributes the available I/O bandwidth according to the priorities assigned. Thus no single container can saturate an I/O channel.
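
As a rough illustration of the proportional sharing described under the fair CPU scheduler, the Python sketch below computes the fraction of CPU time each container would receive when all of them compete for the CPU at once. It is only a simplified model and not the kernel scheduler; the function name <code>cpu_shares</code>, the container IDs and the <code>cpuunits</code> values are assumptions made up for the example.

```python
def cpu_shares(cpuunits):
    """Return each container's proportional share of CPU time (0.0 to 1.0).

    `cpuunits` maps a container ID to its cpuunits weight. This models only
    the first scheduling level, with every container demanding CPU at once.
    """
    total = sum(cpuunits.values())
    if total == 0:
        raise ValueError("at least one container needs a non-zero cpuunits value")
    return {ctid: units / total for ctid, units in cpuunits.items()}


# Hypothetical container IDs and weights, chosen only for illustration.
shares = cpu_shares({101: 1000, 102: 2000, 103: 1000})
for ctid, share in sorted(shares.items()):
    print(f"CT {ctid}: {share:.0%} of contended CPU time")
```

Only the ratio between the weights matters in this model: doubling every value leaves the shares unchanged.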
=== User Beancounters ===
[[User beancounters]] is a set of per-CT counters, limits, and guarantees. There is a set of about 20 parameters, carefully chosen to cover all aspects of CT operation, so that no single CT can abuse a resource which is limited for the whole node and thus harm other CTs.
The resources accounted and controlled are mainly memory and various in-kernel objects such as IPC shared memory segments, network buffers, etc. Each resource can be seen in <code>/proc/user_beancounters</code> and has five values associated with it: current usage, maximum usage (for the lifetime of a container), barrier, limit, and fail counter. The meaning of barrier and limit is parameter-dependent; in short, they can be thought of as a soft limit and a hard limit. If any resource hits the limit, its fail counter is increased, so the CT administrator can see whether something bad is happening by analyzing the output of <code>/proc/user_beancounters</code> in her CT.
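
The fail-counter check described above can be sketched in Python as follows. This is a minimal example that assumes the usual column layout of <code>/proc/user_beancounters</code> (a version line, a header line, then one row per resource, with the container ID prefixed to the first row). The sample text, its numbers and the helper names <code>parse_beancounters</code> and <code>failed_resources</code> are invented for illustration.

```python
# Illustrative sample only; real values come from /proc/user_beancounters.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2752512    2965504   11055923   11377049          0
            numproc              28         40        240        240          3
            privvmpages        8192      65000      65536      69632         12
"""

# Current usage, maximum usage, barrier, limit, fail counter.
FIELDS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Parse beancounter text into {ctid: {resource: {field: int}}}."""
    result = {}
    ctid = None
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] in ("Version:", "uid"):
            continue  # skip blank lines, the version line and the header
        if parts[0].endswith(":"):
            ctid = int(parts[0][:-1])  # "101:" starts a new container
            parts = parts[1:]
        if ctid is None or len(parts) != 1 + len(FIELDS):
            continue  # ignore rows that do not match the assumed layout
        name, values = parts[0], parts[1:]
        result.setdefault(ctid, {})[name] = dict(zip(FIELDS, map(int, values)))
    return result


def failed_resources(counters):
    """Return {(ctid, resource): failcnt} for every non-zero fail counter."""
    return {
        (ctid, name): values["failcnt"]
        for ctid, resources in counters.items()
        for name, values in resources.items()
        if values["failcnt"] > 0
    }


parsed = parse_beancounters(SAMPLE)
failed = failed_resources(parsed)
for (ctid, name), count in sorted(failed.items()):
    print(f"CT {ctid}: {name} hit its limit {count} time(s)")
```

On a real system the text would be read from <code>/proc/user_beancounters</code> instead of the sample string.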
== Checkpointing and live migration ==
{{Main|Checkpointing and live migration}}
A live migration and checkpointing feature was released for OpenVZ in the middle of April 2006. It allows a container to be migrated from one physical server to another without shutting it down or restarting it. The process is known as checkpointing: a CT is frozen and its whole state is saved to a file on disk. This file can then be transferred to another machine, where the CT can be unfrozen (restored). The delay is a few seconds, and it is not downtime, just a delay.

Since every piece of the container state, including open network connections, is saved, from the user's perspective it looks like a delay in response: say, one database transaction takes longer than usual, then everything continues as normal and the user doesn't notice that the database is already running on another machine.

That feature makes possible scenarios such as upgrading your server without any need to reboot it: if your database needs more memory or CPU resources, you just buy a newer, better server and live migrate your container to it, then increase its limits. If you want to add more RAM to your server, you migrate all containers to another one, shut the server down, add memory, start it again and migrate all containers back.
[[Category: Concepts]]
[[Category: Technology]]