What are containers
OpenVZ Linux Containers technology whitepaper
OpenVZ is a virtualization technology for Linux which lets one partition a single physical Linux machine into multiple smaller units called containers. Technically, it consists of three major things:
- Namespaces
- Resource management
- Checkpointing
Namespaces
A namespace is a kernel feature that limits the scope of some resource, so that a process only sees its own subset of that resource. Here, namespaces are used as the building blocks of containers. A simple case of a namespace is chroot.
Chroot
The traditional UNIX chroot() system call changes the root of the file system of the calling process to a given directory. That way it limits the file system scope of the process, so it can only see and access a limited subtree of files and directories.
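To make this concrete, here is a minimal sketch of typical chroot() usage; the jail path /srv/jail and the shell started inside it are illustrative assumptions, not anything OpenVZ-specific:

 /* Minimal chroot() usage sketch; /srv/jail is an example path and must
  * already contain /bin/sh plus its libraries.  Requires root
  * (CAP_SYS_CHROOT).  Build: gcc -o jail jail.c */
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 int main(void)
 {
     /* Change the root directory of this process... */
     if (chroot("/srv/jail") == -1) {
         perror("chroot");
         exit(1);
     }
     /* ...and move the current directory inside the new root. */
     if (chdir("/") == -1) {
         perror("chdir");
         exit(1);
     }
     /* From now on this process (and its children) can only see and
      * access files under /srv/jail. */
     execl("/bin/sh", "sh", (char *) NULL);
     perror("execl");  /* reached only if the exec failed */
     return 1;
 }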
Chroot is still used for application isolation; for example, running ftpd in a chroot limits the impact of a potential security breach.
Chroot is also used in containers: a container's root file system is simply a directory tree on the host. This has the following consequences:
- there is no need for a separate block device, hard drive partition, or filesystem-in-a-file setup
- the host system administrator can see all the containers' files
- container backup and restore are trivial
- mass deployment is easy
Other namespaces
OpenVZ builds on the chroot idea and expands it to everything else that applications use. In other words, every API that the kernel provides to applications is "namespaced", making sure every container has its own isolated subset of each resource (see the sketch below the list). Examples include:
- File system namespace -- this one is chroot() itself, making sure containers can't see each other's files.
- PID namespace, so in every container processes have their own unique process IDs, and the first process inside a container has a PID of 1 (it is usually the /sbin/init process, which actually relies on its PID being 1). Containers can only see their own processes, and they can't see (or access in any way, say by sending a signal) processes in other containers.
- IPC namespace, so every container has its own IPC (Inter-Process Communication) shared memory segments, semaphores, and messages.
- Networking namespace, so every container has its own network devices, IP addresses, routing rules, firewall (iptables) rules, network caches and so on.
- /proc and /sys namespaces, so every container has its own representation of /proc and /sys -- the special filesystems used to export some kernel information to applications. In a nutshell, those are subsets of what the host system has.
- UTS namespace, so every container has its own host name.
Note that memory and CPU need not be namespaced: the existing virtual memory and multitasking mechanisms already take care of them.
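As a minimal sketch of the namespace idea, the program below uses the generic Linux clone(2) flags CLONE_NEWPID and CLONE_NEWUTS (available in mainline kernels with namespace support; this is not OpenVZ-specific code) to start a process in fresh PID and host-name namespaces. Inside, the child sees itself as PID 1, much like a container's /sbin/init, while the parent sees it under an ordinary globally unique PID:

 /* Minimal sketch (not OpenVZ code): start a child in new PID and UTS
  * namespaces with clone(2), assuming a kernel with namespace support
  * (CLONE_NEWPID needs Linux >= 2.6.24) and root privileges.
  * Build: gcc -o ns ns.c */
 #define _GNU_SOURCE
 #include <sched.h>
 #include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/wait.h>
 #include <unistd.h>
 
 static char child_stack[1024 * 1024];
 
 static int child(void *arg)
 {
     /* Inside the new PID namespace this process sees itself as PID 1,
      * just like /sbin/init inside a container. */
     printf("child: my PID here is %ld\n", (long) getpid());
     return 0;
 }
 
 int main(void)
 {
     pid_t pid = clone(child, child_stack + sizeof(child_stack),
                       CLONE_NEWPID | CLONE_NEWUTS | SIGCHLD, NULL);
     if (pid == -1) {
         perror("clone");
         exit(1);
     }
     /* From the parent (the host side) the same child has an ordinary,
      * globally unique PID. */
     printf("parent: child PID as seen from here is %ld\n", (long) pid);
     waitpid(pid, NULL, 0);
     return 0;
 }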
Single kernel approach
So, namespaces let a single kernel run multiple isolated containers: all the containers running on a single piece of hardware share one single Linux kernel, and on top of it there are multiple isolated instances of user-space programs.
The single kernel approach is much more lightweight than traditional VM-style virtualization. The consequences are:
- Waiving the need to run multiple OS kernels leads to a higher density of containers (compared to VMs)
- The software stack that lies between an application and the hardware is much thinner, which means higher performance of containers (compared to VMs)
Resource management
Due to the single kernel model, all containers share the same set of resources: CPU, memory, disk, and network.
Every container can use all of the available hardware resources if configured to do so. On the other hand, containers should not step on each other's toes, so all resources are accounted for and controlled by the kernel.
FIXME link to resource management whitepaper goes here
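As an illustration of where this accounting shows up, the following sketch simply dumps the per-container resource counters, assuming an OpenVZ kernel that exposes them via /proc/user_beancounters; run inside a container, it shows only that container's own counters:

 /* Sketch: dump per-container resource accounting, assuming an OpenVZ
  * kernel that exposes it via /proc/user_beancounters
  * (held/maxheld/barrier/limit/failcnt per resource). */
 #include <stdio.h>
 
 int main(void)
 {
     char line[512];
     FILE *f = fopen("/proc/user_beancounters", "r");
     if (!f) {
         perror("/proc/user_beancounters");
         return 1;
     }
     while (fgets(line, sizeof(line), f))
         fputs(line, stdout);
     fclose(f);
     return 0;
 }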
Live migration
Various
Container overhead
OpenVZ works almost as fast as a usual Linux system. The only overhead comes from networking and additional resource management, and in most cases it is negligible.
OpenVZ host system scope
From the host system, the processes of all containers are visible.
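A small sketch of what this looks like in practice: the program below walks /proc on the host and prints which container each process belongs to, assuming an OpenVZ kernel that reports the container ID in an envID field of /proc/<pid>/status (envID 0 being the host itself):

 /* Sketch: from the host, list every visible process and the container
  * it belongs to.  Assumes an OpenVZ kernel that reports the container
  * ID in an "envID:" line of /proc/<pid>/status (0 means the host). */
 #include <ctype.h>
 #include <dirent.h>
 #include <stdio.h>
 #include <string.h>
 
 int main(void)
 {
     struct dirent *de;
     DIR *proc = opendir("/proc");
     if (!proc) {
         perror("/proc");
         return 1;
     }
     while ((de = readdir(proc)) != NULL) {
         char path[300], line[256];
         FILE *f;
         if (!isdigit((unsigned char) de->d_name[0]))
             continue;                      /* not a process directory */
         snprintf(path, sizeof(path), "/proc/%s/status", de->d_name);
         if ((f = fopen(path, "r")) == NULL)
             continue;                      /* process already exited */
         while (fgets(line, sizeof(line), f))
             if (strncmp(line, "envID:", 6) == 0) {
                 printf("PID %s -> container %s", de->d_name, line + 6);
                 break;
             }
         fclose(f);
     }
     closedir(proc);
     return 0;
 }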
Resource control
Networking (routed/bridged)
Does it differ much from VMs?
Other features
- Live migration
Limitations
From the point of view of a container owner, it looks and feels like a real system. Nevertheless, it is important to understand the limitations of a container:
- A container is constrained by limits set by the host system administrator. These include CPU usage, memory, disk space and bandwidth, network bandwidth, etc.
- A container only runs Linux (Windows or FreeBSD is not an option), although different Linux distributions are not an issue.
- A container can't boot or use its own kernel (it uses the host system kernel).
- A container can't load its own kernel modules (it uses the host system's kernel modules).
- A container can't set the system time, unless explicitly configured to do so (say, to run ntpd in a container).
- A container does not have direct access to hardware such as a hard drive, network card, or PCI device. Such access can be granted by the host system administrator if needed.