Difference between revisions of "WP/What are containers"
(more on single kernel approach) |
(more stuff, try to remove VM information and really describe what a container is) |
Revision as of 17:29, 15 March 2011
OpenVZ Linux Containers technology whitepaper
OpenVZ is a virtualization technology for Linux that lets one partition a single physical Linux machine into multiple smaller units called containers. Technically, it consists of three major components:
- Namespaces
- Resource management
- Checkpointing
Namespaces
A namespace is a feature that limits the scope of something. Here, namespaces are used as the building blocks of containers. A simple case of a namespace is chroot.
Chroot
The traditional UNIX chroot() system call changes the root of the file system for the calling process to a particular directory. This limits the scope of the file system for that process, so it can only see and access a limited subtree of files and directories.
Chroot is still used for application isolation. For example, ftpd is often run in a chroot to contain a potential security breach.
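The scope-limiting effect described above can be illustrated with a small model. This is only a sketch of the idea, not how the kernel implements chroot(): the function name resolve_in_chroot and the /var/ftp directory are hypothetical, and symbolic links are ignored.

```python
import posixpath


def resolve_in_chroot(new_root: str, path: str) -> str:
    """Return the host-side path that `path` names for a process
    whose root directory has been changed to `new_root`.

    Toy model only: symbolic links are not followed.
    """
    # Interpret the path relative to the new root. normpath() collapses
    # "." and "..", and ".." at "/" stays at "/", which is exactly why
    # the process cannot climb above its root directory.
    inside = posixpath.normpath(posixpath.join("/", path))
    return posixpath.normpath(new_root + inside)


if __name__ == "__main__":
    # Both requests end up below /var/ftp on the host.
    print(resolve_in_chroot("/var/ftp", "/etc/passwd"))
    print(resolve_in_chroot("/var/ftp", "../../etc/passwd"))
```

In this model, any path the confined process asks for resolves to a location below the new root, so files elsewhere on the host are simply not nameable.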
New namespaces
OpenVZ builds on the chroot idea and extends it to everything else that applications can see. In other words, every API that the kernel provides to applications is "namespaced". Examples include:
- File system namespace -- this one is chroot() itself.
- PID namespace, so in every container processes have their own unique process IDs, and the first process inside a container has a PID of 1 (it is usually the /sbin/init process, which actually relies on its PID being 1). A container can only see its own processes; it can't see (or access in any way, say by sending a signal) processes in other containers.
- IPC namespace, so every container has its own IPC (Inter-Process Communication) shared memory segments, semaphores, and messages.
- Networking namespace, so every container has its own network devices, IP addresses, routing rules, firewall (iptables) rules, network caches and so on.
- /proc and /sys namespaces, so every container has its own representation of /proc and /sys -- special filesystems used to export some kernel information to applications. In a nutshell, those are subsets of what the host system has.
- FIXME: more namespaces to be added
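As a way to observe namespaces from user space, the following sketch lists the namespaces the current process belongs to. It relies on the /proc/self/ns directory provided by mainline Linux kernels; whether a particular OpenVZ kernel exposes the same interface is an assumption, so the function returns an empty result when the directory is missing. The function name current_namespaces is made up for this example.

```python
import os


def current_namespaces(ns_dir: str = "/proc/self/ns") -> dict:
    """Map each namespace type (such as "pid" or "net") to the
    identifier string the kernel reports for the current process."""
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, or a kernel without this interface (assumption:
        # availability on OpenVZ kernels is not guaranteed).
        return result
    for name in names:
        try:
            # Each entry is a symbolic link whose target names the namespace.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


if __name__ == "__main__":
    for name, ident in current_namespaces().items():
        print(f"{name}: {ident}")
```

Two processes that share a namespace report the same identifier for it, so comparing the output from inside a container with the output on the host shows which namespaces differ.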
Single kernel approach
Multiple isolated containers run on top of one single kernel. This is pretty much the same Linux kernel, just with an added notion of containers.
All the containers running on a single piece of hardware share one single Linux kernel. There is only one single OS kernel running, and on top of that there are multiple isolated instances of user-space programs.
The single-kernel approach is much more lightweight than traditional VM-style virtualization. The consequences are:
- Eliminating the need to run multiple OS kernels leads to higher density of containers (compared to VMs)
- The software stack that lies between an application and the hardware is much thinner, which means higher performance of containers (compared to VMs)
Containers overhead
OpenVZ works almost as fast as a usual Linux system. The only overhead is for networking and additional resource management (see below), and in most cases it is negligible.
File system
From the file system point of view, a container is just a chroot() environment. In other words, a container's file system root is merely a directory on the host system (usually /vz/root/$CTID/, under which one can find the usual directories like /etc, /lib, /bin etc.). The consequences are:
- there is no need for a separate block device, hard drive partition or filesystem-in-a-file setup
- host system administrator can see all the containers' files
- container backup/restore is trivial
- mass deployment is easy
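Because a container's root is an ordinary directory on the host, a backup can in principle be as simple as archiving that directory. The sketch below shows this with Python's standard tarfile module. The function names are hypothetical, and a real backup would also have to deal with a running container's changing files and with the container's configuration, which are outside the scope of this example.

```python
import os
import tarfile


def backup_container_root(root_dir: str, archive_path: str) -> None:
    """Archive everything under `root_dir` into a gzip-compressed tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores paths relative to the container root.
        tar.add(root_dir, arcname=".")


def restore_container_root(archive_path: str, root_dir: str) -> None:
    """Unpack an archive made by backup_container_root into `root_dir`.

    Only use this on archives you created yourself; extracting
    untrusted archives is not safe.
    """
    os.makedirs(root_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(root_dir)
```

For example, backup_container_root("/vz/root/101", "/backup/ct101.tar.gz") would archive the root of a container with the (assumed) ID 101; the archive should be written outside the directory being archived.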
OpenVZ host system scope
From the host system, the processes of all containers are visible.
Resource control
Because of the single-kernel model, all containers share the same set of resources: CPU, memory, disk and network.
Every container can use all of the available hardware resources if configured to do so. On the other hand, containers should not step on each other's toes, so all the resources are accounted for and controlled by the kernel.
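The idea of accounting for and controlling a resource per container can be pictured with a toy counter. This is a simplified model under stated assumptions, not the OpenVZ kernel's actual mechanism: the class ResourceCounter and its fields limit, held and failcnt are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceCounter:
    """Toy per-container counter for one resource (hypothetical model)."""
    limit: int        # maximum units this container may hold
    held: int = 0     # units currently charged to this container
    failcnt: int = 0  # number of requests refused so far

    def charge(self, amount: int) -> bool:
        """Account for `amount` units; refuse if the limit would be exceeded."""
        if self.held + amount > self.limit:
            self.failcnt += 1
            return False
        self.held += amount
        return True

    def uncharge(self, amount: int) -> None:
        """Give back `amount` units, never dropping below zero."""
        self.held = max(0, self.held - amount)


if __name__ == "__main__":
    ct101 = ResourceCounter(limit=100)
    ct102 = ResourceCounter(limit=100)
    print(ct101.charge(80))   # fits within the limit
    print(ct101.charge(30))   # refused: would exceed the limit
    print(ct102.charge(30))   # unaffected by what ct101 did
```

Each container has its own counter, so one container hitting its limit does not reduce what another container is allowed to use.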
FIXME link to resource management whitepaper goes here
Networking (routed/bridged)
Does it differ much from VMs?
Other features
- Live migration
Limitations
From the point of view of a container owner, a container looks and feels like a real system. Nevertheless, it is important to understand what the limitations of a container are:
- A container is constrained by limits set by the host system administrator. That includes usage of CPU, memory, disk space and disk bandwidth, network bandwidth, etc.
- A container only runs Linux (Windows or FreeBSD is not an option), although running different distributions is not an issue.
- A container can't boot or use its own kernel (it uses the host system kernel).
- A container can't load its own kernel modules (it uses the host system's kernel modules).
- A container can't set the system time unless explicitly configured to do so (say, to run ntpd in a CT).
- A container does not have direct access to hardware such as a hard drive, a network card, or a PCI device. Such access can be granted by the host system administrator if needed.