Container density: why are containers so good at it?
What is density?
Density is a characteristic that tells how many containers (CTs) or virtual machines (VMs) a virtualization technology can successfully run on given hardware. It depends heavily on:
- virtualization technology used (containers or hypervisor);
- features provided by the technology (page sharing, ballooning, memory overcommitment support etc.);
- the workload itself. Virtualization is not a miracle and cannot handle a heavier workload than the bare host can. E.g., do not expect any density beyond 1 VM/CT if you are running Oracle at huge request rates. On the other hand, if you have many small web sites, virtualization can easily consolidate them on a single host, as long as the host can handle their combined I/O requests and memory/CPU demands.
Speaking of hypervisors, some of them (like Xen) reserve the whole guest VM memory at startup and do not allow memory overcommitment. This makes it impossible to start more than, say, 15 VMs with 1 GB of RAM each on a host with 16 GB of RAM. Theoretically this helps to deliver maximum performance, as Xen can always use large pages and avoid swapping and other overheads. However, as other hypervisors (like ESX) demonstrate, it is possible to achieve both goals, high performance and high density, depending on the situation, though this obviously requires more complicated technology, which is not yet fully available in any open-source hypervisor.
Typically users also care about the quality of service of their software, i.e. how well their services work on a loaded node and how fast they respond to external requests. Quality of service metrics include average/min/max response time, the response time within which 99.9% of requests complete, and similar. While a hardware node is able to handle the load, these metrics either do not grow much as the number of containers increases, or grow linearly. When a hardware node becomes overutilized, these metrics typically start to degrade exponentially (due to memory being swapped out).
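The metrics above can be computed from a list of measured response times. The following is a minimal sketch, not a reference implementation: the function names, the nearest-rank percentile definition and the sample data are assumptions made for illustration only.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # 1-based rank; round() guards against floating-point noise before ceil().
    rank = math.ceil(round(pct * len(ordered) / 100, 9))
    return ordered[max(rank, 1) - 1]


def qos_summary(response_times_ms):
    """Return the quality of service metrics named in the text."""
    return {
        "min": min(response_times_ms),
        "avg": statistics.fmean(response_times_ms),
        "max": max(response_times_ms),
        "p99.9": percentile(response_times_ms, 99.9),
    }


# Hypothetical data: 997 fast requests and 3 slow ones (milliseconds).
samples = [10.0] * 997 + [500.0] * 3
summary = qos_summary(samples)
```

With this made-up data the average stays close to the fast requests, while the 99.9% figure and the maximum expose the slow ones. That is why tail metrics are useful for spotting an overutilized node.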
Below we summarize the differences between containers (OpenVZ, Parallels Containers, Solaris Zones) and hypervisors (ESX, Xen, KVM, HyperV) which affect density and make containers more suitable for high-density environments than virtual machines.
What makes containers perfectly suited to high density?
- Containers do not reserve the memory assigned to them. They exhibit real-time elastic behavior: if a container does not use memory, that memory is free for use by other containers. In other words, overcommitment is as natural and easy as on a standalone Linux box, where multiple applications compete for memory resources. As a result, a simple container running ssh, apache, init and cron takes only 10-20 MB of physical RAM. Sure, efficient caching of apache pages needs more RAM, but that depends purely on the active working set of the web site and its most frequently accessed files. If a site is not accessed frequently, the kernel may free that memory entirely (or swap it out) and will still be able to get the pages back in seconds.
- Container memory management is system-wide. If one container needs more physical RAM (e.g. for caching apache pages) and the hardware node has no more memory available, the kernel will automatically reclaim the least recently used caches of other containers. This process is dynamic by its nature and does not have the latency that ballooning has in the case of VMs.
- Global memory management allows a container to cache more data than the RAM assigned to it. If other containers need physical memory, this cache is quickly reclaimed automatically. Thus caching can be much more efficient and dynamically adjusts itself to actual needs.
- The container CPU scheduler is interactive system-wide. If a process is woken up by an external event (like a network request), the CPU scheduler can quickly preempt the current CPU hog task and schedule the interactive one. This is not true of VMs, where the hypervisor knows neither the nature of the tasks inside a VM nor whether one VM should preempt another on an event. In high-density environments it is particularly important to make sure that interactive tasks get the CPU fast enough and provide low-latency responses to users.
- The Parallels Virtuozzo Containers product goes a step further by introducing container templates, used as the basis for all containers, and a special copy-on-write filesystem, which makes sure that the original template is kept untouched and a container gets its own private copy of a template file when it tries to modify it. As a result, all common files are shared across containers and are present on disk and in memory caches as a single instance. This saves memory, reduces I/O and makes L2/L3 caches work more effectively.
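The system-wide reclaim described in the list above can be illustrated with a toy model. This is only a sketch under strong assumptions: the class HostPageCache and its methods are hypothetical, pages are counted rather than stored, and the real kernel uses far more elaborate reclaim logic than a single LRU list.

```python
from collections import OrderedDict


class HostPageCache:
    """Toy model: one LRU list of cached pages shared by all containers."""

    def __init__(self, total_pages):
        if total_pages < 1:
            raise ValueError("total_pages must be positive")
        self.total_pages = total_pages
        # Keys are (container, page_id); the oldest entry comes first.
        self.pages = OrderedDict()

    def touch(self, container, page_id):
        """Record an access to a page, reclaiming the host-wide LRU page if full."""
        key = (container, page_id)
        if key in self.pages:
            self.pages.move_to_end(key)  # now the most recently used page
            return
        if len(self.pages) >= self.total_pages:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[key] = None

    def usage(self, container):
        """Number of pages currently cached on behalf of one container."""
        return sum(1 for owner, _ in self.pages if owner == container)


# Hypothetical host with room for 100 cached pages.
cache = HostPageCache(total_pages=100)
for page in range(80):       # container "a" caches 80 pages while the host is idle
    cache.touch("a", page)
for page in range(60):       # container "b" then becomes active
    cache.touch("b", page)
usage_a, usage_b = cache.usage("a"), cache.usage("b")
```

In this run container "a" first fills most of the host cache. When "b" becomes active, the oldest pages of "a" are reclaimed without any per-container reservation or explicit negotiation, which is the behavior the list attributes to global memory management.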
Why are virtual machines not that good?
Hypervisors (Xen, ESX, KVM) are not that good in high-density scenarios. There are multiple reasons for that:
- Most memory is effectively reserved on the host at VM start. KVM and Xen reserve the whole memory by default and do not allow memory overcommitment. As a result, you cannot run more than 15 VMs with 1 GB of RAM each on a 16 GB box. ESX, being the most advanced hypervisor, uses page sharing and ballooning to introduce memory overcommitment. However, in practice it achieves only about 2x overcommitment with guests of the same type. Our experiments demonstrate that half of the improvement is due to page sharing and half is due to ballooning.
- Ballooning makes it fairly easy to reclaim part of guest VM memory, but it has some issues. First, it has noticeable latency, since time passes between the decision to reclaim memory and the moment the guest actually releases it to the hypervisor (and the guest can simply have problems and not release it at all). Next, it may lead to swapping inside the guest, which negatively affects I/O performance of the whole system (IOPS is a scarce resource in practice). What is worse, it may lead to behavior different from that of a standalone machine: e.g. the OS may deny memory allocations even though the guest has not used its assigned resources, or trigger the Out-Of-Memory (OOM) killer on Linux.
- Typical hypervisor overhead is 50-100 MB for a 1 GB, 1 VCPU guest, i.e. up to almost 10% of its RAM. The detailed overhead table for VMware ESX 4.0 [p.30 http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf] also demonstrates how the overhead increases significantly with the number of VCPUs assigned to a guest.
- VMs run multiple kernels and keep their system data structures in memory. RAM page sharing mitigates this problem, but at VM startup, when the memory scanner has not yet merged the multiple copies, it can still be a problem and lead to bursts of memory usage.
- The hypervisor CPU scheduler does not know the details of the tasks running inside a VM or how interactive they are. See the explanation of the container CPU scheduler above.
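The memory arithmetic behind the points above can be sketched as follows. The function names are hypothetical, and two inputs are assumptions rather than figures from the text: the 1 GB kept back for the host itself (chosen so that the result matches the 15 VMs mentioned above) and the way the 2x overcommitment factor is applied to usable RAM.

```python
def vm_capacity(host_ram_mb, guest_ram_mb, overhead_mb=0,
                host_reserved_mb=0, overcommit=1.0):
    """Number of guests that fit when each guest's RAM plus its
    hypervisor overhead has to be backed by host memory."""
    usable_mb = (host_ram_mb - host_reserved_mb) * overcommit
    return int(usable_mb // (guest_ram_mb + overhead_mb))


def overhead_fraction(overhead_mb, guest_ram_mb):
    """Hypervisor overhead as a fraction of the guest's assigned RAM."""
    return overhead_mb / guest_ram_mb


HOST_MB = 16 * 1024          # 16 GB host, as in the text
GUEST_MB = 1024              # 1 GB guests, as in the text
HOST_RESERVED_MB = 1024      # assumption: memory kept for the host itself

# Full reservation, no overcommitment.
reserved_only = vm_capacity(HOST_MB, GUEST_MB,
                            host_reserved_mb=HOST_RESERVED_MB)

# The same host with 100 MB of per-guest overhead (upper end of 50-100 MB).
with_overhead = vm_capacity(HOST_MB, GUEST_MB, overhead_mb=100,
                            host_reserved_mb=HOST_RESERVED_MB)

# Assumption: about 2x overcommitment modelled as doubling usable RAM.
overcommitted = vm_capacity(HOST_MB, GUEST_MB, overhead_mb=100,
                            host_reserved_mb=HOST_RESERVED_MB, overcommit=2.0)

worst_case_overhead = overhead_fraction(100, GUEST_MB)
```

Under these assumptions, full reservation gives 15 guests, per-guest overhead lowers that to 13, and 2x overcommitment raises it to 27. The overhead fraction for 100 MB on a 1 GB guest is a little under 10%.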
Numbers
XXX