LPC2015 Containers

From OpenVZ Virtuozzo Containers Wiki

Containers Microconference Notes

Welcome to Linux Plumbers Conference 2015

The structure will be short introductions to an issue or topic followed by a discussion with the audience. A limit of 3 slides per presentation is enforced to ensure focus and allocate enough time for discussions.

Please use this etherpad to take notes. Microconf leaders will be giving a TWO MINUTE summary of their microconference during the Friday afternoon closing session.

Please remember there is no video this year, so your notes are the only record of your microconference.

Miniconf leaders: Please remember to take note of the approximate number of attendees in your session(s).

  • Attendees: ~120
  • Active attendees: ~20

SCHEDULE

State of LXC - Stéphane Graber

LXC 1.1 was released in early 2015 with:

  • First container manager to ship with checkpoint/restore support through CRIU.
  • Support for running systemd inside LXC containers.
  • Introduction of lxcfs (meminfo, cpuinfo, stat, uptime, diskstats, fake cgroupfs).
  • Support for booting from qcow2 VM images.
  • The liblxc API remains compatible with LXC 1.0.
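
As an illustration of what lxcfs provides, the sketch below parses meminfo-style text the way a tool inside a container might. With lxcfs mounted over /proc/meminfo, the same parsing would see container-scoped values instead of the host's. The function name `parse_meminfo` and the sample text are hypothetical and only serve the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most fields are reported in kB; a few (for example HugePages_Total)
    are plain counts, so the unit suffix is deliberately ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


# Hypothetical sample, shaped like the first lines of /proc/meminfo.
SAMPLE = "MemTotal:        1048576 kB\nMemFree:          524288 kB\n"
```

Inside a container, calling `parse_meminfo(open("/proc/meminfo").read())` would return whatever the file mounted at that path reports.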

Introduction to LXD - Stéphane Graber

  • Networked daemon running on top of liblxc
  • Exports a simple REST API to manage containers, snapshots and images.
  • Secure by default: uses user namespaces by default, and also uses AppArmor, seccomp, capabilities and cgroups
  • Command line tool can drive multiple LXD daemons
    • Start container from remote images (cached)
    • Copy or move containers between hosts
    • Copy or move container images between hosts
  • Transparent use of checkpoint/restore
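
A minimal sketch of talking to such a REST API over a local Unix socket follows. The notes do not give endpoints, so the `/1.0/containers` path, the `metadata` field in the response, and the socket path are assumptions based on LXD's versioned API layout and may differ between releases. `UnixHTTPConnection`, `list_containers` and `container_names` are hypothetical helper names.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def container_names(response):
    """Extract names from a response whose metadata is a list of URLs."""
    return [url.rsplit("/", 1)[-1] for url in response.get("metadata", [])]


def list_containers(socket_path):
    """Ask the daemon for its containers (endpoint path is an assumption)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/1.0/containers")
        response = json.loads(conn.getresponse().read())
    finally:
        conn.close()
    return container_names(response)
```

On a host running the daemon, something like `list_containers("/var/lib/lxd/unix.socket")` would be the entry point; that socket path is also an assumption.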

The current LXD git tree (github.com/lxc/lxd) appears to contain a couple of bugs, which decided to show themselves during the demo part of this talk.

CRIU status update - Andrey Vagin

Andrey went through a short history of checkpoint/restore and what happened recently in CRIU upstream as well as current focus. A separate mini-summit specifically for CRIU is scheduled at this year's Plumbers.

Containers in the Upstream Kernel (vs OpenVZ kernel) - Sergey Bronnikov / Kir Kolyshkin

Graph of the number of patches against the upstream kernel: http://openvz.org/File:Kernel_patches_stats.png

The size of the OpenVZ patch has decreased significantly over the past few RHEL releases, as it now uses upstream kernel features. The remaining delta is mostly around:

  • storage (ploop, ext4)
  • memory management and accounting (idle memory tracking, network buffers)
  • OOM killer virtualization
  • enhanced /sys and /proc virtualization
  • /dev/console virtualization
  • layer3 networking (venet)
  • printk virtualization
  • time namespace (for monotonic timers wrt live migration)
  • a few minor controllers (numproc, numfile, numiptables entries)

Changes to ext4 are related to resource limitation (ENOSPC handling) and defragmentation.

minimal embedded containers system: https://github.com/mhiramat/mincs

Slides - http://www.slideshare.net/openvz/whats-missing-from-upstream-kernel-containers-kir-kolyshkin

Open Container Specifications by Brandon Philips

Specifications at: http://github.com/opencontainers/specs

https://github.com/opencontainers/specs/blob/master/config.md#platform-specific-configuration implies the container manager has to be written in Go.

Slides: https://speakerdeck.com/philips/linux-plumbers-conf-open-container-initiative-and-container-network-interface

Designing Plugin Systems for Container Runtimes by Brandon Philips

Slides: https://speakerdeck.com/philips/linux-plumbers-conf-open-container-initiative-and-container-network-interface?slide=12

  • There are multiple ways for a container to get a "real" IP and also be part of multiple network domains.
  • Container Network Interface provides a network abstraction layer for containers, so apps in the container are not aware of veth, macvlan, ipvlan or any other underlying L2/L3 tech.
  • The network namespace is typically mounted at a globally visible location in the filesystem. This is because net namespaces were classically set up with `ip netns`, incrementally with shell commands, and because a network namespace is still useful when there are no processes in it (Neutron's virtual routers use this).
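
The plugin model discussed above can be sketched as follows: a runtime passes the path of the mounted network namespace to a plugin executable through environment variables and sends the network configuration as JSON on stdin. The `CNI_*` variable names follow the CNI specification; the helper name `cni_invocation`, the sample paths and the configuration values are hypothetical.

```python
import json


def cni_invocation(command, container_id, netns_path, ifname, plugin_dir, net_conf):
    """Build the environment and stdin payload for one CNI plugin call.

    Nothing is executed here; the caller would run the plugin binary
    named by net_conf["type"] with this environment and stdin.
    """
    env = {
        "CNI_COMMAND": command,            # "ADD" or "DEL"
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns_path,           # path to the mounted net namespace
        "CNI_IFNAME": ifname,              # interface name inside the container
        "CNI_PATH": plugin_dir,            # where plugin executables live
    }
    return env, json.dumps(net_conf)


# Hypothetical example values.
env, stdin_payload = cni_invocation(
    "ADD",
    "example-container",
    "/var/run/netns/example",
    "eth0",
    "/opt/cni/bin",
    {"name": "examplenet", "type": "bridge"},
)
```

Because the namespace is referenced by path, the same call works whether or not any process is currently running inside it.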

Container-aware Tracing Infrastructure by Aravinda Prasad

  • The kernel doesn't have the concept of a container
  • Defining what a container is, is difficult
  • The prototype assumes a container has its own PID namespace
  • Discussion from the audience on why the PID namespace is insufficient for containers that need access to the host PID namespace (NFV, etc.)
  • The kernel doesn't have generic object labelling, and it is unlikely it will (for performance reasons)

Next steps?

  • Talk to Eric Biederman
  • Suggestion to do it from userspace instead
  • Who is the customer?
    • More succinctly: what is the use case?
    • For system containers, PID isn't a bad assumption and is sort of a requirement to emulate a full OS
    • For application containers, it is more likely that the user will have debugging tools on the host to dig into the container

Likely the trace cgroup is the best you can do for this
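
The userspace suggestion can be illustrated with a small sketch that groups processes by PID namespace, matching the prototype's assumption that a container has its own PID namespace. It reads the `/proc/<pid>/ns/pid` symlink, whose target looks like `pid:[4026531836]`. The helper names are hypothetical, and this is not the tracing prototype itself.

```python
import os
import re

# Namespace symlink targets have the form "<type>:[<inode>]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % target)
    return match.group(1), int(match.group(2))


def pid_namespace_id(pid):
    """Return the inode number identifying the PID namespace of a process."""
    return parse_ns_link(os.readlink("/proc/%d/ns/pid" % pid))[1]


def in_same_pid_namespace(pid_a, pid_b):
    """True if both processes share a PID namespace."""
    return pid_namespace_id(pid_a) == pid_namespace_id(pid_b)
```

As the audience noted, this grouping breaks down for containers that share the host PID namespace, since they would be indistinguishable from host processes.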

Running Docker inside VZ containers by Maxim Patlasov

Context is running the Docker engine inside a VZ system container. /proc, /sys and cgroupfs have to be virtualized to make this work. The Docker graph driver creates and manages filesystems for containers with snapshot semantics.

  • Decided to use devicemapper graph driver in this system
  • Requires two block devices, metadata and data, for the graph driver to work
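
To show why two block devices are needed, here is a sketch that builds a device-mapper table line for a thin-pool target, assuming the graph driver's pool is a standard thin-pool. The field order follows the kernel's thin-provisioning documentation; the function name, the device paths and the default values are hypothetical.

```python
def thin_pool_table(data_sectors, metadata_dev, data_dev,
                    block_sectors=128, low_water_blocks=32768):
    """Build a dmsetup table line for a thin-pool target.

    Format: <start> <length> thin-pool <metadata dev> <data dev>
            <data block size> <low water mark>
    Sizes are in 512-byte sectors; the data block size must be a
    multiple of 128 sectors (64KB) between 128 and 2097152.
    """
    if block_sectors % 128 != 0 or not 128 <= block_sectors <= 2097152:
        raise ValueError("invalid data block size: %d" % block_sectors)
    return "0 %d thin-pool %s %s %d %d" % (
        data_sectors, metadata_dev, data_dev, block_sectors, low_water_blocks)


# Hypothetical devices: one for pool metadata, one for the data blocks.
TABLE = thin_pool_table(20971520, "/dev/loop0", "/dev/loop1")
```

The resulting string is what would be passed to `dmsetup create <name> --table`, an operation that normally needs host privileges, hence the proxy described next.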

The architecture has a userspace proxy to set up the devicemapper device without host permissions

  • Something like this is proposed upstream (unmerged) to allow extensible graph drivers: https://github.com/docker/docker/pull/13777
  • there is some point of disagreement on this solution but I didn't follow -- HELP WANTED :)

Questions:

Q: Is there a generic solution to mapping devices into containers from this specific example?

A: We need to enumerate the full use cases first


The future of cgroups and containers by Serge Hallyn

  • Unified hierarchy is coming
  • It works (except in the demo) :-P


cgroups kernel memory controller by Pavel Emelyanov

  • Most details are on the slide. The overall recommendation is to set kmemlimit greater than memlimit
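
The recommendation can be checked with a short sketch, assuming the cgroup v1 memory controller, where the two limits are exposed as `memory.limit_in_bytes` and `memory.kmem.limit_in_bytes`. The helper names and the example cgroup directory are hypothetical.

```python
import os


def read_limit(cgroup_dir, filename):
    """Read one integer limit file from a memory cgroup directory."""
    with open(os.path.join(cgroup_dir, filename)) as f:
        return int(f.read().strip())


def kmem_limit_ok(mem_limit, kmem_limit):
    """True if the kernel memory limit is greater than the memory limit."""
    return kmem_limit > mem_limit


def check_cgroup(cgroup_dir):
    """Apply the recommendation to one cgroup directory (v1 layout assumed)."""
    mem = read_limit(cgroup_dir, "memory.limit_in_bytes")
    kmem = read_limit(cgroup_dir, "memory.kmem.limit_in_bytes")
    return kmem_limit_ok(mem, kmem)
```

For example, `check_cgroup("/sys/fs/cgroup/memory/example")` would report whether that group follows the recommendation; the path depends on where the controller is mounted.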