Containers/Mini-summit 2008 notes

Intros (8:36am)

        Dave Hansen
        Eric Biederman
        Jason Byron redhat
        Joe Rusio - evergreen
        Joe McDonald
        HP China
        Sonny Rao
        HP
        HP
        Matine Silberman HP
        Sandy Harris
        NEC Japan
        John Schultz aol
        Pavel Emelyanov, Parallels/OpenVZ
        Denis Lunev, Parallels/OpenVZ
        Constant Chan
        Benjamin
        Daniel
        Serge

On Phone:
        Amy Griffith HP

(Later walk-ins)

Topics:

Why do various companies want containers?
        ibm: workload management
        EB: using containers as improved chroot
        HP: wants similar to ibm, plus security
        parallels: hosted providers

sysfs issues
        EB gives status: should go into next merge window

mini-namespaces
        NFS
                clients should behave differently in different containers
                currently uses single sunrpc transport for all containers
        Dave: is there a list of all openvz mini-ns?
        EB:
                proposal:
                        create little filesystems
                        still store everything in nsproxy
                currently:
                        some people want same process in different netns's
                        almost possible now, but can't open new sockets
                namespace enter:
                        3 purposes
                                login
                                monitoring
                                configuring
                may be worth prototyping the proposal
                        address mqns, or sunrpc, or fuse?
        DH:
                openvz addresses this using one big clone(), right?
                (yes)

userid namespaces
        EB summarizes his proposal
                userid ns is unsharable without privilege
                userids, capabilities, security labels become ns-local
                hierarchical like pidns
        openvz: just does chroot
        DH:
                observes that system vs. app containers have different requirements
        EB:
                so with userid namespaces, user has god-like powers over created namespaces
        EB+SH will talk about hacking something this week during ols
        Uses:
                user untrusted mounts
                build systems
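
        Sketch (not from the meeting): a toy model of the hierarchy EB
        describes, where a uid only has meaning inside its namespace and
        whoever creates a namespace is privileged over it and everything
        nested below it. The names UserNS, Cred and privileged_over are
        hypothetical and exist only for this illustration; this is not
        kernel code and not necessarily how the proposal would be built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNS:
    """Toy user namespace: knows its parent and who created it."""
    name: str
    parent: Optional["UserNS"] = None
    creator: Optional["Cred"] = None  # None for the initial namespace


@dataclass(frozen=True)
class Cred:
    """A (namespace, uid) pair; the uid is only meaningful inside `ns`."""
    ns: UserNS
    uid: int


def privileged_over(cred: Cred, target: UserNS) -> bool:
    """True if `cred` created `target` or one of its ancestors.

    Privilege flows down the hierarchy only, never up or sideways.
    """
    ns = target
    while ns is not None:
        if ns.creator == cred:
            return True
        ns = ns.parent
    return False


# Hypothetical example: uid 1000 in the initial namespace creates a
# namespace for a build system; uid 0 inside it creates a nested one.
init_ns = UserNS("init")
alice = Cred(init_ns, 1000)
build_ns = UserNS("build", parent=init_ns, creator=alice)
root_in_build = Cred(build_ns, 0)
nested_ns = UserNS("nested", parent=build_ns, creator=root_in_build)
```

        In this model uid 0 inside build_ns is a different identity from
        uid 0 in init_ns, which is the point of making userids ns-local.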

device namespaces
        tty namespaces rejected
        should be solved with generic device namespaces
                virtualize the major:minor->device mapping
        reserved device numbers (unnamed)
                created with /proc?
                get_unnamed_device()
        tty ideas:
                use selinux ptys
                use user namespaces
                use legacy ptys
                leverage ptyfs
        Suka is not on, so he gets volunteered to do pure /dev/pts fs approach
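
        Sketch (not from the meeting): virtualizing the major:minor->device
        mapping can be pictured as one lookup table per container, so the
        same device number resolves to a different backing device in each.
        The DeviceNamespace class and the backing-device strings are
        hypothetical; 136 is the major number used for Unix98 pty slaves.

```python
class DeviceNamespace:
    """Toy per-container table from (major, minor) to a backing device."""

    def __init__(self, name):
        self.name = name
        self._table = {}

    def register(self, major, minor, backing):
        """Make (major, minor) visible in this namespace only."""
        self._table[(major, minor)] = backing

    def lookup(self, major, minor):
        """Return the backing device, or None if the number is unmapped here."""
        return self._table.get((major, minor))


# Two containers both see a pty at 136:0, backed by different objects.
ct_a = DeviceNamespace("container-a")
ct_b = DeviceNamespace("container-b")
ct_a.register(136, 0, "pty instance owned by container-a")
ct_b.register(136, 0, "pty instance owned by container-b")
```

        A lookup for 136:0 gives a different answer in each container, and
        a number that was never registered in a namespace is simply absent.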

per-container LSMs:
        SH: thinks LSMs should handle it
        EB:
                original purpose of chroot
                set up policies from inside container
                creating smack container inside selinux would be ideal

entering a container
        netns: identified using pid of a ns
        SH: can we solve this using EB's namespace filesystems proposal?
        (EB goes to the board to demonstrate his proposal)
        PM: Can we use control groups?
        PE: Can we re-use /proc/pid/ ?
        EB: could have a ns with no processes in it
        Example of command using this:
                ip link set eth0 netns <pid>
                becomes
                ip link set eth0 netns /proc/<pid>/
        DL:
                a real netns problem is knowing when a childns has died
                the netnsfs mount could solve that
        PE: EB, can you send POC patches for the namespace?
                EB and EM will both send their own POC.
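
        Sketch (not from the meeting): naming a namespace by a path under
        /proc/<pid>/ instead of by a bare pid. The notes predate it, but
        current Linux kernels expose /proc/<pid>/ns/net as a symlink whose
        target identifies the namespace, and the code below assumes that
        layout. The helper names and the proc_root parameter are
        hypothetical and only there to keep the example testable.

```python
import os


def netns_id(pid, proc_root="/proc"):
    """Return the identity string of pid's network namespace.

    On current kernels the link target looks like 'net:[4026531956]'.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "ns", "net"))


def same_netns(pid_a, pid_b, proc_root="/proc"):
    """True if both processes are in the same network namespace."""
    return netns_id(pid_a, proc_root) == netns_id(pid_b, proc_root)
```

        Calling same_netns(os.getpid(), 1) on a Linux host would compare
        the caller with init, subject to the usual /proc permission checks.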

DL: people have complained about needing CAP_SYS_ADMIN to unshare ns
        EB: example, setuid root sysvipc-using program could be fooled
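
        Sketch (not from the meeting): checking the rule DL mentions before
        attempting an unshare. It parses the CapEff line of
        /proc/<pid>/status, which holds the effective capability set as a
        hex bitmask, and tests bit 21 (CAP_SYS_ADMIN). The function names
        are hypothetical, and the check models only the rule as stated in
        these notes.

```python
CAP_SYS_ADMIN = 21  # bit number from <linux/capability.h>


def effective_caps(status_text):
    """Extract the effective capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap_sys_admin(status_text):
    """True if the CAP_SYS_ADMIN bit is set in the effective set."""
    return bool((effective_caps(status_text) >> CAP_SYS_ADMIN) & 1)
```

        On a Linux host the input would come from reading
        /proc/self/status; an ordinary user process is expected to show
        CapEff as all zeros and so fail the check.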

PE: Entering a container:
        reasons:
                monitoring
                enter an administrative command
        DH: how do you do it now?
        PE: numerical ID for each VE, use it to enter
        EB:
                one need for entering: /sbin/hotplug
        (someone): does hijack suffice?
        EB: two cases:
                partial entering
                full entering
                sys_hijack does not address partial entering
        DH:
                why need partial entering?
                fs stuff can be done without entering
        PM: privileged process
        PE:
                will look at hijack patches
                someone will re-send hijack to containers@
                EB:
                        if we can do sys_hijack cleanly,
                        we can use it to solve kthread problem