Containers/Mini-summit 2008 notes

[[Category: Containers]]

Intros (8:36am)

        Dave Hansen
        Eric Biederman
        Jason Byron, Red Hat
        Joe Rusio, Evergreen
        Joe McDonald
        HP China
        Sonny Rao
        HP
        HP
        Matine Silberman HP
        Sandy Harris
        NEC Japan
        John Schultz, AOL
        Pavel Emelyanov, Parallels/OpenVZ
        Denis Lunev, Parallels/OpenVZ
        Constant Chan
        Benjamin Thery, Bull
        Daniel Lezcano, IBM
        Serge Hallyn, IBM

On Phone:
        Amy Griffith HP
        Dhaval Giani, IBM

(Later walk-ins)

Topics:

Why do various companies want containers?
        ibm: workload management
        EB: using containers as improved chroot
        HP: wants similar to ibm, plus security
        parallels: hosted providers

sysfs issues
        EB gives status: should go into next merge window

mini-namespaces
        NFS
                clients should behave differently in different containers
                currently uses single sunrpc transport for all containers
        Dave: is there a list of all openvz mini-ns?
        EB:
                proposal:
                        create little filesystems
                        still store everything in nsproxy
                currently:
                        some people want same process in different netns's
                        almost possible now, but can't open new sockets
                namespace enter:
                        3 purposes
                                login
                                monitoring
                                configuring
                may be worth prototyping the proposal
                        address mqns, or sunrpc, or fuse?
        DH:
                openvz addresses this using one big clone(), right?
                (yes)

userid namespaces
        EB summarizes his proposal
                userid ns is unsharable without privilege
                userids, capabilities, security labels become ns-local
                hierarchical like pidns
        openvz: just does chroot
        DH:
                observes that system vs. app containers have different requirements
        EB:
                so with userid namespaces, user has god-like powers over created namespaces
        EB+SH will talk about hacking something this week during ols
        Uses:
                untrusted user mounts
                build systems
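
The "ns-local" and "hierarchical like pidns" points above can be pictured with a small toy model. This is only an illustrative sketch, not the proposal discussed at the meeting: the UserNS class, its range-based mapping, and the example numbers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserNS:
    """Toy user namespace: translates local uids into the parent's uids."""
    name: str
    parent: Optional["UserNS"] = None
    # Each entry is (first local uid, first parent uid, count).
    # This layout is a hypothetical choice for the sketch.
    ranges: list = field(default_factory=list)

    def to_parent(self, uid):
        """Return the parent-namespace uid for a local uid, or None if unmapped."""
        for inside, outside, count in self.ranges:
            if inside <= uid < inside + count:
                return outside + (uid - inside)
        return None

    def to_root(self, uid):
        """Walk up the hierarchy and return the uid as seen by the initial namespace."""
        ns = self
        while ns.parent is not None:
            uid = ns.to_parent(uid)
            if uid is None:
                return None
            ns = ns.parent
        return uid


root = UserNS("init")
build = UserNS("build", parent=root, ranges=[(0, 100000, 1000)])
nested = UserNS("nested", parent=build, ranges=[(0, 500, 10)])

print(build.to_root(0))    # uid 0 inside "build" as seen from the initial namespace
print(nested.to_root(0))   # uid 0 inside "nested", translated through both levels
```

In this model uid 0 is only "root" relative to the namespace it lives in, which is the property that makes the build-system and untrusted-mount uses plausible.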

device namespaces
        tty namespaces rejected
        should be solved with generic device namespaces
                virtualize the major:minor->device mapping
        reserved device numbers (unnamed)
                created with /proc?
                get_unnamed_device()
        tty ideas:
                use selinux ptys
                use user namespaces
                use legacy ptys
                leverage ptyfs
        Suka is not on the call, so he gets volunteered to do the pure /dev/pts fs approach
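
To make "virtualize the major:minor->device mapping" concrete, here is a minimal toy sketch in which each container has its own lookup table. The DeviceNamespace class and the container names are hypothetical and exist only for this example; nothing here reflects an actual kernel interface.

```python
class DeviceNamespace:
    """Toy per-container table from (major, minor) to a device object."""

    def __init__(self, name):
        self.name = name
        self._devices = {}

    def register(self, major, minor, device):
        """Bind a device number to a device inside this namespace only."""
        self._devices[(major, minor)] = device

    def lookup(self, major, minor):
        """Resolve a device number; returns None if this namespace has no such device."""
        return self._devices.get((major, minor))


PTY_SLAVE_MAJOR = 136  # Unix98 pty slave major number on Linux

ct1 = DeviceNamespace("ct1")
ct2 = DeviceNamespace("ct2")
ct1.register(PTY_SLAVE_MAJOR, 0, "pty owned by ct1")
ct2.register(PTY_SLAVE_MAJOR, 0, "pty owned by ct2")

# The same device number resolves to a different device in each container.
print(ct1.lookup(PTY_SLAVE_MAJOR, 0))
print(ct2.lookup(PTY_SLAVE_MAJOR, 0))
```

The point of the sketch is that a generic mapping of this kind would cover ttys as one case among many, which is why a tty-only namespace was considered unnecessary.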

per-container LSMs:
        SH: thinks LSMs should handle it
        EB:
                original purpose of chroot
                set up policies from inside container
                creating smack container inside selinux would be ideal

entering a container
        netns: identified using pid of a ns
        SH: can we solve this using EB's namespace filesystems proposal?
        (EB goes to the board to demonstrate his proposal)
        PM: Can we use control groups?
        PE: Can we re-use /proc/pid/ ?
        EB: could have a ns with no processes in it
        Example of command using this:
                ip link set eth0 netns <pid>
                becomes
                ip link set eth0 netns /proc/<pid>/
        DL:
                a real netns problem is knowing when a childns has died
                the netnsfs mount could solve that
        PE: EB, can you send POC patches for the namespace?
                EB and EM will both send their own POC.
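
As a rough sketch of what naming a namespace by path rather than by pid could look like from userspace: the code below assumes a kernel that exposes per-process namespace handles under /proc/<pid>/ns/, which is a later interface and not something these notes describe. The helper names netns_path and same_netns are made up for the example.

```python
import os


def netns_path(pid):
    """Build the path that names the network namespace of a process."""
    return f"/proc/{pid}/ns/net"


def same_netns(pid_a, pid_b):
    """Two processes share a netns if their handles have the same device and inode."""
    a = os.stat(netns_path(pid_a))
    b = os.stat(netns_path(pid_b))
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


if __name__ == "__main__":
    me = os.getpid()
    print(netns_path(me))
    if os.path.exists(netns_path(me)):
        print(same_netns(me, os.getppid()))
```

A path-based handle can be bind-mounted or held open, so it can outlive the last process in the namespace, which speaks to EB's point about a ns with no processes in it.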

DL: people have complained about needing CAP_SYS_ADMIN to unshare ns
        EB: example, setuid root sysvipc-using program could be fooled
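
For reference, a process can check whether it currently holds CAP_SYS_ADMIN by reading the effective capability mask from /proc/self/status. This is a minimal sketch; the helper names are made up for the example, and it only inspects the mask without attempting an unshare.

```python
CAP_SYS_ADMIN = 21  # bit number defined in <linux/capability.h>


def has_capability(status_text, cap=CAP_SYS_ADMIN):
    """Return True if the CapEff mask in a /proc/<pid>/status dump has the bit set."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # the mask is printed as hex
            return bool(mask & (1 << cap))
    return False  # no CapEff line found


def current_process_has_sys_admin():
    """Check the calling process; requires a mounted /proc."""
    with open("/proc/self/status") as f:
        return has_capability(f.read())


if __name__ == "__main__":
    print(current_process_has_sys_admin())
```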

PE: Entering a container:
        reasons:
                monitoring
                enter an administrative command
        DH: how do you do it now?
        PE: numerical ID for each VE, use it to enter
        EB:
                one need for entering: /sbin/hotplug
        (someone): does hijack suffice?
        EB: two cases:
                partial entering
                full entering
                sys_hijack does not address partial entering
        DH:
                why need partial entering?
                fs stuff can be done without entering
        PM: privileged process
        PE:
                will look at hijack patches
                someone will re-send hijack to containers@
                EB:
                        if we can do sys_hijack cleanly,
                        we can use it to solve kthread problem