Containers/Mini-summit 2008 notes
Revision as of 14:57, 22 July 2008
Intros (8:36am)
Dave Hansen
Eric Biederman
Jason Byron redhat
Joe Rusio - evergreen
Joe McDonald
HP China
Sonny Rao
HP
HP
Matine Silberman HP
Sandy Harris
NEC Japan
John Schultz aol
Pavel Emelyanov, Parallels/OpenVZ
Denis Lunev, Parallels/OpenVZ
(?)
Benjamin
Daniel
Serge
On Phone:
Amy Griffith HP
(Later walk-ins)
Topics:
Why do various companies want containers?
ibm: workload management
EB: using containers as improved chroot
HP: wants similar to ibm, plus security
parallels: hosted providers
sysfs issues
EB gives status: should go into next merge window
mini-namespaces
NFS
clients should behave differently in different containers
currently uses single sunrpc transport for all containers
Dave: is there a list of all openvz mini-ns?
EB:
proposal:
create little filesystems
still store everything in nsproxy
currently:
some people want same process in different netns's
almost possible now, but can't open new sockets
namespace enter:
3 purposes
login
monitoring
configuring
may be worth prototyping the proposal
address mqns, or sunrpc, or fuse?
DH:
openvz addresses this using one big clone(), right?
(yes)
userid namespaces
EB summarizes his proposal
userid ns is unsharable without privilege
userids, capabilities, security labels become ns-local
hierarchical like pidns
openvz: just does chroot
DH:
observes that system vs. app containers have different requirements
EB:
so with userid namespaces, user has god-like powers over created namespaces
EB+SH will talk about hacking something this week during ols
Uses:
untrusted user mounts
build systems
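The userid namespace proposal above (ns-local userids, a pidns-like hierarchy, and full power for the creator over the namespaces it creates) can be pictured with a small toy model. This is only an illustrative sketch, not kernel code: the names UserNamespace, creator_uid and has_full_power are hypothetical, and the rule that power extends to all descendants of a created namespace is an assumption drawn from "hierarchical like pidns".

```python
class UserNamespace:
    """Toy userid namespace: knows its parent and who created it."""

    def __init__(self, parent=None, creator_uid=0):
        self.parent = parent            # None for the initial namespace
        self.creator_uid = creator_uid  # uid, as seen in the parent, that created this ns


def has_full_power(actor_ns, actor_uid, target_ns):
    """Return True if (actor_ns, actor_uid) created target_ns or one of its ancestors.

    Assumption of this sketch: a creator's power covers every namespace
    nested below the one it created.
    """
    ns = target_ns
    while ns is not None and ns is not actor_ns:
        if ns.parent is actor_ns and ns.creator_uid == actor_uid:
            return True
        ns = ns.parent
    return False


init_ns = UserNamespace()                          # initial namespace
child = UserNamespace(init_ns, creator_uid=1000)   # created by uid 1000
grandchild = UserNamespace(child, creator_uid=0)   # created by uid 0 *inside* child
```

In this model uid 1000 in the initial namespace has full power over both child and grandchild, while uid 0 inside child has no power over the initial namespace, which is the property that makes untrusted user mounts and build systems plausible uses.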
device namespaces
tty namespaces rejected
should be solved with generic device namespaces
virtualize the major:minor->device mapping
reserved device numbers (unnamed)
created with /proc?
get_unnamed_device()
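The idea of virtualizing the major:minor->device mapping can be sketched as one lookup table per namespace. The following is a toy model only: the class DeviceNamespace and its methods are hypothetical, the device labels are made up, and handing out unnamed devices under major 0 is an assumption of the sketch rather than something decided in the discussion.

```python
class DeviceNamespace:
    """Toy model: each namespace owns its own (major, minor) -> device table."""

    def __init__(self, name):
        self.name = name
        self._table = {}         # (major, minor) -> backing device label
        self._next_unnamed = 0   # next minor to hand out for unnamed devices

    def register(self, major, minor, device):
        """Bind a device number to a backing device in this namespace only."""
        self._table[(major, minor)] = device

    def get_unnamed_device(self, device):
        """Reserve a fresh unnamed device number (major 0 in this sketch)."""
        minor = self._next_unnamed
        self._next_unnamed += 1
        self._table[(0, minor)] = device
        return (0, minor)

    def lookup(self, major, minor):
        """Resolve a device number; returns None if unmapped in this namespace."""
        return self._table.get((major, minor))


host = DeviceNamespace("host")
guest = DeviceNamespace("guest")
host.register(136, 0, "host pty 0")
guest.register(136, 0, "guest pty 0")
```

The same number pair resolves to different devices depending on the namespace doing the lookup, which is what would let ttys be handled by a generic mechanism instead of a dedicated tty namespace.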
tty ideas:
use selinux ptys
use user namespaces
use legacy ptys
leverage ptyfs
Suka is not on, so he gets volunteered to do pure /dev/pts fs approach
per-container LSMs:
SH: thinks LSMs should handle it
EB:
original purpose of chroot
set up policies from inside container
creating smack container inside selinux would be ideal
entering a container
netns: identified using pid of a ns
SH: can we solve this using EB's namespace filesystems proposal?
(EB goes to the board to demonstrate his proposal)
PM: Can we use control groups?
PE: Can we re-use /proc/pid/ ?
EB: could have a ns with no processes in it
Example of command using this:
ip set eth0 netns <pid>
becomes
ip set eth0 netns /proc/<pid>/
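Naming a namespace by a path rather than a bare pid can be sketched as follows. The notes only propose something under /proc/<pid>/; the concrete path /proc/<pid>/ns/net used here is how later Linux kernels exposed network namespaces, not something fixed at this meeting, and the helper names netns_ref and same_netns are hypothetical.

```python
import os


def netns_ref(pid):
    """Build a path that names the network namespace of process `pid`."""
    return f"/proc/{pid}/ns/net"


def same_netns(pid_a, pid_b):
    """Compare two processes' network namespaces by file identity.

    Returns True/False when both paths can be stat'ed, or None when the
    files are unavailable (non-Linux system, old kernel, no permission).
    """
    try:
        a = os.stat(netns_ref(pid_a))
        b = os.stat(netns_ref(pid_b))
    except OSError:
        return None
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

A path-based handle also fits EB's point that a namespace may have no processes in it: a path can keep naming the namespace where a pid cannot.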
DL:
a real netns problem is knowing when a childns has died
the netnsfs mount could solve that
PE: EB, can you send POC patches for the namespace?
EB and EM will both send their own POC.
DL: people have complained about needing CAP_SYS_ADMIN to unshare ns
EB: example, setuid root sysvipc-using program could be fooled
PE: Entering a container:
reasons:
monitoring
enter an administrative command
DH: how do you do it now?
PE: numerical ID for each VE, use it to enter
EB:
one need for entering: /sbin/hotplug
(someone): does hijack suffice?
EB: two cases:
partial entering
full entering
sys_hijack does not address partial entering
DH:
why need partial entering?
fs stuff can be done without entering
PM: privileged process
PE:
will look at hijack patches
someone will re-send hijack to containers@
EB:
if we can do sys_hijack cleanly,
we can use it to solve kthread problem