Changes
Since 028stab057.2:
- Rebased on 2.6.18-92.1.13 RHEL5 update:
- CVE-2008-1294
- CVE-2008-2136
- CVE-2008-2812
- CVE-2008-0598
- CVE-2008-2358
- CVE-2008-2729
- CVE-2007-6417
- CVE-2007-6716
- CVE-2008-2931
- CVE-2008-3272
- CVE-2008-3275
- Other bugfixes and improvements
 
- Compilation fixes with some .config variants
- Race in checkpointing wrt half-open connection restore
- OpenVZ-in-Xen compilation fixed
- Long standing deadlock in dropping caches fixed
- Updated GFSv1 up to 0.1.23-5.el5_2.2 (not working, see #Compatibility below)
- Updated DRBD up to 8.2.6
- IPIP and SIT devices support
- MIB stats migration
- Drop potential OOM-killer immunity of tasks entering a VE
- Finally properly ban simultaneous vzdq and nfs work
- Ban NFS submounts (not to oops)
- Other bugfixes in UBC, CPT
Compatibility
- The gfs.ko module cannot be loaded. If you depend on it, please use the previous stable kernel.
Configs
Same as in 028stab057.2.
Patches
diff-cpt-synwait-restore-lock-20080821
Patch from Pavel Emelianov <xemul@openvz.org>
cpt: lock sock before restoring its synwait queue
This new socket already has all the necessary TCP timers armed, so tcp_keepalive_timer can fire during rst_restore_synwait_queue and, because the latter is lockless, can spoil the queue.
Locking in the restoring procedure is required.
Bug #118912.
diff-fs-quotcompat-xencomp-fix-20080806
Patch from Marat Stanichenko <mstanichenko@openvz.org>
quota: Compilation fix for XEN kernels
CONFIG_QUOTA_COMPAT is not enabled in Xen config so we bump into the problem of undefined structures.
Bug #118177.
diff-ms-add-limits_h-to-sumversions_c-20080820
Patch from Pavel Emelianov <xemul@openvz.org>
Fix sumversion.c compilation with some modern compilers
diff-ms-cifs-lanman-off-compilation-20080820
Patch from Pavel Emelianov <xemul@openvz.org>
cifs: fix compilation for no-lanman case
backported mainstream commit 516897a208bc1423d561ce2ccce0624c3b652275
diff-ms-fix-xfrm-compilation-20080818
Patch from Vitaliy Gusev <vgusev@openvz.org>
[PATCH] Fix compilation error when CONFIG_XFRM is not set.
diff-rh-export-flush_tbl_page-for-xen-20080806
Patch from Marat Stanichenko <mstanichenko@openvz.org>
xen: Fix build for x86_64 arch
We have to export the flush_tlb_page() symbol in the Xen-x86_64 kernel because the cpt modules use this symbol.
Bugs #118432, #118177.
diff-rh-xen-include-cacheflush-20080806
Patch from Marat Stanichenko <mstanichenko@openvz.org>
xen: Bad «#include» directives position causes Xen-i386 compilation error.
 arch/i386/mm/ioremap-xen.c
  |#include <asm/cacheflush.h> (#define _I386_CACHEFLUSH_H)
    |#include <linux/mm.h>
      |#include <linux/pagemap.h>
        |#include <linux/highmem.h>
           |#include <asm/cacheflush.h>
(but nothing is actually included at this point, because _I386_CACHEFLUSH_H is already defined).
highmem.h uses the functions from <asm/cacheflush.h>, so the functions end up being used before they are defined.
Bug #118177.
diff-rh-xfrm-more-macros-compilation-20080820
Patch from Pavel Emelianov <xemul@openvz.org>
xfrm: more compilation fixes for weird openvz users' configs
diff-ubc-subbcino-gen-fix-20080820
Patch from Pavel Emelianov <xemul@openvz.org>
bc: fix subbeancounter inode number calculations in /proc/bc
0 and 0.0 still have the same number…
Bug #116868.
diff-ve-nf-ct-checksum-ro-inve-20080722
Patch from Vasily Averin <vvs@openvz.org>
netfilter: Fix broken isolation for ip_conntrack_checksum sysctl
net.ipv4.netfilter.ip_conntrack_checksum should be read-only inside VE.
Bug #117138.
diff-vfs-lock-inversion-in-drop_pagecache_sb-20080820
Patch from Dmitry Monakhov <dmonakhov@openvz.org>
vfs: fix lock inversion in drop_pagecache_sb()
backport mainstream commit: eccb95cee4f0d56faa46ef22fb94dd4a3578d3eb
Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock before calling __invalidate_mapping_pages(). We just have to make sure inode won't go away from under us by keeping reference to it and putting the reference only after we have safely resumed the scan of the inode list. A bit tricky but not too bad…
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug #116673.
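The locking order described above can be sketched in plain C. This is a single-threaded model, not the kernel code: the `demo_*` names, the flag that stands in for inode_lock and the integer reference count are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real kernel API. */
struct demo_inode {
    int refcount;               /* models the inode reference count */
    int invalidated;            /* set once the page cache was dropped */
    struct demo_inode *next;    /* models the per-superblock inode list */
};

static int demo_inode_lock_held; /* models the global inode_lock */

static void demo_lock(void)   { assert(!demo_inode_lock_held); demo_inode_lock_held = 1; }
static void demo_unlock(void) { assert(demo_inode_lock_held);  demo_inode_lock_held = 0; }

/* Models __invalidate_mapping_pages(): must not run under the lock. */
static void demo_invalidate(struct demo_inode *inode)
{
    assert(!demo_inode_lock_held);
    inode->invalidated = 1;
}

/* Models dropping a reference; a NULL argument is a no-op. */
static void demo_put(struct demo_inode *inode)
{
    if (inode != NULL) {
        assert(inode->refcount > 0);
        inode->refcount--;
    }
}

/* Returns the number of inodes whose cache was dropped. */
static int demo_drop_pagecache(struct demo_inode *head)
{
    struct demo_inode *inode, *to_put = NULL;
    int dropped = 0;

    demo_lock();
    for (inode = head; inode != NULL; inode = inode->next) {
        inode->refcount++;      /* pin: the inode cannot go away now */
        demo_unlock();          /* drop the lock before the heavy work */
        demo_invalidate(inode);
        demo_put(to_put);       /* the previous inode is no longer needed */
        to_put = inode;         /* stay pinned until the scan has moved on */
        dropped++;
        demo_lock();            /* resume the list scan under the lock */
    }
    demo_unlock();
    demo_put(to_put);           /* release the last pinned inode */
    return dropped;
}
```

The reference to the current inode is released only on the next iteration, after the lock has been retaken and the scan has advanced, which is the "a bit tricky" part of the description.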
diff-cciss-reformat-error-handling, 
diff-cciss-add-sg-io-ioctl, 
diff-cciss-printk-creq-flags, 
diff-scsi-add-modalias-mainstream
Patches from Marat Stanichenko <mstanichenko@parallels.com>
Various kludges to make cciss work properly and make udev receive scsi uevents.
Bugs #114972, #114130.
diff-cpt-add-snmp-stats-20080930
Patch from Pavel Emelianov <xemul@openvz.org>
cpt: dump and restore global snmp statistics
Per-device statistics exist for ipv6 only and are probably not used now, but anyway, I'll do it later.
This patch adds a new section, CPT_SECT_SNMP_STATS, that is populated with a set of CPT_OBJ_BITS objects, one for each type of statistics. Objects have variable length. Stats are stored as a plain array of __u32 numbers and thus the order in which stats types are stored is implicitly hard-coded.
If IPv6 is not turned on, all ipv6 stats are dumped as CPT_OBJ_BITS/CPT_CONTENT_VOID and are skipped on restore.
When we restore from an image with more stats in any type, the unsupported ones are dropped with a warning.
Stats add 28K to the image file.
Bug #113930.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
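The restore rule for one statistics type can be sketched as follows. This is only an illustration under assumptions: `demo_restore_stats` is a hypothetical helper, the image object is modelled as a bare array of 32-bit counters, and an empty array stands in for a CPT_CONTENT_VOID object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copies counters from an image object into the running kernel's array.
 * Both sides are plain arrays of 32-bit counters in a fixed, implicit order.
 * Returns the number of image entries dropped as unsupported.
 */
static size_t demo_restore_stats(uint32_t *dst, size_t dst_count,
                                 const uint32_t *img, size_t img_count)
{
    size_t n = img_count < dst_count ? img_count : dst_count;
    size_t i;

    assert(dst != NULL || dst_count == 0);
    assert(img != NULL || img_count == 0);

    for (i = 0; i < n; i++)
        dst[i] = img[i];

    /* Entries beyond what this kernel supports are skipped. */
    return img_count - n;
}
```

A caller would emit the warning whenever the return value is nonzero. Because the order of counters is implicit, a scheme like this only tolerates new counters being appended at the end of a type.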
diff-cpt-fix-cpt_family-restore-20080915
Patch from Vitaliy Gusev <vgusev@openvz.org>
[PATCH] rst: Fix memory corruption if cpt_family is wrong.
During restore, if the parent socket is AF_INET but cpt_family is wrong (not initialized, see bug #95113), treating the request as AF_INET6 is incorrect and leads to memory corruption.
Because there are a lot of buggy images, we cannot simply check for the values AF_INET and AF_INET6.
Decision:
- Check the request for AF_INET6 first, and consider the request AF_INET by default.
- Additionally verify AF_INET6 requests (to protect against a random cpt_family value that happens to equal AF_INET6).
Bug #118912.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
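A minimal sketch of the decision above, assuming that the additional check compares against the parent socket's family. The patch description does not say what the additional check is, so that part, like the name `demo_request_family`, is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>

/* Decides which address family to use for a restored connection request. */
static int demo_request_family(int parent_family, int cpt_family)
{
    assert(parent_family == AF_INET || parent_family == AF_INET6);

    /*
     * AF_INET6 is accepted only when the parent socket agrees with it.
     * This extra check is an assumption of the sketch.
     */
    if (cpt_family == AF_INET6 && parent_family == AF_INET6)
        return AF_INET6;

    /* Anything else, including garbage from buggy images, is AF_INET. */
    return AF_INET;
}
```

With this ordering an uninitialized cpt_family can never turn an AF_INET parent's request into an AF_INET6 one.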
diff-cpt-ipip-20080923
Patch from Pavel Emelianov <xemul@openvz.org>
cpt: add support for ipip tunnel
Actually, sit also uses the ip_tunnel structure I'm saving and restoring in the image, but this only adds support for ipip device (sit will be checked later).
I add a new object type and store most of the ip_tunnel_parm contents. Restoration is a little bit more tricky, as the fb device is created on container start.
Bug #115412.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
diff-cpt-open-init-stds-early-20080923
Patch from Pavel Emelianov <xemul@openvz.org>
cpt: fix restoring of /dev/null opened early by init
The problem is the following:
- init from fc9 starts and opens /dev/null for its stdin, stdout and stderr
- udev starts and overmounts /dev with tmpfs
After this, cpt cannot dump this VE, since one process holds a file that is inaccessible from the VE root.
The proposed solution is the following:
- allow for /dev/null to be over-mounted
- restore init's files in two stages:
- stage1: *before* we restore mounts, restore init's 0, 1 and 2 file descriptors, since most likely (in the fc9 case, definitely) init opened them before any other manipulations with the fs;
- stage2: restore the rest of the files later, at the usual time, to make sure that e.g. sockets are restored properly.
Comment from Alexey:
ACK. Though this is really ugly, it really produces 100% correct result for this particular situation.
Bug #116261.
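The two-stage order can be modelled with a small sketch. The record layout and the `demo_*` names are hypothetical, and "restoring" a file is reduced to stamping it with the stage number.

```c
#include <assert.h>
#include <stddef.h>

struct demo_file {
    int owner_pid;   /* 1 means init inside the container */
    int fd;          /* descriptor number in the owner's table */
    int restored_in; /* 0 = not yet restored, otherwise the stage number */
};

/* True for init's stdin, stdout and stderr. */
static int demo_is_early_std(const struct demo_file *f)
{
    return f->owner_pid == 1 && f->fd >= 0 && f->fd <= 2;
}

/*
 * Stage 1 runs before mounts are restored and touches only init's
 * descriptors 0, 1 and 2. Stage 2 runs at the usual time and restores
 * everything that is still pending. Returns the number of files restored.
 */
static size_t demo_restore_stage(struct demo_file *files, size_t n, int stage)
{
    size_t i, done = 0;

    assert(stage == 1 || stage == 2);
    for (i = 0; i < n; i++) {
        if (files[i].restored_in != 0)
            continue;
        if (stage == 1 && !demo_is_early_std(&files[i]))
            continue;
        files[i].restored_in = stage;
        done++;
    }
    return done;
}
```

Calling the helper with stage 1, then restoring mounts, then calling it with stage 2 reproduces the order in which init itself opened the files.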
diff-cpt-sit-20080930
Patch from Pavel Emelianov <xemul@openvz.org>
cpt: add sit devices migration
The code mostly re-uses the ipip migration one, by adding the CPT_DEV_SIT flag to the image, thus making the name CPT_OBJ_NET_IPIP_TUNNEL a bit confusing :(
Bug #115412.
diff-ms-copyfiles-20080910
Patch from Denis Lunev <den@openvz.org>
ms: properly assign value to the tsk->files
The race is the following:
 slm_task_inst_usage
    task_lock(t);
    files = t->files;
                             flush_old_exec
                               unshare_files
                                 copy_files
                               put_files_struct
    files->fdt
 
So, we are definitely accessing already freed memory for the case. The only correct fix for the case is to bound the assignment with task_lock.
Bug #120812.
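The fix pattern can be sketched as a single-threaded model. The `demo_*` types are hypothetical, the lock is a flag guarded by assertions rather than a real spinlock, and freeing is modelled by a flag so that a use-after-free would trip an assertion.

```c
#include <assert.h>

struct demo_files { int fdt_size; int freed; };
struct demo_task  { int lock_held; struct demo_files *files; };

static void demo_task_lock(struct demo_task *t)   { assert(!t->lock_held); t->lock_held = 1; }
static void demo_task_unlock(struct demo_task *t) { assert(t->lock_held);  t->lock_held = 0; }

/* Reader side: t->files is dereferenced only while the lock is held. */
static int demo_read_fdt_size(struct demo_task *t)
{
    int size;

    demo_task_lock(t);
    assert(!t->files->freed);   /* would fail on a use-after-free */
    size = t->files->fdt_size;
    demo_task_unlock(t);
    return size;
}

/* Writer side: publish the new table under the lock, then drop the old one. */
static void demo_replace_files(struct demo_task *t, struct demo_files *new_files)
{
    struct demo_files *old;

    demo_task_lock(t);
    old = t->files;
    t->files = new_files;       /* the assignment is bounded by the lock */
    demo_task_unlock(t);

    old->freed = 1;             /* models put_files_struct() on the old table */
}
```

Since the writer can only swap the pointer while holding the lock, a reader that holds the lock never sees a table that is about to be released.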
diff-ms-no-hotplug-compilation
Patch from Pavel Emelianov <xemul@openvz.org>
Fix kobjects compilation for hotplug-less config
diff-ms-procinodegen-20080919
Patch from Denis Lunev <den@openvz.org>
proc: generate inode number for proc pid inodes correctly
With max_pid=200000, inode numbers are generated incorrectly and can be duplicated.
Bug #121659.
Signed-off-by: Denis V. Lunev <den@openvz.org>
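The description does not spell out the old formula. As an illustration only, assume a 32-bit inode number that keeps the pid in the upper 16 bits and an entry type in the lower 16 bits; `demo_ino_shift16` is a hypothetical helper written under that assumption. Any pid above 65535 then wraps around and collides with a smaller pid.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scheme: pid in the upper 16 bits of a 32-bit inode number. */
static uint32_t demo_ino_shift16(uint32_t pid, uint32_t type)
{
    assert(type < (1u << 16));
    /* Unsigned shift: bits that do not fit into 32 bits are lost. */
    return (pid << 16) | type;
}
```

Under this assumption, pids 1 and 65537 produce the same inode number for the same entry type, because 65537 is 0x10001 and its top bit is shifted out.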
diff-nfs-fake-lookuproot-ops-20080910
Patch from Denis Lunev <den@openvz.org>
nfs: prohibit lookup on mountpoint inode (nfs submount) for VEFS
This is a very rough kludge to prevent an OOPS which could happen if nfs server has submounts and does not hide them.
Signed-off-by: Denis V. Lunev <den@parallels.com>
Bug #119698.
diff-ub-sndbuf-synack-leak-20080910
Patch from Denis Lunev <den@openvz.org>
ub: incorrect skb is charged in tcp_send_synack
The new one should be charged rather than the old one.
diff-ve-binfmt_misc_ve_stop_oops_fix-20080930
Patch from Konstantin Ozerkov <kozerkov@openvz.org>
Fix OOPS while stopping VE after binfmt_misc.ko loaded
ve_binfmt_fini() should check whether the current VE has registered the binfmt_misc fs. (This properly handles stopping a VE which was started before binfmt_misc.ko was loaded.)
xemul: this doesn't affect rhel5 kernel, since this one is 'y' at our config, but I have patches to add its migration, which requires it to be a module. Thus this patch might become required.
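The guard can be sketched as below. The structure and field names are hypothetical and only model "this VE has a registered binfmt_misc fs"; they are not the real kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ve {
    void *bm_fs_type;       /* NULL if the VE never registered binfmt_misc */
    int unregister_calls;   /* counts modelled unregister operations */
};

/* Tears down per-VE binfmt_misc state only if it was ever set up. */
static void demo_ve_binfmt_fini(struct demo_ve *ve)
{
    assert(ve != NULL);

    if (ve->bm_fs_type == NULL)
        return;             /* VE started before the module was loaded */

    ve->unregister_calls++; /* models unregistering the filesystem */
    ve->bm_fs_type = NULL;
}
```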
diff-ve-drop-oom-immunity-at-enter-20080901
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
drop OOM protection on entering CT
On CT enter, switch to the default OOM adjustment level if the task is OOM-immune.
It is a very bad idea to have OOM-unkillable tasks inside a container, because all forked tasks inherit this setting.
The proc interface for changing OOM adjustment (/proc/<pid>/oom_adj) is already restricted in CT by diff-ve-oom-adjust-20070604.
On some systems sshd gets OOM protection at start and does not drop it after fork.
(example: ssh root@HN -> vzctl enter -> restart apache — apache now OOM immune)
(example from xemul@: ssh root@HN vzctl start — VE is now OOM immune)
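The inheritance problem and the fix can be modelled in a few lines. The task structure and function names are hypothetical. The value -17 is the oom_adj setting that disables OOM killing in kernels of this era, and 0 is taken as the default level.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_OOM_DISABLE (-17) /* oom_adj value that makes a task OOM-immune */
#define DEMO_OOM_DEFAULT 0     /* assumed default adjustment level */

struct demo_oom_task { int oom_adj; };

/* fork(): the child inherits the parent's OOM adjustment. */
static struct demo_oom_task demo_oom_fork(const struct demo_oom_task *parent)
{
    struct demo_oom_task child = { parent->oom_adj };
    return child;
}

/* Entering a container: an OOM-immune task falls back to the default. */
static void demo_oom_enter_ct(struct demo_oom_task *t)
{
    assert(t != NULL);
    if (t->oom_adj == DEMO_OOM_DISABLE)
        t->oom_adj = DEMO_OOM_DEFAULT;
}
```

In the sshd example, the shell forked from an immune sshd is still immune, but once it enters the container the immunity is gone, so an apache restarted from that shell is killable again.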
diff-ve-fix-idle-time-accout-20080815
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
Fix idle time account in case of iowait tasks presence
One uninterruptible task blocks the idle time counter on all idle vcpus in the VE.
Originally, in diff-ve-fairsched-statiow-20050823, idle time after strt_idle_time was accounted as idle or iowait depending on the total count of uninterruptible tasks, but after diff-ve-sched-stat-iowait-20060417 and diff-ve-iowait-20060525 the iowait branch is triggered by a nonzero vcpu_rq(vcpu)->nr_iowait.
This patch does the same for the idle branch.
It also splits the interface into two functions:
ve_sched_get_idle_time(cpu) — cpu idle time in current ve.
ve_sched_get_idle_time_total(ve) — ve total idle time.
v2 changes:
Change the second argument of __ve_sched_get_idle_time from vsched to vcpu and make it optional; without a vcpu, time after strt_idle_time is not accounted as idle.
Remove the vsched lookup code for the case where the VE init task is not in the VE vsched (init is dead and the VE is in the middle of the shutdown process); in this case there is no reason to care about idle-time accounting accuracy.
Bug #114633.
diff-ve-net-ipip-20080912
Patch from Pavel Emelianov <xemul@openvz.org>
ipip: add ipip tunnel support in VEs
This is the same patch I did for mainstream, but for 2.6.18 kernel and thus resembles the sit virtualization patch.
Some functions are exported for the patch #2 — checkpointing support (yes, I still remember the bug #101061 ;) )
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
diff-ve-nf-cthelpers-fix-leaks-20080929
Patch from Vitaliy Gusev <vgusev@openvz.org>
conntrack: Fixed leak of the use counter of the ip_conntrack_ftp module
struct ip_conntrack_helper stores its name as a pointer to a nul-terminated string, but list_named_find() may only be used for structures with an inlined string. Thus list_named_find() always returns NULL in virt_ip_conntrack_helper_unregister().
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
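Why a matcher written for inlined strings never matches a pointer member can be shown with a small sketch. The structures and helpers are hypothetical; the generic matcher only imitates the idea of comparing the bytes found at a fixed offset inside the object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_inline  { int pad; char name[16]; };    /* string stored in place */
struct demo_pointer { int pad; const char *name; }; /* only a pointer stored */

/*
 * Generic matcher in the style of a list helper: it assumes that a
 * nul-terminated string starts at byte offset `off` inside the object.
 * The comparison is bounded by the size of the field.
 */
static int demo_cmp_inline_name(const void *obj, size_t off,
                                const char *name, size_t field_size)
{
    assert(obj != NULL && name != NULL);
    return strncmp((const char *)obj + off, name, field_size) == 0;
}

/* Correct matcher for the pointer layout: follow the pointer first. */
static int demo_cmp_pointer_name(const struct demo_pointer *p, const char *name)
{
    assert(p != NULL && p->name != NULL && name != NULL);
    return strcmp(p->name, name) == 0;
}
```

Applied to `struct demo_pointer`, the generic matcher compares the bytes of the pointer value itself with the wanted name, so in practice it does not match, while the dedicated matcher does.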
diff-vzdq-with-nfs-disable-20080930
Patch from Denis Lunev <den@openvz.org>
nfs: NFSD/vzquota mutual exclusion
NFSD and vzquota can't run simultaneously on the same filesystem, as NFSD works with unattached dentries while vzquota requires attached ones in order to function properly.
This patch prohibits turning vzquota on over a remotely mounted filesystem and remotely mounting a filesystem with vzquota on.
Bug #115332.
Signed-off-by: Denis V. Lunev <den@parallels.com>
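The mutual exclusion amounts to two symmetric checks. In this sketch the superblock flags, the function names and the choice of -EBUSY as the error code are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_sb {
    int quota_on;   /* vzquota is active on this filesystem */
    int exported;   /* filesystem is served by NFSD */
};

/* Turning quota on is refused while the filesystem is exported. */
static int demo_quota_on(struct demo_sb *sb)
{
    assert(sb != NULL);
    if (sb->exported)
        return -EBUSY;
    sb->quota_on = 1;
    return 0;
}

/* Exporting is refused while quota is on. */
static int demo_export(struct demo_sb *sb)
{
    assert(sb != NULL);
    if (sb->quota_on)
        return -EBUSY;
    sb->exported = 1;
    return 0;
}
```

Whichever feature is enabled first wins, and the second request fails instead of producing the problematic combination.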
diff-ve-net-bridge-via-phys-dev2-20070514
Patch from Dmitry Mishin <dim@openvz.org>
[BRIDGE] bridge deliver to original eth0 device
- now packets are input to the local system as they are coming from phys device only;
- fixed bunch of bugs with VE <-> HN communications.
diff-ve-net-sit-virtualize-20080627
Patch from Pavel Emelianov <xemul@openvz.org>
Virtualize sit device.
This mostly looks like the sit netnsization patches I did for mainstream, but has some peculiarities:
- sit is built into the ipv6 module in this kernel
- VE_FEATURE_SIT controls the sit availability in VE
Bug #115411.