Download/kernel/rhel5/028stab060.2/changes

From OpenVZ Virtuozzo Containers Wiki
Revision as of 18:32, 22 October 2009 by Kir (talk | contribs) (Protected "Download/kernel/rhel5/028stab060.2/changes": Robot: Protecting a list of files. [edit=autoconfirmed:move=autoconfirmed])

Changes

Since 028stab059.6:

  • Rebased on 2.6.18-92.1.18 RHEL5 update (RHSA-2008-0957)
  • Backported some patches from RHEL5 update 92.1.22 (RHSA-2008-1017)
  • Fixed utimensat system call (OpenVZ Bug #970)
  • Fixed CAP_AUDIT capability in CT (for dbus)
  • Added UB_SWAPINFO resource (for Oracle in CTs, needs vzctl >= 3.0.24)
  • NFS deadlocks fixed
  • Many small fixes in CPT code

Configs

Same as in 028stab059.6, plus:

  • +CONFIG_FB_EFI=y

Compatibility

No new issues.


Patches

Ported from RHEL5 2.6.18-92.1.22.el5 kernel

  • linux-2.6-nfs-v4-credential-ref-leak-in-nfs4_get_state_owner.patch
  • linux-2.6-net-ipv4-fix-byte-value-boundary-check.patch
  • linux-2.6-fs-don-t-allow-splice-to-files-opened-with-o_append.patch
  • linux-2.6-drm-i915-driver-arbitrary-ioremap.patch

diff-cpt-conntracks-fix-used-count-20081001

Patch from Vitaliy Gusev <vgusev@openvz.org>
[PATCH] CPT: Fix ip_conntrack_ftp usage counter leak

The function ip_conntrack_helper_find_get() takes a reference on the helper module (it increments the module usage counter). So put the conntrack after inserting it into the hash and handling the conntrack's expect list.

diff-cpt-dont-cpt-requiresdev-fs-20081212

Patch from Vitaliy Gusev <vgusev@openvz.org>

Don't allow checkpointing a VE with mounted ext2/ext3 and similar filesystems.

Allow checkpointing only for mounted nodev and "external" filesystems.

This check protects from error on restore:

  CPT ERR: ffff810007113000,102 :-2 mounting /root/some_dir ext3 40000000

as do_one_mount() doesn't pass mntdev to mount().

[xemul: actually, the reason we don't support filesystems other than virtual and tmpfs is because we simply can't (easily) get the mount options for them to cpt and restore ]

Bug #131737.

diff-cpt-iteronemm-printk-20081119

Patch from Vasily Averin <vvs@openvz.org>
cpt: incorrect printk modificator in iter_one_mm

The printk inside iter_one_mm() used "%lx" for pgprot_val(), but that value is "long long" on i386 PAE kernels. CPT_FID has a %s inside, so the mismatched argument lengths can cause an oops when the string pointer is dereferenced.

Bug #128474.
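
The format mismatch described above can be illustrated in user space. This is a minimal sketch, not the actual patch: pgprot_bits_t and format_prot() are hypothetical names, and the "%llx %s" layout is only an assumption chosen to show a string argument following a 64-bit one.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a PAE page-protection value:
 * 64 bits wide even when "long" is only 32 bits. */
typedef unsigned long long pgprot_bits_t;

/* Format "<prot in hex> <tag>" into buf; returns what snprintf returns. */
int format_prot(char *buf, size_t len, pgprot_bits_t prot, const char *tag)
{
    /* "%llx" consumes the whole 64-bit argument, so the following "%s"
     * reads the real string pointer. With "%lx" on a 32-bit build only
     * half of prot would be consumed and "%s" would read the other half
     * as a pointer. */
    return snprintf(buf, len, "%llx %s", (unsigned long long)prot, tag);
}
```

The explicit cast to unsigned long long keeps the call correct regardless of how wide the underlying type is on a given build.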

diff-cpt-no-ipv6-sit-compile-20081031

Patch from Pavel Emelianov <xemul@openvz.org>
cpt: compilation fix for sit restoring in !IPv6 case

OpenVZ Bug #1060.

diff-cpt-open-stds-early-leak-20081128

Patch from Vitaliy Gusev <vgusev@openvz.org>
cpt: Fix leak during checkpointing overmounted /dev/null

Bug #130958.

diff-cpt-put-expect-after-insert-20081003

Patch from Vitaliy Gusev <vgusev@openvz.org>
[PATCH] CPT: put 'expect' after insert to the 'conntrack'

When restoring a conntrack, we need to put the expect after allocating an ip_conntrack_expect and using it. The expect will then be freed either immediately (if nobody else holds it) or later, during cleanup or timer hooks. Otherwise the expect will never be freed.

Note: Approaches for kernels 2.6.18 and 2.6.9 are different. For example see help() in "net/ipv4/netfilter/ip_conntrack_netbios_ns.c"

diff-cpt-restore-listen-inet-socket-20081013

Patch from Vitaliy Gusev <vgusev@openvz.org>
Restore information about tcp listening sockets (cpt_state == TCP_LISTEN)

Not all options are important. Only a missing ipv6only can cause an error, if another application wants to listen on the same port on the IPv4 wildcard ("any") address.

tp->XXX are inherited by children (noticed by Alexey Kuznetsov), so we also need to restore these options.

Comment from Alexey:
It [everything before] was not OK. The features that are broken are important, but not actually critical, except for ipv6only.

E.g. DEFER_ACCEPT is broken, but nobody will notice: connections just will not be deferred.

diff-cpt-snmp-stats-dumping-fix-20081031

Patch from Pavel Emelianov <xemul@openvz.org>
cpt: dump udp stats and udp6, not just udp6 twice

This is actually harmless, since both stats have equal size, although a somewhat incorrect result is produced on restore.

Found when compiling kernel with no IPv6 support.

OpenVZ Bug #1060.

diff-cpt-ub-resources-array-20081107

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
cpt: restore only the bc resources actually present in the cpt image.

Store UB_RESOURCES in cpt_beancounter_image while checkpointing. (Leave all newly added resources with the default limits filled in at bc allocation.)

Change cpt_content of cpt_beancounter_image to CPT_CONTENT_ARRAY to detect the structure version without bumping the cpt image version, because in old images the __cpt_pad field (reused for cpt_ub_resources) is uninitialized.

Add missing error handling inside rst_undump_ubc: pass errors from restore_one_bc up to the higher level.

Bug #115800.

diff-cpt-vdso-via-special-mapping-fix

Patch from Pavel Emelianov <xemul@openvz.org>
CPT: Fix VDSO page handling wrt new VDSO setup in RHEL5

The main difference is that now we have an array of just *one* whole page, rather than just a virtual address. The other difference is that vma->vm_ops now points to vma_special_ops.

diff-fairsched-ve-sanitize-20080710

Patch from Pavel Emelianov <xemul@openvz.org>
fairsched: Sanitize fairsched manipulations on ve startup

First of all, we won't be able to call them after we fix the capability checks. Second, taking the fairsched mutex 4 times on startup is overkill.

diff-fs-quota-compat-proper-split-20081027

Patch from Konstantin Ozerkov <kozerkov@openvz.org>
quota: Properly split compat (i.e. v1) declarations from all the others

In short, this patch moves the CONFIG_QUOTA_COMPAT stuff from <linux/quota.h> into a separate include file. This is needed to fix a compilation error when the CONFIG_SECCOMP option is enabled (declaration cross-reference).

OpenVZ Bug #972.

diff-ms-all-skbs-via-bridge-20081128

Patch from Denis Lunev <den@openvz.org>
br: do not always transmit packets to real Ethernet via bridge

A bridge in via_phys_dev mode always transmits packets via master_dev, even when this is not actually required, as master_dev->dev_hard_xmit is called unconditionally.

This patch does a simple thing. When a packet is about to be sent via master_dev (for the first time), master_dev is replaced with bridge->dev. IMHO this approach should have been used from the very beginning.

Additionally, locking on the TX path is fixed. Previously, we could jump into bridge->hard_start_xmit with the TX lock of the actual device held.

Bug #129292.

diff-ms-backport-utimensat-peek-20081006

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
ms: backport utimensat systemcall and machinery

Step1: steal a piece of code from mainstream (last commit 2d8f3038)

Bug #121508. OpenVZ Bug #970.

diff-ms-backport-utimensat-wire-20081006

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
ms: backport utimensat systemcall and machinery (p3)

Step3: inject sys_utimensat into syscall tables.

Bug #121508. OpenVZ Bug #970.

diff-ms-backport-utimensat-work-up-20081006

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
ms: backport utimensat systemcall and machinery (p2)

Step2: fixes wrt 2.6.18 kernel:

  • replace struct path usage with struct dentry and struct nameidata.
  • rename new do_utimes to __do_utimes and make it static.
  • rewrite permission checks using existing calls.

Bug #121508. OpenVZ Bug #970.
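
For reference, this is how user space typically exercises the backported call. It is a minimal sketch, assuming a libc that exposes utimensat(); set_mtime_only() is a hypothetical helper name, not part of the patch. On a kernel without the syscall the call is expected to fail with ENOSYS.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>   /* utimensat(), UTIME_OMIT */
#include <time.h>       /* struct timespec, time_t */

/* Set the modification time of path to mtime_sec (whole seconds) and
 * leave the access time untouched. Returns 0 on success, or -1 with
 * errno set. */
int set_mtime_only(const char *path, time_t mtime_sec)
{
    struct timespec times[2];

    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_OMIT;   /* times[0] is atime: do not change it */
    times[1].tv_sec = mtime_sec;     /* times[1] is mtime */
    times[1].tv_nsec = 0;

    /* AT_FDCWD resolves a relative path against the current directory. */
    return utimensat(AT_FDCWD, path, times, 0);
}
```

Passing UTIME_NOW in tv_nsec instead of an explicit value asks the kernel to use the current time for that timestamp.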

diff-ms-cpu-is-offline-20081105

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
CPU hotplug: fix cpu_is_offline() on !CONFIG_HOTPLUG_CPU

Cherrypicked from mainstream commit a263898f (from Ingo Molnar <mingo@elte.hu>)

Bug #126915.

diff-ms-missed-register_cpu_notifier-20081001

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU

Backported patch from Avi Kivity <avi@qumranet.com> (git:47e627bc)

The following patchset allows a host with running virtual machines to be suspended and, on at least a subset of the machines tested, resumed. Note that this is orthogonal to suspending and resuming an individual guest to a file.

A side effect of implementing suspend/resume is that cpu hotplug is now supported. This should please the owners of big iron.

This patch:

KVM wants the cpu hotplug notifications, both for cpu hotplug itself, but more commonly for host suspend/resume.

In order to avoid extensive #ifdefs, provide stubs when CONFIG_HOTPLUG_CPU is not defined.

In all, we have four cases:

  • UP: register and unregister stubbed out
  • SMP+hotplug: full register and unregister
  • SMP, no hotplug, core: register as __init, unregister stubbed (cpus are brought up during core initialization)
  • SMP, no hotplug, module: register and unregister stubbed out (cpus cannot be brought up during module lifetime)

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

OpenVZ Bug #1027.
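
The stub technique from the commit message above can be sketched as follows. This is a simplified user-space illustration, not the kernel code: struct demo_notifier, the demo_* functions and the DEMO_HOTPLUG_CPU switch are all hypothetical names, and the sketch collapses the four cases into two (real implementation versus stubs).

```c
#include <stddef.h>

/* Hypothetical, simplified notifier type. */
struct demo_notifier {
    int (*call)(unsigned long action, void *cpu);
    struct demo_notifier *next;
};

#ifdef DEMO_HOTPLUG_CPU
/* Hotplug build: keep a real chain so callbacks can be invoked later. */
static struct demo_notifier *demo_chain;

static inline int demo_register_cpu_notifier(struct demo_notifier *nb)
{
    nb->next = demo_chain;
    demo_chain = nb;
    return 0;
}

static inline void demo_unregister_cpu_notifier(struct demo_notifier *nb)
{
    struct demo_notifier **p;

    for (p = &demo_chain; *p != NULL; p = &(*p)->next) {
        if (*p == nb) {
            *p = nb->next;
            break;
        }
    }
}
#else
/* No hotplug: callers compile unchanged, but nothing is recorded. */
static inline int demo_register_cpu_notifier(struct demo_notifier *nb)
{
    (void)nb;
    return 0;
}

static inline void demo_unregister_cpu_notifier(struct demo_notifier *nb)
{
    (void)nb;
}
#endif
```

Callers use the same two functions in either configuration, which is what removes the need for #ifdefs at every call site.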

diff-ms-sles11-rtnlcompat-20081010

Patch from Marat Stanichenko <mstanichenko@openvz.org>
Return EOPNOTSUPP in case of RTM_NEWLINK.

Patch from Marat (mstanichenko@), acked-by Den (den@)
Another attempt.

The previous patch (diff-ms-rtnlcompat-20080711) doesn't fix the problem, because at the end of rtnetlink_rcv_msg() "type" is not equal to RTM_NEWLINK. It is changed at the beginning of the function (see "type -= RTM_BASE"), so we must take that into account.

Bug #115250.

Moved from 028stab059.stable specs to list.
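
The offset problem described above can be shown with a small standalone sketch. It is not the patch itself: the DEMO_ constants mirror the values in <linux/rtnetlink.h> but are redefined here so the example compiles on its own, and both functions are hypothetical.

```c
#include <errno.h>

/* Same numeric values as RTM_BASE, RTM_NEWLINK and RTM_GETLINK;
 * redefined with a DEMO_ prefix to keep the sketch self-contained. */
enum {
    DEMO_RTM_BASE    = 16,
    DEMO_RTM_NEWLINK = 16,
    DEMO_RTM_GETLINK = 18
};

/* Compares against the rebased value, as the description requires. */
int demo_rcv_msg(int nlmsg_type)
{
    int type = nlmsg_type;

    type -= DEMO_RTM_BASE;      /* from here on, type is an offset */

    if (type == DEMO_RTM_NEWLINK - DEMO_RTM_BASE)
        return -EOPNOTSUPP;
    return 0;
}

/* Compares against the raw constant after the subtraction, so a
 * NEWLINK message is never recognised. */
int demo_rcv_msg_buggy(int nlmsg_type)
{
    int type = nlmsg_type - DEMO_RTM_BASE;

    if (type == DEMO_RTM_NEWLINK)
        return -EOPNOTSUPP;
    return 0;
}
```

With these definitions, demo_rcv_msg(DEMO_RTM_NEWLINK) yields -EOPNOTSUPP while demo_rcv_msg_buggy(DEMO_RTM_NEWLINK) yields 0, which is the behaviour the previous patch suffered from.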

diff-ms-utimensat-compat-comp-fix-20081107

Patch from Pavel Emelianov <xemul@openvz.org>
utimes: compilation fix for x86_64 COMPAT=y case :\

diff-nfs-vzquota-warn-20081124

Patch from Denis Lunev <den@openvz.org>
nfs: warning into dmesg on vzquota/NFS server conflict

OpenVZ Bug #1086.

diff-nmi-ipi-noack-20081205

Patch from Marat Stanichenko <mstanichenko@openvz.org>
We should avoid writing to the EOI register during NMI, because the Intel specification says this must not be done.

Bug #132139.

diff-ubc-fs-compat-syscalls

Patch from Pavel Emelianov <xemul@openvz.org>
x86_64: Compat system calls for UBC and fairsched

Required by PSBM

Bug #131966.

diff-ubc-kmem-debug-on-comp-20081017

Patch from Konstantin Ozerkov <kozerkov@openvz.org>
ubc: Fix compilation when CONFIG_UBC_DEBUG_KMEM enabled

This patch fixes broken kernel compilation when CONFIG_UBC_DEBUG_KMEM is enabled.

OpenVZ Bug #1048.

diff-ubc-swappages-resource-20081101

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
ubc: Upgrade UB_SWAPPAGES to full-blooded resource.

The limit value will be used as the configured CT swap size shown in /proc/swaps and /proc/meminfo. The default is UB_MAXVALUE.

Bug #115800.

diff-ve-ban-audit-in-kconf-20081007

Patch from Pavel Emelianov <xemul@openvz.org>
audit: Ban CONFIG_AUDIT

We neither have nor want (yet) it virtualized.

diff-ve-dont-drop-audit-caps-20081007

Patch from Pavel Emelianov <xemul@openvz.org>
ve: Keep the CAP_SETVEID in container

Scary?

That's OK - CAP_SETVEID checks are already removed.

diff-ve-mangle-mounts-devname-harder-20081106

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
mounts: show /dev/xxx devices near ve root mounts, rather than just xxx

Required for fixing autofs in a rhel5 container.

diff-ve-mangle-swapinfo-20081101

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
ve: Fill swap size/usage with data from UB_SWAPPAGES in meminfo notifier.

Don't show swap if the limit is unlimited (default state).

Bug #115800.

diff-ve-net-drop-bind-owner-check-20081112

Patch from Denis Lunev <den@openvz.org>
ip: check for owner_env on bind bucket is extra

The reason: the bind bucket carries owner_env itself, and this check has just been performed above in inet_csk_get_port. Moreover, this check is bogus, as sk2 can be a timewait bucket.

This check has already been removed in the netns code by Pavel.

Bug #127484.

diff-ve-new-capable-setveid-check-20081007

Patch from Pavel Emelianov <xemul@openvz.org>
ve: Don't check for CAP_SETVEID - use more ... imagination

  • This patch:

The proposed check correctly detects root in ve0. However, we lose the ability to create containers with some fancy tool that has only the CAP_SETVEID capability, but we don't have any such tool.

The cap itself is declared obsolete, but there's no need to rewrite vzctl in a rush: things will still work. If we want to manipulate audit caps from vzctl, we'll do it via features.

  • Overall history:

Don't ban CAP_AUDIT_XXX capabilities in container to make the dbus-daemon work.

After two (maybe three) days of brainstorming, Den and I finally gave birth to this solution. So...

First of all, AUDIT will be banned in containers. Since dbus insists on setting audit caps, we don't want it to mess with audit in any case.

The next step is to note that CAP_AUDIT_CONTROL coincides with CAP_VE_ADMIN, which is not that bad (besides, dbus doesn't try to set this one up), so we leave it alone.

And finally, CAP_AUDIT_WRITE, which coincides with the most delicate one, CAP_SETVEID. The latter is explicitly dropped on container start and there's no way to set it back (dbus tries this and fails). A simple "don't clear it" solution is too dangerous.

To handle *this* case we

  1. replace all capable(CAP_SETVEID) checks with a more complicated check that still matches only ve0's root;
  2. don't ban the CAP_SETVEID (== CAP_AUDIT_WRITE == the_one_dbus_needs);
  3. remember that this capability is present on ve startup and thus we automatically have the CAP_AUDIT_WRITE required by dbus;
  4. carefully handle the case when we enter the container in do_env_create and try to call fairsched system calls.

That's it. No fraud, just manual dexterity  ;)

Bug #117448.

diff-ve-nfct-netlink-oops-if-unconfigured-20081124

Patch from Vitaliy Gusev <vgusev@openvz.org>
Fix NULL dereference virtualized ip_nat variables via netlink

If a VE is allowed to use conntrack but not ip_nat, and ip_conntrack_netlink is loaded, then a user inside the VE can hang the host: the first oops is in ip_nat_core.c:ip_nat_proto_find_get, the second in ip_nat_core.c:find_appropriate_src(), with the host going to panic as read_lock_bh is held:

Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP:
  [<ffffffff881636c1>] :ip_nat:ip_nat_proto_find_get+0x61/0xa0
Process lt-ctnl_test (pid: 10587, veid=1000, threadinfo ffff81000b8da000, task ffff810005e87040)
Stack:  ffff81000fb001f8 ffff810015f2fe98 ffff81000b8db888 ffffffff8819a362
  0000000000000000 0000000000000000 ffff81000b8db8a8 ffff81000fb001f8
  ffff81000b8dba48 ffff81000b8dba20 ffff81000b8db908 ffffffff8819a6f9
Call Trace:
  [<ffffffff8819a362>] :ip_conntrack_netlink:ctnetlink_parse_nat_proto+0x92/0xe0
  [<ffffffff8819a6f9>] :ip_conntrack_netlink:ctnetlink_create_conntrack+0x349/0x4e0
  [<ffffffff8819bcf7>] :ip_conntrack_netlink:ctnetlink_new_conntrack+0x367/0x9c0
  [<ffffffff8819bd28>] :ip_conntrack_netlink:ctnetlink_new_conntrack+0x398/0x9c0
  [<ffffffff8106061f>] __lock_acquire+0xcff/0xd50
  [<ffffffff8812d52b>] :nfnetlink:nfnetlink_rcv_msg+0x20b/0x230
  [<ffffffff8812d350>] :nfnetlink:nfnetlink_rcv_msg+0x30/0x230
  [<ffffffff8812d5c0>] :nfnetlink:nfnetlink_rcv+0x70/0x174
  [<ffffffff811fefaa>] netlink_data_ready+0x1a/0x60
  [<ffffffff811ffa3b>] netlink_sendmsg+0x51b/0x560
  [<ffffffff8102be10>] default_wake_function+0x0/0x10
  [<ffffffff811e1a5e>] sock_sendmsg+0xee/0x110
  [<ffffffff8104e9f0>] autoremove_wake_function+0x0/0x40
  [<ffffffff81253f29>] _spin_unlock_irqrestore+0x49/0x60
  [<ffffffff8105f33c>] mark_held_locks+0x7c/0xb0
  [<ffffffff8106061f>] __lock_acquire+0xcff/0xd50
  [<ffffffff811e1845>] move_addr_to_kernel+0x25/0x40
  [<ffffffff811ea714>] verify_iovec+0x54/0xb0
  [<ffffffff811e26a6>] sys_sendmsg+0x246/0x2c0
  [<ffffffff8111300b>] __up_read+0x9b/0xb0
  [<ffffffff81051cf6>] up_read+0x26/0x30
  [<ffffffff8101e791>] do_page_fault+0x4e1/0x8e0
  [<ffffffff81250e5b>] thread_return+0x98/0x1cd
  [<ffffffff8105f54b>] trace_hardirqs_on+0x11b/0x160
  [<ffffffff81250e5b>] thread_return+0x98/0x1cd
  [<ffffffff8105f54b>] trace_hardirqs_on+0x11b/0x160
  [<ffffffff812534d3>] trace_hardirqs_on_thunk+0x35/0x37
  [<ffffffff8100a006>] system_call+0x7e/0x83

Bug #127153.

diff-ve-nfs-lockd-stop-fix-hosts-count-20081124

Patch from Denis Lunev <den@openvz.org>
lockd: do not attempt to shutdown lockd hosts from other environments

This codepath is invoked during lockd stop, which, in turn, is per-VE. The consequence is simple and bad: timeouts on RPC operations. The user-visible consequence is the following message in dmesg:

lockd: couldn't shutdown host module!

Bug #126918.

diff-ve-pi-futex-use-vpid-20081212

Patch from Marat Stanichenko <mstanichenko@openvz.org>
ve: Use vpid in pi_futex code.

As we use a task's vpid to identify the owner of a pi futex, we should do so everywhere.

Bug #132768.

diff-ve-printk-lockdep-fixup-20081120

Patch from Vitaliy Gusev <vgusev@openvz.org>
printk: fix lockdep warnings if kernel compiled with CONFIG_LOCKDEP

vprintk() to VE causes:

   =====================================
   [ BUG: lock held at task exit time! ]
   -------------------------------------
   iptables/8203 is exiting with locks still held!
   1 lock held by iptables/8203:
    #0: (sk_lock-AF_INET){--..}, at: [<ffffffff81213341>] ip_setsockopt+0x61/0xa0
   
   stack backtrace:
   
   Call Trace:
    [<ffffffff8100b78a>] show_trace+0xca/0x3b0
    [<ffffffff8100ba85>] dump_stack+0x15/0x20
    [<ffffffff8105e469>] debug_check_no_locks_held+0x89/0xa0
    [<ffffffff8103aa7e>] do_exit+0xe2e/0xe80
    [<ffffffff8103aba0>] sys_exit_group+0x0/0x20
    [<0000000000000001>]

Note: to reproduce this you can type in VE:

    iptables -A INPUT -m tcp --dport 22 -j DROP

diff-ve-show-proc-swaps-in-ct-20081101

Patch from Konstantin Khlebnikov <khlebnikov@openvz.org>
ve: Add /proc/swaps file inside CT.

Fill the size/used values with the ones from the meminfo virtinfo notifier.

Show one fake swap partition (/dev/null) with the same size/used values as in /proc/meminfo. If --meminfo == none, show the overall swap statistics from the HN.

Bug #115800.

diff-vzdq-qmblk-dq_sem-to-mutex-20081114

Patch from Konstantin Ozerkov <kozerkov@openvz.org>
vzquota: replace quota master block semaphore with mutex

Bug #120822.

diff-vzdq-vz_quota_sem-to-mutex-20081114

Patch from Konstantin Ozerkov <kozerkov@openvz.org>
vzquota: replace master lock semaphore with mutex

Bug #120822.