Changes


Download/kernel/rhel5/028stab060.2/changes

19,965 bytes added, 15:00, 22 January 2009
created
== Changes ==
Since {{kernel link|rhel5|028stab059.6}}:
* Rebased on 2.6.18-92.1.18 RHEL5 update ({{RHSA|2008-0957}})
* Backported ''some'' patches from RHEL5 update 92.1.22 ({{RHSA|2008-1017}})
* Fixed utimensat system call ({{Bug|970}})
* Fixed <code>CAP_AUDIT</code> capability in CT (for dbus)
* Added <code>UB_SWAPINFO</code> resource (for Oracle in CTs, needs vzctl >= 3.0.24)
* NFS deadlocks fixed
* Many small fixes in CPT code

=== Configs ===
Same as in {{kernel link|rhel5|028stab059.6}}, plus:
* +<code>CONFIG_FB_EFI=y</code>

=== Compatibility ===
No new issues.
<includeonly>[[{{PAGENAME}}/changes#Patches|{{Long changelog message}}]]</includeonly><noinclude>
=== Patches ===

==== Ported from RHEL5 2.6.18-92.1.22.el5 kernel ====
* linux-2.6-nfs-v4-credential-ref-leak-in-nfs4_get_state_owner.patch
* linux-2.6-net-ipv4-fix-byte-value-boundary-check.patch
* linux-2.6-fs-don-t-allow-splice-to-files-opened-with-o_append.patch
* linux-2.6-drm-i915-driver-arbitrary-ioremap.patch

==== diff-cpt-conntracks-fix-used-count-20081001 ====
<div class="change">
Patch from Vitaliy Gusev <vgusev@openvz.org><br/>
[PATCH] CPT: Fix ip_conntrack_ftp usage counter leak

Function ip_conntrack_helper_find_get() gets module counter. So put a
conntrack after putting in the hash and handling the conntrack's expect
list.
</div>

==== diff-cpt-dont-cpt-requiresdev-fs-20081212 ====
<div class="change">
Patch from Vitaliy Gusev <vgusev@openvz.org><br/>

Don't allow chkpnt VE with mounted ext2/ext3, etc filesystems.

Allow checkpoint only for mounted nodev and "external" filesystem.

This check protects from error on restore:
CPT ERR: ffff810007113000,102 :-2 mounting /root/some_dir ext3 40000000

as do_one_mount() doesn't pass mntdev to mount().

[xemul: actually, the reason we don't support filesystems other than
virtual and tmpfs is because we simply can't (easily) get the
mount options for them to cpt and restore ]

Bug #131737.
</div>

==== diff-cpt-iteronemm-printk-20081119 ====
<div class="change">
Patch from Vasily Averin <vvs@openvz.org><br/>
cpt: incorrect printk format specifier in iter_one_mm

printk inside iter_one_mm() used "%lx" for pgprot_val(), but that value is "long long"
on i386 PAE kernels. CPT_FID has a %s inside, so mismatched argument lengths
can cause an oops when the string pointer is dereferenced.

Bug #128474.
</div>
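
The format mismatch described in the iter_one_mm patch above can be illustrated in user space. This is a minimal sketch, not the kernel code: <code>pgprot_bits_t</code> and <code>format_prot()</code> are hypothetical names, and the position of the string argument is an assumption made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a PAE-style protection value: 64 bits wide
 * even on targets where "long" is only 32 bits. */
typedef unsigned long long pgprot_bits_t;

/* Format a 64-bit value followed by a string.
 * The explicit cast plus "%llx" keeps the variadic argument layout
 * correct on every ABI. With "%lx" on a 32-bit target only half of the
 * value would be consumed, and the following "%s" would then read a
 * bogus pointer. */
static int format_prot(char *buf, size_t len, pgprot_bits_t prot,
                       const char *tag)
{
    return snprintf(buf, len, "prot=%llx tag=%s",
                    (unsigned long long)prot, tag);
}
```

Casting to <code>unsigned long long</code> at the call site is one common way to keep a single format string valid on both 32-bit and 64-bit builds.
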

==== diff-cpt-no-ipv6-sit-compile-20081031 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
cpt: compilation fix for sit restoring in !IPv6 case

{{Bug|1060}}.
</div>

==== diff-cpt-open-stds-early-leak-20081128 ====
<div class="change">
Patch from Vitaliy Gusev <vgusev@openvz.org><br/>
cpt: Fix leak during checkpointing overmounted /dev/null

Bug #130958.
</div>

==== diff-cpt-put-expect-after-insert-20081003 ====
<div class="change">
Patch from Vitaliy Gusev <vgusev@openvz.org><br/>
[PATCH] CPT: put 'expect' after insert to the 'conntrack'

During conntrack restore, we need to put the expect after allocating
ip_conntrack_expect and doing something with it. The expect will then be
freed either immediately (if nobody holds it) or during cleanup/timer
hooks. Otherwise the expect will never be freed.

Note: Approaches for kernels 2.6.18 and 2.6.9 are different. For example
see help() in "net/ipv4/netfilter/ip_conntrack_netbios_ns.c"
</div>

==== diff-cpt-restore-listen-inet-socket-20081013 ====
<div class="change">
Patch from Vitaliy Gusev <vgusev@openvz.org><br/>
Restore information about tcp listening sockets (cpt_state == TCP_LISTEN)

Not all options are important. Only a missing ipv6only can cause
an error if another application wants to listen on the same port for the IPv4 any address.

tp->XXX are inherited by children (noticed by Alexey Kuznetsov), so we need also
to restore these options.

Comment from Alexey:<br/>
It [everything before] was not OK. The features which are broken are important,
but not actually critical except for ipv6only.

F.e. DEFER_ACCEPT is broken -> but nobody will notice, it just will not
be deferred.
</div>

==== diff-cpt-snmp-stats-dumping-fix-20081031 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
cpt: dump udp stats and udp6, not just udp6 twice

This is actually harmless, since both stats have equal size,
although somewhat incorrect result is produced on restore.

Found when compiling kernel with no IPv6 support.

{{Bug|1060}}.
</div>

==== diff-cpt-ub-resources-array-20081107 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
cpt: restore only the bc resources actually present in the cpt image.

Store UB_RESOURCES in cpt_beancounter_image while checkpointing
(leave all newly added resources with the default limits filled in at bc alloc).

Change cpt_content of cpt_beancounter_image to CPT_CONTENT_ARRAY to detect
the structure version without bumping the cpt image version, because in old images
the __cpt_pad field (reused for cpt_ub_resources) is uninitialized.

Add missing error handling inside rst_undump_ubc: pass errors
from restore_one_bc up to the higher level.

Bug #115800.
</div>

==== diff-cpt-vdso-via-special-mapping-fix ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
CPT: Fix VDSO page handling wrt new VDSO setup in RHEL5

The main difference is that now we have an array of exactly *one*
page, rather than just a virtual address. The other thing is that
vma->vm_ops now points to vma_special_ops.
</div>

==== diff-fairsched-ve-sanitize-20080710 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
fairsched: Sanitize fairsched manipulations on ve startup

First of all, we won't be able to call them after we fix
the capability checks. Second, taking the fairsched
mutex 4 times on startup is overkill.
</div>

==== diff-fs-quota-compat-proper-split-20081027 ====
<div class="change">
Patch from Konstantin Ozerkov <kozerkov@openvz.org><br/>
quota: Properly split compat (i.e. v1) declarations from all the others

In short, this patch moves the CONFIG_QUOTA_COMPAT stuff from
<linux/quota.h> into a separate include file. This is needed to fix
a compilation error when the CONFIG_SECCOMP option is enabled (declaration
cross reference).

{{Bug|972}}.
</div>

==== diff-ms-all-skbs-via-bridge-20081128 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org><br/>
br: do not always transmit packets to real Ethernet via bridge

Bridge in via_phys_dev mode always transmits packets via master_dev even
when this is not actually required, as master_dev->dev_hard_xmit is called
unconditionally.

This patch does a simple thing. When a packet is about to be sent via
master_dev (first time), master_dev is replaced with bridge->dev.
IMHO this approach should have been used from the very beginning.

Additionally, locking on the TX path is fixed. Previously we could jump
inside bridge->hard_start_xmit with the TX lock from the actual device held.

Bug #129292.
</div>

==== diff-ms-backport-utimensat-peek-20081006 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
ms: backport utimensat systemcall and machinery

Step1: steal a piece of code from mainstream (last commit 2d8f3038)

Bug #121508.
{{Bug|970}}.
</div>

==== diff-ms-backport-utimensat-wire-20081006 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
ms: backport utimensat systemcall and machinery (p3)

Step3: inject sys_utimensat into syscall tables.

Bug #121508.
{{Bug|970}}.
</div>

==== diff-ms-backport-utimensat-work-up-20081006 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
ms: backport utimensat systemcall and machinery (p2)

Step2: fixes wrt 2.6.18 kernel:
* replace struct path usage with struct dentry and struct nameidata.
* rename new do_utimes to __do_utimes and make it static.
* rewrite permission checks to use existing calls.

Bug #121508.
{{Bug|970}}.
</div>

==== diff-ms-cpu-is-offline-20081105 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
CPU hotplug: fix cpu_is_offline() on !CONFIG_HOTPLUG_CPU

Cherrypicked from mainstream commit a263898f (from Ingo Molnar <mingo@elte.hu>)
Bug #126915.
</div>

==== diff-ms-missed-register_cpu_notifier-20081001 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU

Backported patch from Avi Kivity <avi@qumranet.com> (git:47e627bc)

The following patchset allows a host with running virtual machines to be
suspended and, on at least a subset of the machines tested, resumed. Note
that this is orthogonal to suspending and resuming an individual guest to a
file.

A side effect of implementing suspend/resume is that cpu hotplug is now
supported. This should please the owners of big iron.

This patch:

KVM wants the cpu hotplug notifications, both for cpu hotplug itself, but more
commonly for host suspend/resume.

In order to avoid extensive #ifdefs, provide stubs when CONFIG_HOTPLUG_CPU is
not defined.

In all, we have four cases:

* UP: register and unregister stubbed out
* SMP+hotplug: full register and unregister
* SMP, no hotplug, core: register as __init, unregister stubbed (cpus are brought up during core initialization)
* SMP, no hotplug, module: register and unregister stubbed out (cpus cannot be brought up during module lifetime)

Signed-off-by: Avi Kivity <avi@qumranet.com><br/>
Cc: Ingo Molnar <mingo@elte.hu><br/>
Cc: Rusty Russell <rusty@rustcorp.com.au><br/>
Cc: Oleg Nesterov <oleg@tv-sign.ru><br/>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org><br/>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

{{Bug|1027}}.
</div>
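
The four cases listed in the hotplug notifier patch above amount to a small decision table. The sketch below restates it as plain C so the combinations can be checked. All names (<code>ex_impl</code>, <code>ex_notifier_api</code>, <code>ex_pick_api</code>) are hypothetical, and the real kernel makes this choice at compile time with preprocessor conditionals rather than at run time.

```c
/* How one side of the notifier API is provided in a given build. */
enum ex_impl {
    EX_STUB,      /* empty stub, does nothing */
    EX_FULL,      /* real implementation, callable at any time */
    EX_INIT_ONLY  /* real implementation, usable only during boot (__init) */
};

struct ex_notifier_api {
    enum ex_impl reg;    /* register_cpu_notifier() flavour */
    enum ex_impl unreg;  /* unregister_cpu_notifier() flavour */
};

/* Pick the flavour for a build described by three boolean switches:
 * smp (multiprocessor kernel), hotplug (CPU hotplug enabled) and
 * module (the caller is a loadable module rather than core code). */
static struct ex_notifier_api ex_pick_api(int smp, int hotplug, int module)
{
    struct ex_notifier_api api = { EX_STUB, EX_STUB };

    if (!smp)
        return api;              /* UP: both stubbed out */

    if (hotplug) {
        api.reg = EX_FULL;       /* SMP + hotplug: full API */
        api.unreg = EX_FULL;
        return api;
    }

    if (!module)
        api.reg = EX_INIT_ONLY;  /* SMP, no hotplug, core: register as __init */

    return api;                  /* SMP, no hotplug, module: both stubbed */
}
```
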

==== diff-ms-sles11-rtnlcompat-20081010 ====
<div class="change">
Patch from Marat Stanichenko <mstanichenko@openvz.org><br/>
Return EOPNOTSUPP in case of RTM_NEWLINK.

Patch from Marat (mstanichenko@), acked-by Den (den@)<br/>
Another attempt.

The previous patch (diff-ms-rtnlcompat-20080711) doesn't fix the problem
because at the end of rtnetlink_rcv_msg() "type" is not equal to
RTM_NEWLINK. It is changed at the beginning of the function (see "type -=
RTM_BASE"). So we must take that into account.

Bug #115250.

Moved from 028stab059.stable specs to list.
</div>
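
The off-by-base comparison described in the RTM_NEWLINK patch above can be sketched as follows. This is an illustration only: the two constants carry the values found in <code><linux/rtnetlink.h></code>, while the helper names and the <code>-EINVAL</code> fallback are assumptions and are not taken from the patch.

```c
#include <errno.h>

/* Values as defined in <linux/rtnetlink.h>. */
#define RTM_BASE    16
#define RTM_NEWLINK 16

/* Hypothetical helper: 'shifted_type' has already gone through
 * "type -= RTM_BASE", so it has to be compared against the shifted
 * constant, not against RTM_NEWLINK itself. */
static int ex_is_newlink_after_shift(int shifted_type)
{
    return shifted_type == RTM_NEWLINK - RTM_BASE;
}

/* Hypothetical mirror of the control flow: shift first, then choose the
 * error code for an unhandled message type. */
static int ex_unsupported_errno(int nlmsg_type)
{
    int type = nlmsg_type - RTM_BASE;

    return ex_is_newlink_after_shift(type) ? -EOPNOTSUPP : -EINVAL;
}
```

Comparing the shifted value directly against <code>RTM_NEWLINK</code> never matches for a real RTM_NEWLINK message, which is the failure mode the commit message attributes to the earlier patch.
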

==== diff-ms-utimensat-compat-comp-fix-20081107 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
utimes: compilation fix for x86_64 COMPAT=y case :\
</div>

==== diff-nfs-vzquota-warn-20081124 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org><br/>
nfs: warning into dmesg on vzquota/NFS server conflict

{{Bug|1086}}.
</div>

==== diff-nmi-ipi-noack-20081205 ====
<div class="change">
Patch from Marat Stanichenko <mstanichenko@openvz.org><br/>
We should avoid writing to the EOI register during NMI, because the Intel
specification says it must not be done.

Bug #132139.
</div>

==== diff-ubc-fs-compat-syscalls ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
x86_64: Compat system calls for UBC and fairsched

Required by PSBM

Bug #131966.
</div>

==== diff-ubc-kmem-debug-on-comp-20081017 ====
<div class="change">
Patch from Konstantin Ozerkov <kozerkov@openvz.org><br/>
ubc: Fix compilation when CONFIG_UBC_DEBUG_KMEM enabled

This patch fixes broken kernel compilation when CONFIG_UBC_DEBUG_KMEM is enabled.

{{Bug|1048}}.
</div>

==== diff-ubc-swappages-resource-20081101 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
ubc: Upgrade UB_SWAPPAGES to full-blooded resource.

The limit value will be used as configured CT swap size to show
in /proc/swaps and /proc/meminfo. Default is UB_MAXVALUE

Bug #115800.
</div>

==== diff-ve-ban-audit-in-kconf-20081007 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
audit: Ban CONFIG_AUDIT

We neither have nor want (yet) it virtualized.
</div>

==== diff-ve-dont-drop-audit-caps-20081007 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
ve: Keep the CAP_SETVEID in container

Scary?

That's OK - CAP_SETVEID checks are already removed.
</div>

==== diff-ve-mangle-mounts-devname-harder-20081106 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
mounts: show /dev/xxx devices near ve root mounts, rather than just xxx

Required for fixing autofs in rhel5 container:
</div>

==== diff-ve-mangle-swapinfo-20081101 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
ve: Fill swap size/usage with data from UB_SWAPPAGES in meminfo notifier.

Don't show swap if the limit is unlimited (default state).

Bug #115800.
</div>
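
The rule from the two swap-related patches (the limit acts as the swap size, and an unlimited limit means no swap is shown) can be sketched like this. It is a simplified model rather than the notifier code: <code>EX_UB_MAXVALUE</code> is assumed to equal <code>LONG_MAX</code> as a stand-in for <code>UB_MAXVALUE</code>, and the structure, the function name, and the clamping of the used value are all hypothetical.

```c
#include <limits.h>

/* Assumption: "unlimited" is modelled as LONG_MAX, standing in for
 * UB_MAXVALUE. */
#define EX_UB_MAXVALUE ((unsigned long)LONG_MAX)

/* Swap figures as a container would see them, in pages. */
struct ex_swapinfo {
    unsigned long total_pages;
    unsigned long used_pages;
};

/* Derive the swap figures from a swappages-style limit and the amount
 * currently held. An unlimited limit (the default) yields no swap. */
static struct ex_swapinfo ex_ct_swapinfo(unsigned long limit,
                                         unsigned long held)
{
    struct ex_swapinfo si = { 0, 0 };

    if (limit >= EX_UB_MAXVALUE)
        return si;                       /* unlimited: show no swap */

    si.total_pages = limit;              /* the limit is the swap size */
    si.used_pages = held < limit ? held : limit;  /* clamp: an assumption */
    return si;
}
```
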

==== diff-ve-net-drop-bind-owner-check-20081112 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org><br/>
ip: check for owner_env on bind bucket is extra

The reason: bind bucket carries owner_env on itself and this check has
been just performed above in inet_csk_get_port. Moreover, this check is
bogus as sk2 can be a timewait bucket.

This check has been already removed in netns code by Pavel.

Bug #127484.
</div>

==== diff-ve-new-capable-setveid-check-20081007 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org><br/>
ve: Don't check for CAP_SETVEID - use more ... imagination

* This patch:

The proposed check correctly detects the root in ve0.
However, we lose the ability to create containers with
some fancy tool, that has the CAP_SETVEID capability
only, but we don't have such.

The cap itself is declared to be obsoleted, but there's
no need in rewriting vzctl in a rush - things will still
work. If we'll want to manipulate audit caps from the
vzctl we'll make it via features.

* Overall history:

Don't ban CAP_AUDIT_XXX capabilities in container to make the
dbus-daemon work.

After two (maybe three) days of brainstorming, Den and I finally
gave birth to this solution. So...

First of all AUDIT will be banned in container. Since dbus refused
not to set audit caps we don't want it to mess with it in any case.

Next step is to note, that CAP_AUDIT_CONTROL coincides with the
CAP_VE_ADMIN, which is not that bad (besides, dbus doesn't try to
set this one up) and we leave one alone.

And finally - the CAP_AUDIT_WRITE, which coincides with the most
delicate one - CAP_SETVEID. The latter one is explicitly dropped
on container start and there's no way to set one (dbus tries this
and fails) back. Simple "don't clear it" solution is too dangerous.

To handle *this* case we
# replace all capable(CAP_SETVEID) checks with a more complicated one that still matches ve0's root only;
# don't ban the CAP_SETVEID (== CAP_AUDIT_WRITE == the_one_dbus_needs);
# remember, that this capability is present on ve startup and thus we automatically have the CAP_AUDIT_WRITE required by dbus;
# carefully handle the case, when we enter container in do_env_create and try to call fairsched system calls.

That's it. No fraud, just manual dexterity ;)

Bug #117448.
</div>
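
The capability overlap that the CAP_SETVEID patch above relies on can be shown with a few bit operations. In this sketch the audit capability numbers are the ones from <code><linux/capability.h></code>. The two OpenVZ numbers are inferred from the statement that they coincide with the audit ones, and every <code>EX_</code> name is hypothetical.

```c
/* Capability numbers from <linux/capability.h>. */
#define EX_CAP_AUDIT_WRITE   29
#define EX_CAP_AUDIT_CONTROL 30

/* Assumption: inferred from the commit message, which says these
 * coincide with the audit capabilities above. */
#define EX_CAP_SETVEID  29
#define EX_CAP_VE_ADMIN 30

/* Turn a capability number into its bit in a 32-bit capability set. */
#define EX_CAP_TO_MASK(cap) (1u << (cap))

/* Return 1 if the effective set contains the given capability. */
static int ex_has_cap(unsigned int effective, int cap)
{
    return (effective & EX_CAP_TO_MASK(cap)) != 0;
}
```

Because both names map to the same bit, a task that keeps <code>EX_CAP_SETVEID</code> at container start also passes a check for <code>EX_CAP_AUDIT_WRITE</code>. That is why the commit message replaces the <code>capable(CAP_SETVEID)</code> checks instead of dropping the bit.
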

==== diff-ve-nfct-netlink-oops-if-unconfigured-20081124 ====
<div class="change">
Patch from Vitaliy Gusev <vgusev@openvz.org><br/>
Fix NULL dereference of virtualized ip_nat variables via netlink

If a VE is allowed to use conntrack but not ip_nat, and
ip_conntrack_netlink is loaded, then a user in the VE can hang the host:
the first oops is in ip_nat_core.c:ip_nat_proto_find_get, the second in
ip_nat_core.c:find_appropriate_src(), with the host going to panic as
read_lock_bh is held:

Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP:
[<ffffffff881636c1>] :ip_nat:ip_nat_proto_find_get+0x61/0xa0
Process lt-ctnl_test (pid: 10587, veid=1000, threadinfo ffff81000b8da000, task ffff810005e87040)
Stack: ffff81000fb001f8 ffff810015f2fe98 ffff81000b8db888 ffffffff8819a362
0000000000000000 0000000000000000 ffff81000b8db8a8 ffff81000fb001f8
ffff81000b8dba48 ffff81000b8dba20 ffff81000b8db908 ffffffff8819a6f9
Call Trace:
[<ffffffff8819a362>] :ip_conntrack_netlink:ctnetlink_parse_nat_proto+0x92/0xe0
[<ffffffff8819a6f9>] :ip_conntrack_netlink:ctnetlink_create_conntrack+0x349/0x4e0
[<ffffffff8819bcf7>] :ip_conntrack_netlink:ctnetlink_new_conntrack+0x367/0x9c0
[<ffffffff8819bd28>] :ip_conntrack_netlink:ctnetlink_new_conntrack+0x398/0x9c0
[<ffffffff8106061f>] __lock_acquire+0xcff/0xd50
[<ffffffff8812d52b>] :nfnetlink:nfnetlink_rcv_msg+0x20b/0x230
[<ffffffff8812d350>] :nfnetlink:nfnetlink_rcv_msg+0x30/0x230
[<ffffffff8812d5c0>] :nfnetlink:nfnetlink_rcv+0x70/0x174
[<ffffffff811fefaa>] netlink_data_ready+0x1a/0x60
[<ffffffff811ffa3b>] netlink_sendmsg+0x51b/0x560
[<ffffffff8102be10>] default_wake_function+0x0/0x10
[<ffffffff811e1a5e>] sock_sendmsg+0xee/0x110
[<ffffffff8104e9f0>] autoremove_wake_function+0x0/0x40
[<ffffffff81253f29>] _spin_unlock_irqrestore+0x49/0x60
[<ffffffff8105f33c>] mark_held_locks+0x7c/0xb0
[<ffffffff8106061f>] __lock_acquire+0xcff/0xd50
[<ffffffff811e1845>] move_addr_to_kernel+0x25/0x40
[<ffffffff811ea714>] verify_iovec+0x54/0xb0
[<ffffffff811e26a6>] sys_sendmsg+0x246/0x2c0
[<ffffffff8111300b>] __up_read+0x9b/0xb0
[<ffffffff81051cf6>] up_read+0x26/0x30
[<ffffffff8101e791>] do_page_fault+0x4e1/0x8e0
[<ffffffff81250e5b>] thread_return+0x98/0x1cd
[<ffffffff8105f54b>] trace_hardirqs_on+0x11b/0x160
[<ffffffff81250e5b>] thread_return+0x98/0x1cd
[<ffffffff8105f54b>] trace_hardirqs_on+0x11b/0x160
[<ffffffff812534d3>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff8100a006>] system_call+0x7e/0x83

Bug #127153.
</div>

==== diff-ve-nfs-lockd-stop-fix-hosts-count-20081124 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org><br/>
lockd: do not attempt to shutdown lockd hosts from other environments

This codepath is invoked during lockd stop which, in turn, is per/VE.
The consequence is simple and bad - timeout on RPC operations. User
visible consequence is the following message in dmesg:
lockd: couldn't shutdown host module!

Bug #126918.
</div>

==== diff-ve-pi-futex-use-vpid-20081212 ====
<div class="change">
Patch from Marat Stanichenko <mstanichenko@openvz.org><br/>
ve: Use vpid in pi_futex code.

As we use tasks' vpid to own pi futex we should do it
everywhere.

Bug #132768.
</div>

==== diff-ve-printk-lockdep-fixup-20081120 ====
<div class="change">
Patch from Vitaliy Gusev <vgusev@openvz.org><br/>
printk: fix lockdep warnings if kernel compiled with CONFIG_LOCKDEP

vprintk() to VE causes:

=====================================
[ BUG: lock held at task exit time! ]
-------------------------------------
iptables/8203 is exiting with locks still held!
1 lock held by iptables/8203:
#0: (sk_lock-AF_INET){--..}, at: [<ffffffff81213341>] ip_setsockopt+0x61/0xa0

stack backtrace:

Call Trace:
[<ffffffff8100b78a>] show_trace+0xca/0x3b0
[<ffffffff8100ba85>] dump_stack+0x15/0x20
[<ffffffff8105e469>] debug_check_no_locks_held+0x89/0xa0
[<ffffffff8103aa7e>] do_exit+0xe2e/0xe80
[<ffffffff8103aba0>] sys_exit_group+0x0/0x20
[<0000000000000001>]

Note: to reproduce this you can type in VE:
iptables -A INPUT -m tcp --dport 22 -j DROP
</div>

==== diff-ve-show-proc-swaps-in-ct-20081101 ====
<div class="change">
Patch from Konstantin Khlebnikov <khlebnikov@openvz.org><br/>
ve: Add /proc/swaps file inside CT.

Fill the size/used values with the ones from the meminfo virtinfo notifier.

Show one fake swap partition (/dev/null) with the same size/used as in
/proc/meminfo. If --meminfo == none, show overall swap statistics from HN.

Bug #115800.
</div>

==== diff-vzdq-qmblk-dq_sem-to-mutex-20081114 ====
<div class="change">
Patch from Konstantin Ozerkov <kozerkov@openvz.org><br/>
vzquota: replace quota master block semaphore with mutex

Bug #120822.
</div>

==== diff-vzdq-vz_quota_sem-to-mutex-20081114 ====
<div class="change">
Patch from Konstantin Ozerkov <kozerkov@openvz.org><br/>
vzquota: replace master lock semaphore with mutex

Bug #120822.
</div>
