Difference between revisions of "Download/kernel/rhel5/028stab031.1/changes"

From OpenVZ Virtuozzo Containers Wiki
== Changes ==
* Mainstream security fix in netlink.
* Fixes in checkpointing, CPU fair scheduler, VE I/O scheduling, beancounters, VE disk quotas, etc.

=== Config changes ===
Added:
* +<code>CONFIG_IKCONFIG=y</code>
* +<code>CONFIG_IKCONFIG_PROC=y</code>
* +<code>CONFIG_KEXEC=y</code> (i686-PAE only)
* +<code>CONFIG_CRASH_DUMP=y</code> (i686-PAE only)
* +<code>CONFIG_PROC_VMCORE=y</code> (i686-PAE only)
Removed:
* -<code>CONFIG_4KSTACKS=y</code>
<includeonly>[[{{PAGENAME}}/changes#Patches|{{Long changelog message}}]]</includeonly><noinclude>

=== Patches ===

==== diff-arch-4gb-20070409 ====
<div class="change">
Patch from Kirill Korotaev &lt;dev@openvz.org&gt;:<br/>
[4G/4G] i686: 4GB split patch

4G split patch based on Ingo Molnar's linux-2.6.0-4g4g.patch from RHEL4.

Changes:
* fixed reserved pages handling
* fixed direct user access atomic operations for PI futexes
* fixed lockdep
* fixed entry.S to handle multiple irets and 16-bit stacks, NMI support
* reworked hw breakpoints support as in my patch for 2.4.x
* fixed huge TSS and I/O bitmaps
* added PTL locks support
* fixed kprobes
* 4-level page tables
* lots of fixes merged from OVZ 2.6.9 (see below)

Merged:
; linux-2.6.9-4g4g-hugemem-warning.patch
:
; linux-2.6.9-4g4g-noncachable.patch
:
; diff-arch-4gb-swsuspd
: Fixed broken software suspend
; diff-arch-4gb-fixaddr2
: Fixed VSYSCALL_BASE/FIXADDR_TOP interrelations (8k stacks)
; diff-arch-4gb-stksize
: Fixed virtual stack mappings for 8k stacks
; diff-arch-4gb-copyusr
: Optimization in filemap.c on copy_XXX_user
; diff-arch-4gb-vmalloc2
: vmalloc area extending up to 256MB
; diff-arch-4gb-tasksize
: Limiting user address space to 3GB
; diff-arch-4gb-tssmaps
: Fixed TSS mappings for online_cpus &gt; 2 and big TSS
; diff-arch-4gb-ldtleak
: Fixed leak of LDT pages on error path
; diff-arch-4gb-gcc296
: Fixed get_user() compilation bug on GCC 2.96
; diff-arch-4gb-amd-prefetch
: Fixed prefetch code detection on AMD
; diff-arch-4gb-mce-20060824
: Fixed MCE handling when 4GB split is on

Not needed (already in):
* linux-2.6.9-net-b44-4g4g.patch
* linux-2.6.9-4g4g-maxtasksize.patch
* diff-arch-4gb-pgdctor
* diff-arch-4gb-getname
</div>

==== diff-ms-lockdep-neighbour-table-class-20070409 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>

[LOCKDEP] Fix wrong deadlock report in neigh table

Lockdep reports a false deadlock for the following call trace:
<pre>
neigh_proxy_process()
  `- lock(neigh_table-&gt;proxy_queue.lock);
arp_process (tbl-&gt;proxy_redo)
neigh_event_ns
neigh_update
skb_purge_queue
  `- lock(neighbour-&gt;arp_queue.lock);
</pre>

There is actually no deadlock: the first and the second lock belong to
different sk_buff_head structures, but both are initialized in
skb_queue_head_init() and thus share one lockdep class.

This is a mainstream "BUG".
It is fixed by adding a separate lockdep class for the neigh_table proxy_queue lock.

Bug #78837.

(Tested with node bootup, VE start and vzt-prep vzt-ss).
</div>

==== diff-ubc-page-uncharge-bug-first-20070406 ====
<div class="change">

Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[BC] Check correct user_beancounter passed first in ub_page_uncharge()

If a page accidentally still has a page_beancounter that was not removed,
the kernel oopses when dereferencing ub-&gt;ub_percpu(). Move the BUG_ON()
earlier to make sure we are working with a user_beancounter.

This might be related to bug #78461.
</div>

==== diff-ubc-refcount-leak-dupmm-20070409 ====
<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:<br/>
[BC] refcount leak in dup_mm() on error path

Fix a simple beancounter refcount leak on the error path
in dup_mm().

Bug #77231.
</div>

==== diff-ubc-vmguar-enough-null-mm-20070406 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[BC] vmguar_enough_memory() oopses if called from kernel thread

If vmguar_enough_memory() is called by a kernel thread, it oopses
because task_struct-&gt;mm is NULL. This situation was encountered
when aufs was used on top of ramfs.

</div>

==== diff-cpt-make-zombie-20070413 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] alternative way to migrate zombie processes

In older 2.6.8 kernels do_exit() was very simple: essentially,
it disposed of mm etc., which is done automatically while checkpointing,
and did some work on notifying the parent. So it was natural
to move a restored process to zombie state by hand.

In 2.6.18 do_exit() does _lots_ of work.

It seems easier to invert the logic. We introduce a new flag,
PF_RESTART_EXIT, which suppresses the work that was already done
when the process moved to zombie state on the source hardware node
(mostly, sending signals), and use do_exit() when restoring zombie processes.

The same patch also adds checks for a few new things that
cannot be migrated. This is related because the list of those things
was obtained from do_exit().
</div>

==== diff-cpt-off-lockdep-on-sockets-20070413 ====
<div class="change">

Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[CPT] Fix lockdep warning on socket dump

CPT locks all the sockets it finds for dumping.
This is OK, but lockdep treats it as circular locking.

It happens each time we migrate a VE with more than
one socket aboard.

Bug #79164.

</div>

==== diff-cpt-robust-list-20070413 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] checkpointing robust lists

Otherwise we are going to have problems with migration
of newer glibc versions that use robust lists, when this is possible.

</div>

==== diff-fairsched-best-vcpu-20070413 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] Reduce starvation of some VCPUs in case of cpu limits

Change the logic of choosing best_vcpu to schedule on.
There are two potential problems:

a) If a vcpu is hot and the physical CPU it last used equals
smp_processor_id(), it will always be chosen. This is not a good
decision, because there is no guarantee that _all_ physical CPUs will
take vcpus from a vsched. For example, if the cpulimit for a vsched is
small, this vsched may run on only one physical CPU forever.

b) Also, newer 'cold' vcpus are currently chosen first,
because we scan active_list in forward order,
i.e. from older vcpus to newer ones, so the newest one ends up being chosen.
In this case old vcpus can starve for a long time.

Bug #79015.
</div>

==== diff-fairsched-find-idle-vcpu-20070403 ====
<div class="change">

Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] find_idle_vcpu() mask check fix

In find_idle_vcpu() we skip VCPUs whose IDs are not
set in the physical '*cpus' mask. This is incorrect:
the check must be made against the corresponding VCPU-&gt;last_pcpu instead.
</div>
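The corrected check can be sketched in userspace as follows. This is a minimal illustration and not the fairsched code. struct idle_vcpu_sketch, find_idle_vcpu_sketch() and the plain unsigned long CPU mask are hypothetical simplifications, and the sketch assumes at most 64 physical CPUs.

<source lang="c">
#include <stddef.h>

/* Hypothetical, simplified stand-in for a fairsched VCPU (not the real struct). */
struct idle_vcpu_sketch {
    int id;         /* VCPU number inside its vsched */
    int last_pcpu;  /* physical CPU this VCPU last ran on */
    int idle;       /* nonzero if the VCPU is idle */
};

/* Nonzero if physical CPU 'cpu' is set in the bitmask 'mask'. */
static int pcpu_in_mask(unsigned long mask, int cpu)
{
    return (int)((mask >> cpu) & 1UL);
}

/*
 * Index of the first idle VCPU whose last physical CPU is allowed by
 * 'pcpu_mask', or -1 if none qualifies. Testing v[i].id against the
 * mask instead (the behaviour the entry calls incorrect) would compare
 * a VCPU number with physical CPU bits.
 */
static int find_idle_vcpu_sketch(const struct idle_vcpu_sketch *v, size_t n,
                                 unsigned long pcpu_mask)
{
    for (size_t i = 0; i < n; i++) {
        if (!v[i].idle)
            continue;
        if (!pcpu_in_mask(pcpu_mask, v[i].last_pcpu))
            continue;   /* last ran on a CPU outside the mask */
        return (int)i;
    }
    return -1;
}
</source>

Consider a mask that contains only physical CPU 0. VCPU 0 is skipped because it last ran on CPU 3, even though bit 0 (its ID) is set. A check based on the ID would have picked it.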

==== diff-ms-bridge-unaligned-access-20070411 ====

<div class="change">
Patch from Evgeny Kravtsunov &lt;emkravts@openvz.org&gt;:<br/>
[BRIDGE] Unaligned access on IA64 when comparing ether addresses

This patch fixes an unaligned access that takes place on ia64 in compare_ether_addr().
compare_ether_addr() requires addresses to be aligned on a 2-byte boundary,
while the addresses declared in the bridge code are only 1-byte aligned.

Bug #79001.<br/>
Bug #78983.
</div>
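The following userspace sketch shows why the alignment matters. It assumes nothing about how the patch actually fixed the bridge code, because the entry does not say. ether_addr_cmp_any_align() and is_u16_aligned() are hypothetical helpers. The first keeps compare_ether_addr()'s convention of returning 0 for equal addresses, but it copies the bytes into local 16-bit words, so it is safe for 1-byte-aligned input.

<source lang="c">
#include <stdint.h>
#include <string.h>

/*
 * Compare two 6-byte Ethernet addresses; returns 0 when equal.
 * memcpy() into local uint16_t arrays keeps every 16-bit load aligned,
 * whereas casting a 1-byte-aligned pointer to uint16_t * and
 * dereferencing it is an unaligned access (which traps on ia64).
 */
static unsigned ether_addr_cmp_any_align(const uint8_t *a, const uint8_t *b)
{
    uint16_t wa[3], wb[3];

    memcpy(wa, a, sizeof(wa));
    memcpy(wb, b, sizeof(wb));
    return (unsigned)((wa[0] ^ wb[0]) | (wa[1] ^ wb[1]) | (wa[2] ^ wb[2]));
}

/* Nonzero if 'p' meets the 2-byte alignment a direct u16 load needs. */
static int is_u16_aligned(const void *p)
{
    return ((uintptr_t)p & 1u) == 0;
}
</source>

A caller holding addresses at odd offsets, as in the bridge case described above, can use the copying variant unconditionally. A real kernel fix could instead realign the stored addresses.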

==== diff-ms-fatal-signal-20070413 ====
<div class="change">
Patch from Denis Lunev &lt;den@openvz.org&gt;:<br/>
[PATCH] Fatal signal processing logic

This patch changes the fatal signal processing logic.
SIGKILL should be raised for all threads in the group *EXCEPT* the recipient.

{{Bug|533}}.
</div>
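The rule can be sketched over a hypothetical array of thread records. struct thread_sketch and kill_group_except() are illustrations only, not kernel structures. The sketch does not model how the recipient thread itself handles the signal.

<source lang="c">
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical per-thread record, not a kernel structure. */
struct thread_sketch {
    int tid;
    int pending_sig;   /* 0 when no signal is pending */
};

/*
 * Mark SIGKILL pending on every thread of the group except the one
 * whose tid is 'recipient_tid'. Returns the number of threads marked.
 */
static size_t kill_group_except(struct thread_sketch *t, size_t n, int recipient_tid)
{
    size_t marked = 0;

    for (size_t i = 0; i < n; i++) {
        if (t[i].tid == recipient_tid)
            continue;   /* the recipient is left alone */
        t[i].pending_sig = SIGKILL;
        marked++;
    }
    return marked;
}
</source>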

==== diff-ms-stopmachine-msleep-20070411 ====
<div class="change">
Patch from Konstantin Khorenko &lt;khorenko@openvz.org&gt;:<br/>
Problem found by Vasily (vvs@) &amp; Kirill (dev@):

A possible situation in stop_machine:

* stopmachine_state == STOPMACHINE_WAIT;
* STOPPER (stop_machine()) is in state SM_STOPPER_WAITING, calling yield() in a loop;
* SLAVES (stopmachine()) also call yield() in a loop.

This leads to contention on fairsched_lock on all CPUs. If lock acquisition
is unfair (for example, on a NUMA node), some CPUs can wait for the lock
forever or for a long time, causing the node to hang.
This patch replaces yield() with msleep(10).

The mainstream kernel is affected as well, though there it is harder to trigger:
one CPU calls yield(), taking and releasing rq-&gt;lock,
while another CPU tries to take that rq-&gt;lock (e.g. for balancing)
and livelocks forever.

Bug #78975.
</div>
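Here is a userspace sketch of the waiting pattern, assuming a polled completion condition. nanosleep() stands in for the kernel's msleep(). wait_until(), sleep_ms() and countdown_done() are hypothetical. Sleeping between polls takes the waiter off the CPU, so it no longer re-enters the scheduler and its locks through yield() on every iteration.

<source lang="c">
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for 'ms' milliseconds: a userspace stand-in for msleep(). */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    /* Resume with the remaining time if a signal interrupts the sleep. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/*
 * Poll 'done' until it reports completion, sleeping 'interval_ms'
 * between polls instead of spinning on a yield-style call.
 * Returns the number of polls performed.
 */
static int wait_until(int (*done)(void *), void *arg, long interval_ms)
{
    int polls = 1;

    while (!done(arg)) {
        sleep_ms(interval_ms);
        polls++;
    }
    return polls;
}

/* Example condition: reports completion once *remaining reaches zero. */
static int countdown_done(void *arg)
{
    int *remaining = arg;

    if (*remaining == 0)
        return 1;
    (*remaining)--;
    return 0;
}
</source>

The sleep interval trades a little latency for much less lock traffic. The entry's choice of 10 ms is a design decision of the patch, and the 1 ms used in testing here is arbitrary.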

==== diff-rh-dlm-misc-device-fix-20070412 ====
<div class="change">
Patch from Andrey Mirkin &lt;major@openvz.org&gt;, backported from the -nmw git tree:

<pre>
From: Patrick Caulfield &lt;pcaulfie@redhat.com&gt;
Date: Wed, 21 Mar 2007 09:23:53 +0000 (+0000)
Subject: [DLM] Don't delete misc device if lockspace removal fails
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fsteve%2Fgfs2-2.6-nmw.git;a=commitdiff_plain;h=2ebea8b4cc0b8859a99d2006ec085c1da2d0758a;hp=ead2e9aa2a555bb1d94ff2e5b4c974d0e7cc6518

[DLM] Don't delete misc device if lockspace removal fails

Currently if the lockspace removal fails the misc device associated with a
lockspace is left deleted. After that there is no way to access the orphaned
lockspace from userland.

This patch recreates the misc device if the dlm_release_lockspace fails. I
believe this is better than attempting to remove the lockspace first because
that leaves an unattached device lying around. The potential gap in which there
is no access to the lockspace between removing the misc device and recreating it
is acceptable ... after all the application is trying to remove it, and only new
users of the lockspace will be affected.

Signed-Off-By: Patrick Caulfield &lt;pcaulfie@redhat.com&gt;
Signed-off-by: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
</pre>
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=232878

Bug #79107.
</div>

==== diff-ubc-dcache-uncharge-root-20070413 ====
<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:<br/>
[BC] uncharge fs root (/) from dcachesize

The "/" dentry was charged in d_alloc_root(), then charged and uncharged
during filesystem activity. But at umount time that first charge was
forgotten.

So uncharge "/" by hand.

Bug #77771.
</div>

==== diff-ubc-ioacct-context-handle-20070413 ====
<div class="change">

Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[IOACCT] Fix ioacct race

When a page becomes dirty, there is no time to store a context
on it: the page may become clean immediately.

Thus we had a race in accounting: when a page became clean
before we set a context on it, this context got lost and was
never freed.

Handle the context the other way. If we are going to
set a new context on a page that already has one, free the old one
and account the written bytes in case the page became clean.
When removing a context from a page, handle the case where
the page does not have one due to the race in question. In
any case, a dirty page will have a context set, and a clean
one will not.

Bug #79008.
</div>

==== diff-ubc-ioprio-clean-active-ub-20070411 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[IOPRIO] cleaning active beancounter

After a beancounter disappears, it can still be active.
Clean it up.

Bug #78875.
</div>

==== diff-ubc-percpu-sign-expansion-20070410 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[BC] Percpu counters discrepancy on 64bit arches

The operation
<code>long += -(unsigned int);</code>
gives a wrong result on 64-bit arches because no sign extension takes place.

Bug #78998.
</div>
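The arithmetic can be reproduced in a few lines of userspace C. This sketch assumes an LP64 target (64-bit long, 32-bit unsigned int). add_neg_buggy() and add_neg_fixed() are hypothetical names, not beancounter functions.

<source lang="c">
/*
 * -delta is computed in unsigned int, so -(5u) == 4294967291u.
 * On LP64, adding it to a long zero-extends the value, and the counter
 * grows by about 4G instead of shrinking by 5.
 */
static long add_neg_buggy(long counter, unsigned int delta)
{
    counter += -delta;          /* no sign extension on 64-bit */
    return counter;
}

/* Convert to the signed wide type before negating (delta must fit in long). */
static long add_neg_fixed(long counter, unsigned int delta)
{
    counter += -(long)delta;
    return counter;
}
</source>

On such a target, add_neg_buggy(10, 5) yields 4294967301 rather than 5. On a 32-bit target the unsigned wrap-around happens to produce 5, which is consistent with the entry naming only 64-bit arches.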

==== diff-ve-futex-vpid-fix-20070413 ====

<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[PATCH] pthread_mutex_lock deadlock inside VE

This patch replaces diff-ve-futex-EDEADLK-bypass-20061225
and the previous small patch repairing a user-level deadlock in pthread_mutex_lock()
that happened because the value 0 was not "virtual".

It undoes the unnatural tests for virtuality of the pid supplied by the user.
Naively, this can result in a kernel warning if the user mangles
the pid by doing something like:
<source lang="c">
syscall(__NR_futex, &l, FUTEX_LOCK_PI, 0, 0);
l = getpid() | 1024;
syscall(__NR_futex, &l, FUTEX_LOCK_PI, 0, 0);
</source>

This happens due to pid aliasing introduced with vpids:
the test pid == virt_pid(current) is not enough to ensure that
the pid does not correspond to current.

I find it more natural to test for this condition directly.
</div>

==== diff-ve-kconfig-security-deps-20070411 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[PATCH] kconfig: security depends on !ve

Many people have CONFIG_SECURITY enabled in their configs.
When they run <code>make oldconfig</code> for OpenVZ kernels with such
configs, no questions appear concerning CONFIG_VE and friends, and
people end up with OpenVZ kernels with virtualization features disabled.
Fix this by reversing the VE/SECURITY dependency.
</div>

==== diff-ve-venet-stoprace-20070412 ====
<div class="change">
Patch from Denis Lunev &lt;den@openvz.org&gt;:<br/>
[VENET] stop IP management before freeing venet

The device is freed before the VE&lt;-&gt;IP mapping is cleaned.

Bug #75502.
</div>

==== diff-vzdq-sleep-under-inode-lock-20070412 ====

<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:

The call trace:
<pre>
vzdq_aquotq_lookup
iget5_locked()
get_new_inode()
  `- spin_lock(&amp;inode_lock);
find_inode()
-&gt;set() /* == vzdq_aquotq_lookset */
vdq_aquot_lookset()
user_get_super()
  `- down_read(...)
</pre>

So it may sleep with inode_lock taken.
Move all the sleeping operations out of the lock.

Bug #79124.
</div>

==== diff-i2o-timeout-20071013 ====
<div class="change">
Patch from Kostja:

Adds a missing error check when trying to get a message slot in the I2OPASSTHRU ioctl
handler. The missing check caused a kernel crash if "raideng" (from raidutils) was used
when the controller timed out.

Bug #79279.

</div>

==== diff-ubc-io-release-debug-20070416 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[IOACCT] Debug on page release

When releasing an IO beancounter from a page that
is not supposed to have an IO pb, print a warning.

This might help debug bug #79427.
</div>

==== diff-ubc-put-beancounters-in-fork-20070416 ====
<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:<br/>
[BC] UB put on error path in fork()

If fork() fails after ub_task_charge(),
nobody puts the three beancounters obtained there.

Bug #77231.
</div>
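The general shape of this kind of fix is the usual goto-based unwind on the error path. The minimal sketch below assumes a plain integer refcount. struct bc_sketch, task_charge_sketch() and fork_sketch() are hypothetical stand-ins for the beancounters, ub_task_charge() and fork(), not their real signatures.

<source lang="c">
/* Hypothetical refcounted beancounter stand-in. */
struct bc_sketch {
    int refcnt;
};

static void bc_get(struct bc_sketch *bc) { bc->refcnt++; }
static void bc_put(struct bc_sketch *bc) { bc->refcnt--; }

/* Stand-in for the charge step: takes one reference on each of three counters. */
static void task_charge_sketch(struct bc_sketch *bc[3])
{
    for (int i = 0; i < 3; i++)
        bc_get(bc[i]);
}

static void task_uncharge_sketch(struct bc_sketch *bc[3])
{
    for (int i = 0; i < 3; i++)
        bc_put(bc[i]);
}

/*
 * Fork-like sequence: charge first, then a later step that may fail.
 * Returns 0 on success, or -1 after dropping the references on failure.
 */
static int fork_sketch(struct bc_sketch *bc[3], int later_step_fails)
{
    task_charge_sketch(bc);
    if (later_step_fails)
        goto bad_uncharge;
    return 0;

bad_uncharge:
    task_uncharge_sketch(bc);   /* the put the error path was missing */
    return -1;
}
</source>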

==== diff-arch-x86-ioremap-guard-page-fix-20070419 ====
<div class="change">
Patch from Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;:<br/>
[4GB] change_page_attr() BUGs on strange pages

It is a really long story...

This patch restores the iounmap code as it was after:
<pre>
commit 2c692eefe4aff109eab9384b6d8c7e1a8f094dad
Author: ak &lt;ak&gt;
Date:  Sun Jan 23 18:29:11 2005 +0000

later this code was removed by this patch:
commit a7dd5a5f2b5db975bdf1dcaa3f3da6c289630076
Author: andrea &lt;andrea&gt;
Date:  Tue Mar 8 18:49:39 2005 +0000

this code was broken again after:
commit bf5421c309bb89e5106452bc840983b1b4754d61
Author: Andi Kleen &lt;ak@suse.de&gt;
Date:  Mon Dec 12 22:17:09 2005 -0800
</pre>

The actual problem is that get_vm_area() allocates
one more page than was requested, as a guard page,
but change_page_attr() does not take this into account.

Bug #79617.
</div>

==== diff-cpt-active-callback-eagain-20070423 ====
<div class="change">
Patch from Andrey Mirkin &lt;major@openvz.org&gt;:<br/>
[CPT] retry checkpointing if VE has active netlink callback

Return -EAGAIN instead of -EBUSY if a netlink socket has an active callback.
In this case we will try to freeze the VE 3 times.
</div>
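The retry behaviour can be sketched as a bounded loop. freeze_with_retry() and freeze_stub() are hypothetical. Treating only -EAGAIN as transient is an assumption drawn from this entry, not taken from the CPT code.

<source lang="c">
#include <errno.h>

/* Maximum number of freeze attempts, per the entry above. */
#define FREEZE_ATTEMPTS 3

/*
 * Call 'freeze' up to FREEZE_ATTEMPTS times while it returns -EAGAIN
 * (a transient condition such as an active netlink callback). Any other
 * result, including 0 or -EBUSY, ends the loop and is returned.
 */
static int freeze_with_retry(int (*freeze)(void *), void *ctx)
{
    int err = -EAGAIN;

    for (int attempt = 0; attempt < FREEZE_ATTEMPTS && err == -EAGAIN; attempt++)
        err = freeze(ctx);
    return err;
}

/* Example freeze stub: returns -EAGAIN while *busy_rounds is positive. */
static int freeze_stub(void *ctx)
{
    int *busy_rounds = ctx;

    if (*busy_rounds > 0) {
        (*busy_rounds)--;
        return -EAGAIN;
    }
    return 0;
}
</source>

Returning -EBUSY would end the loop immediately, which is why switching the netlink case to -EAGAIN is what enables the retries.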

==== diff-cpt-make-zombie-b-20070420 ====

<div class="change">
Patch from Thorsten Schifferdecker:<br/>
[CPT] tux is missing on vanilla kernels

The compilation bug was introduced by
[http://git.openvz.org/?p=linux-2.6.18-openvz;a=commitdiff;h=f90c2c318467829ecde43919ad38f326527f533b http://git.openvz.org/?p=linux-2.6.18-openvz;a=commitdiff;h=f90c2c318467829ecde43919ad38f326527f533b]

Fixes {{bug|545}}.
</div>

==== diff-cpt-restore-netlink-socket-20070419 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] restore rcv queue on netlink sockets and unbound netlink sockets

The code restoring queues was forgotten. This fixes bug #79723.

Unbound sockets were restored incorrectly: they were bound
to some port, which prevented a subsequent bind by the application.
This fixes bug #79724.

This patch overrides the previous patch with the subject
"[CPT] restore rcv queue on netlink sockets", which fixed only bug #79723.
</div>

==== diff-cpt-setup-pagein-fix-20070418 ====
<div class="change">
Patch from Andrey Mirkin &lt;major@openvz.org&gt;:<br/>
[CPT] Fix lazy migration

While porting checkpointing to the 2.6.18 kernel, the rst_setup_pagein()
function was accidentally added to the code twice. This breaks lazy migration.

Bug #77921.

The creation procedure of the pgin block device was quite hairy, so almost every time
during lazy migration an annoying message appeared in the kernel log:
<pre>
register_blkdev: cannot get major 254 for pgin
</pre>

This patch fixes the creation of the pgin block device.
</div>

==== diff-cpt-ubc-off-20070424 ====
<div class="change">

Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[BC] compilation fix with CONFIG_USER_RESOURCE=n

Compilation fix with CONFIG_USER_RESOURCE=n
</div>

==== diff-cpt-zombie-threads-20070416 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] thread groups with exited leader did not migrate

The bug is simple and stupid; it is very strange that nobody saw it.

When a thread group leader exits, its mm/files/fs/namespace are released,
but the zombie process remains frozen until all the threads exit.
Restore was not able to restore such a configuration.

The solution is simple: when checkpointing, save not the real (NULL)
mm/files/fs/namespace, but the mm/files/fs/namespace of this thread group.
</div>

==== diff-fairsched-best-vcpu-b-20070423 ====

<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] Select some vcpu instead of idle even if all vcpus are hot

We have to use the oldest vcpu if all vcpus are hot.
In the current kernel idle_vcpu is used instead, so the CPU can stay idle
instead of doing useful work.

Bug #79676.

</div>
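The userspace sketch below gives one possible reading of the selection policy described in this entry and in the earlier diff-fairsched-best-vcpu entry. It scans VCPUs oldest first, takes the oldest cold one, and falls back to the oldest VCPU when all are hot. struct sched_vcpu_sketch and pick_best_vcpu() are hypothetical. The real fairsched code also weighs physical CPUs and cpulimits, which are not modelled here.

<source lang="c">
#include <stddef.h>

/* Hypothetical VCPU record; arrays are ordered oldest first, like a
 * forward scan of active_list. */
struct sched_vcpu_sketch {
    int hot;   /* nonzero if the VCPU is still cache-hot */
};

/*
 * Return the index of the oldest cold VCPU. If every VCPU is hot, return
 * the oldest one (index 0) rather than -1 ("go idle"). Returns -1 only
 * for an empty list.
 */
static int pick_best_vcpu(const struct sched_vcpu_sketch *v, size_t n)
{
    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (!v[i].hot)
            return (int)i;   /* oldest cold VCPU wins, so old ones do not starve */
    return 0;                /* all hot: run the oldest instead of idling */
}
</source>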

==== diff-fairsched-unused-rq-20070424 ====

<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[SCHED] cleanup: removed unused variable

Before the fairsched patch, struct rq *rq was used to compare
tasks with rq-&gt;idle. With fairsched, the idle task is bound to
a pcpu, not a vcpu, and thus struct rq *rq is simply not needed.
</div>

==== diff-ms-cfq-allow-merge-b-20070424 ====

<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[PATCH] modification of allow merge policy in cfq (mainstream)

Jens Axboe rewrote the allow-merge policy once more after
we reported the problem, fixing the issue we currently face where some
tasks experience I/O starvation.

This is an incremental patch to the previous patch
diff-ms-cfq-allow-merge-20070117 ported to OpenVZ.

This patch is cumulative of the following mainstream commits:

* [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=719d34027e1a186e46a3952e8a24bf91ecc33837 719d34027e1a186e46a3952e8a24bf91ecc33837]
* [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ec8acb6904fabb8e741f741ec99bb1c18f2b3dee ec8acb6904fabb8e741f741ec99bb1c18f2b3dee]

Bug #79594.

</div>

==== diff-ms-cpufreq-centrino-20070426 ====
<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:<br/>

[PATCH] Make speedstep centrino cpufreq driver use wr/rdmsr_on_cpu()

The speedstep-centrino cpufreq driver was using set_cpus_allowed() and
checks of smp_processor_id() to confine itself to a given CPU.

Switch to the rdmsr_on_cpu()/wrmsr_on_cpu() infrastructure.

Closes http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=420708
</div>

==== diff-ms-elf-retval-20070420 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[PATCH] Invalid return value of execve() resulting in oopses (mainstream)

When the ELF loader fails to map an executable (due to a memory shortage
or because the binary is malformed), it can return 0. Normally this is invisible,
because the process is killed with SIGKILL and never returns to user space.

But if exec() is called from a kernel thread (hotplug, whatever), the consequences
are more interesting and vary depending on the architecture.

i386. Nothing especially interesting, execve() just returns with "success" :-)

x86_64. A fake zero frame is used on the way to the caller, RSP/RIP are loaded
with zeros, ergo... double fault.

ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it oopses
due to a return to a zero PC, sometimes it sees NaT in rXX and oopses
due to NaT consumption.

This fix solves bugs #68582 (i386), #73753 (x86_64) and #79847 (ia64).
</div>

==== diff-ms-fib-netlink-lookup-recursion-20070425 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[PATCH] stack overflow in netlink (mainstream)

Replies to NETLINK_FIB_LOOKUP messages were misrouted back to the kernel,
which resulted in infinite recursion and a stack overflow.

The bug is present in all kernel versions since the feature appeared.

(linux 2.6.13, Jun 20th, 2005,<br/>
commit 246955fe4c38bd706ae30e37c64892c94213775d,<br/>
[NETLINK]: fib_lookup() via netlink)

The patch also makes some minimal cleanups:
# Return something consistent (-ENOENT) when the fib table is missing
# Do not crash when the queue is empty (it does not happen, but still)
# Put the result of the lookup

Frankly, I would delete this thing instead of fixing it. It looks ugly
and was used only for debugging the LC-trie.

Signed-off-by: Alexey Kuznetsov &lt;kuznet@ms2.inr.ac.ru&gt;<br/>
Acked-by: Dave Miller &lt;davem@davemloft.net&gt;
</div>

==== diff-ms-fib-netlink-lookup-recursion-b-20070425 ====

<div class="change">
Patch from Sergey Vlasov &lt;vsu@altlinux.ru&gt;:<br/>
[NETLINK] Fix for Alexey's netlink lookup recursion fix

When CONFIG_IP_MULTIPLE_TABLES is enabled, the code in nl_fib_lookup()
needs to initialize the res.r field before fib_res_put(&amp;res): unlike
fib_lookup(), a direct call to -&gt;tb_lookup does not set this field.
</div>

==== diff-ms-loop-dont-complete-lo-bh-done-20070418 ====

<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:<br/>
[PATCH] loopback: oops on loopback mount/umount (mainstream)

After a LOOP_SET_FD/LOOP_CLR_FD combo, the loop device's queue gets a request
handler which is persistent.

After, say,
<code>mount -t iso9660 /dev/loop0 /mnt # sic</code>
this request handler is called directly with<br/>
a) -&gt;lo_state being Lo_unbound<br/>
b) -&gt;lo_pending being zero

The error path in loop_make_request() completes the -&gt;lo_bh_done completion,
which is persistent as well.

Now, let's start the worker thread as usual. It will set -&gt;lo_pending to 1,
will not wait for the completion because it was already completed (brokenly),
and will not get out of its infinite loop because of -&gt;lo_pending. The loop
device has no bios at this point and triggers BUG_ON.

So, don't complete -&gt;lo_bh_done when the loop device isn't fully set up.
In mainstream it was accidentally fixed when converting to kthreads.

Bug #79521.
</div>

==== diff-ubc-ioprio-force-disp-off-20070424 ====

<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[IOPRIO] forced dispatching when CONFIG_UBC_IO_PRIO off

If CONFIG_UBC_IO_PRIO is off, then no beancounters are in the active list,
so there is a bug in the forced dispatching case.

{{Bug|528}}.
</div>

==== diff-ubc-ioprio-oops-on-virt-off-20070423 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[IOPRIO] Oops on IO-prioritization disabling

If I/O prioritization is suddenly turned off via
/sys/block/&lt;dev&gt;/queue/iosched/virt_mode, the cfqq owner BC does not equal
the beancounter of the current I/O context. The right thing is to take the
beancounter from the queue, not from the current I/O context.
</div>

==== diff-ubc-off-20070424 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[BC] compilation fix with CONFIG_USER_RESOURCE=n

Compilation fix with CONFIG_USER_RESOURCE=n
</div>

==== diff-ubc-unusedprivvm-in-zeromap-20070424 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[BC] Leak of privvmpages on zero page maps

This was obviously forgotten.
Each mmap of /dev/zero causes this leak.

Bug #80246.
</div>

==== diff-ve-meminfo-b-20070330 ====
<div class="change">
Patch from Konstantin Khorenko &lt;khorenko@openvz.org&gt;:<br/>
[MEMINFO] sysctl for selecting UsedMem source

Adds a sysctl to choose the base UBC parameter for memory usage inside a VE.
Makes the PRIVVMPAGES beancounter the default instead of OOMGUARPAGES.

Bug #78088.
</div>

==== diff-ve-netlink-veprintk-20070426 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[NETLINK] VE netlink message should go into VE log

When parsing netlink arguments, the kernel may printk that
some bytes were left unparsed. Make this message appear in the VE log
instead of the global one.
</div>

==== diff-ve-tun-persist-20070424 ====
<div class="change">
Patch from Vasily Averin &lt;vvs@openvz.org&gt;:<br/>
[TUN] prohibit tun persistent mode inside VE

Prohibit tun persistent mode inside a VE until this is resolved via VE hooks.

Bug #79612.
</div>

==== diff-ve-xen-netback-20070426 ====
<div class="change">
Patch from Sergey Korshunoff &lt;seyko2@&gt;:

Fix the Xen netback driver, since loopback_dev is defined as a macro
in OVZ and is substituted.
</div>

==== diff-vzdq-ugbad-20070428 ====
<div class="change">
Patch from Konstantin Khorenko &lt;khorenko@openvz.org&gt;:<br/>
[VZDQ] prohibit chown of a file if owner doesn't have ugid struct

Prohibit chown of a file if its owner does not have a
ugid record. This might happen if we somehow exceeded
the UID/GID limit (e.g. ugidlimit was set lower than the number of users).

Bug #79553.
</div>

==== diff-vzstat-numa-fixes-b-20070416 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[VZSTAT] Sum up per node stats for pgdat's

In 2.6.18, <code>cat /proc/vz/stats</code> shows information about zones
for each node, but the vzstat utility parses information about only one (the last)
node, and so shows incorrect memory info on hosts with several nodes
(NUMA hosts, for instance). The new code scans all nodes and sums up
the statistics for identical zones across nodes.

Bug #77994.
</div>
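Summing per-node counters per zone is a simple nested loop. The sketch assumes a hypothetical fixed zone count and a plain 2-D array indexed [node][zone]. sum_zones_over_nodes() is not a vzstat or kernel function.

<source lang="c">
#include <stddef.h>

#define NR_ZONES_SKETCH 3   /* hypothetical: e.g. DMA, DMA32, Normal */

/*
 * Add up stat[node][zone] over all nodes so that total[zone] covers the
 * whole host. Reading only the last row (the old behaviour described
 * above) would drop every other node's contribution.
 */
static void sum_zones_over_nodes(unsigned long stat[][NR_ZONES_SKETCH],
                                 size_t nr_nodes,
                                 unsigned long total[NR_ZONES_SKETCH])
{
    for (size_t z = 0; z < NR_ZONES_SKETCH; z++)
        total[z] = 0;
    for (size_t n = 0; n < nr_nodes; n++)
        for (size_t z = 0; z < NR_ZONES_SKETCH; z++)
            total[z] += stat[n][z];
}
</source>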

==== diff-vzstat-zone-id-20070416 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>

[VZSTAT] Fix vzstat when DMA32 zone on x8664 (index 1) is empty

Fix vzstat when the DMA32 zone on x86_64 (index 1) is empty.
To do this, show the ordinal number of a zone instead of its real
index in the kernel 'zones' array.
The output will then look like:
<pre>
0 DMA
skipped DMA32 zone
1 Normal
</pre>

Bug #77336.
</div>
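The renumbering can be sketched as follows. The sketch assumes a hypothetical per-zone page-count array in which 0 marks an empty zone. zone_ordinals() is illustrative only.

<source lang="c">
#include <stddef.h>

/*
 * For each kernel zone index i, store in ordinal[i] its position among
 * non-empty zones, or -1 if the zone is empty and skipped.
 * Returns the number of zones that get an ordinal.
 */
static int zone_ordinals(const unsigned long *present_pages, size_t nr_zones,
                         int *ordinal)
{
    int next = 0;

    for (size_t i = 0; i < nr_zones; i++)
        ordinal[i] = present_pages[i] ? next++ : -1;
    return next;
}
</source>

Take DMA, an empty DMA32 and Normal as input. This gives ordinals 0, -1 (skipped) and 1, which matches the sample output in this entry.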

==== io-accounting-menuconfig.patch ====
<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:

1. Move TASK_IO_ACCOUNTING out of the EMBEDDED menu, cf. its placement in mainline.

2. As a side effect, the EMBEDDED menu will be shown at the same level as designed,
instead of returning to the top level after the SYSCTL option.

{{Bug|550}}.
</div>

==== namespaces-utsname-xen.patch ====
<div class="change">
Patch from Sergey Korshunoff &lt;seyko2@&gt;:

Fix utsname handling in process-xen.c, which is a copy of process.c
</div>

==== diff-i2o-msgleak-10070423 ====
<div class="change">
Patch from Vasiliy:

This patch fixes an i2o message leak.
In case of error, we need to free both the msg itself and the i2o message
in hardware.
</div>

==== diff-i2o-msgget-errh-10070423 ====
<div class="change">
Patch from Vasiliy:

This patch fixes access to memory that has not been allocated:
i2o_msg_get_wait() can return errors other than I2O_QUEUE_EMPTY,
but the result was checked only against this code.
If it is not I2O_QUEUE_EMPTY, the error code is later dereferenced as a pointer.
</div>
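This failure mode resembles the kernel's error-pointer idiom, in which a negative errno is encoded in a pointer value. The sketch below assumes that i2o_msg_get_wait() reports errors this way and re-creates ERR_PTR(), PTR_ERR() and IS_ERR() in userspace, after include/linux/err.h. msg_get_sketch() and the use of -EBUSY as a stand-in for I2O_QUEUE_EMPTY are hypothetical, since the entry does not give that constant's value.

<source lang="c">
#include <errno.h>
#include <stdint.h>

/* Userspace copies of the kernel error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for I2O_QUEUE_EMPTY. */
#define SKETCH_QUEUE_EMPTY (-EBUSY)

struct msg_sketch { int payload; };

/* Hypothetical getter: returns 'slot', or an encoded error if 'err' != 0. */
static struct msg_sketch *msg_get_sketch(struct msg_sketch *slot, long err)
{
    return err ? ERR_PTR(err) : slot;
}

/* Buggy style: only one specific error is recognised. */
static int check_only_queue_empty(const struct msg_sketch *m)
{
    return m == ERR_PTR(SKETCH_QUEUE_EMPTY);
}

/* Fixed style: every encoded error is caught before dereferencing. */
static int check_any_error(const struct msg_sketch *m)
{
    return IS_ERR(m);
}
</source>

With only the single-value check, a timeout slips through, and the caller would go on to dereference an address near the top of the address space. IS_ERR() rejects the whole reserved range at once.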

==== diff-i2o-cfg-passthru-20070423 ====
<div class="change">
Patch from Vasiliy:

This patch fixes a number of issues in i2o_cfg_passthru{,32}:

* memory leaks (including the i2o_message leak fixed by khorenko@sw.ru)
* infinite loop to sg_list_cleanup in passthru32
* bad error paths
</div>

==== diff-i2o-proc-perms-20060304 ====
<div class="change">
Patch from Vasiliy:

Reading from some i2o-related proc files can hang the controller for
unknown reasons. As a workaround, this patch makes these
files accessible to root only.
</div>
 +
 
 +
==== linux-2.6.18-drbd-8.0.0-8.0.2.patch ====
<div class="change">
DRBD update to v8.0.2.
</div>

</noinclude>

Latest revision as of 18:29, 22 October 2009


Changes

  • Mainstream security fix in netlink.
  • Fixes in checkpointing, CPU fair scheduler, VE I/O scheduling, beancounters, VE disk quotas, etc.

Config changes

Added:

  • +CONFIG_IKCONFIG=y
  • +CONFIG_IKCONFIG_PROC=y
  • +CONFIG_KEXEC=y (i686-PAE only)
  • +CONFIG_CRASH_DUMP=y (i686-PAE only)
  • +CONFIG_PROC_VMCORE=y (i686-PAE only)

Removed:

  • -CONFIG_4KSTACKS=y

Patches

diff-arch-4gb-20070409

Patch from Kirill Korotaev <dev@openvz.org>:
[4G/4G] i686: 4GB split patch

4G split patch based on Ingo Molnar's linux-2.6.0-4g4g.patch from RHEL4.

Changes:

  • fixed reserved pages handling
  • fixed PI futexes direct user access atomic operations
  • fixed lockdep
  • fixed entry.S to handle multiple irets and 16bit stacks, NMI support
  • reworked hw breakpoints support as in my patch in 2.4.x
  • fixed huge TSS and io bitmaps
  • added PTL locks support
  • fixed kprobes
  • 4-level page tables
  • lots of fixes merged from OVZ 2.6.9 (see below)

Merged:

linux-2.6.9-4g4g-hugemem-warning.patch
linux-2.6.9-4g4g-noncachable.patch
diff-arch-4gb-swsuspd
Fixed broken software suspend
diff-arch-4gb-fixaddr2
Fixed VSYSCALL_BASE/FIXADDR_TOP interrelations (8k stacks)
diff-arch-4gb-stksize
Fixed virtual stack mappings for 8k stacks
diff-arch-4gb-copyusr
Optimization in filemap.c on copy_XXX_user
diff-arch-4gb-vmalloc2
vmalloc area extending up to 256MB
diff-arch-4gb-tasksize
Limiting user address space to 3GB
diff-arch-4gb-tssmaps
Fixed TSS mappings for online_cpus > 2 and big TSS
diff-arch-4gb-ldtleak
Fixed leak of LDT pages on error path
diff-arch-4gb-gcc296
Fixed get_user() compilation bug on GCC 2.96
diff-arch-4gb-amd-prefetch
Fixed prefetch code detection on AMD
diff-arch-4gb-mce-20060824
Fixed MCE handling when 4GB split is on

Not needed (already in):

  • linux-2.6.9-net-b44-4g4g.patch
  • linux-2.6.9-4g4g-maxtasksize.patch
  • diff-arch-4gb-pgdctor
  • diff-arch-4gb-getname

diff-ms-lockdep-neighbour-table-class-20070409

Patch from Pavel Emelianov <xemul@openvz.org>:

[LOCKDEP] Fix wrong deadlock report in neigh table

Lockdep detects a fake deadlock in the calltrace:

neigh_proxy_process()
  `- lock(neigh_table->proxy_queue.lock);
arp_process (tbl->proxy_redo)
neigh_event_ns
neigh_update
skb_purge_queue
  `- lock(neighbour->arp_queue.lock);

There is actually no deadlock: the first and second locks belong to two different sk_buff_head structures, but both are initialized in skb_queue_head_init() and thus share one lockdep class.

This is a mainstream "BUG". It is fixed by giving neigh_table's proxy_queue lock a separate lockdep class.

Bug #78837.

(Tested with node bootup, VE start and vzt-prep vzt-ss).

diff-ubc-page-uncharge-bug-first-20070406

Patch from Pavel Emelianov <xemul@openvz.org>:
[BC] Check correct user_beancounter passed first in ub_page_uncharge()

If a page accidentally still has a page_beancounter that was not removed, the kernel will oops when dereferencing ub->ub_percpu(). Move the BUG_ON higher up to make sure we are working with a user_beancounter.

This might be the lost in bug #78461.

diff-ubc-refcount-leak-dupmm-20070409

Patch from Alexey Dobriyan <adobriyan@openvz.org>:
[BC] refcount leak in dup_mm() on error path

Fix simple beancounter refcount leak on error path in dup_mm().

Bug #77231.

diff-ubc-vmguar-enough-null-mm-20070406

Patch from Vasily Tarasov <vtaras@openvz.org>:
[BC] vmguar_enough_memory() oopses if called from a kernel thread

If vmguar_enough_memory() is called from a kernel thread, it oopses because task_struct->mm is NULL. This situation was encountered with aufs mounted over ramfs.

diff-cpt-make-zombie-20070413

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] alternative way to migrate zombie processes

In older 2.6.8 kernels do_exit() was very simple: essentially it disposed of mm etc., which is done automatically while checkpointing, and did some work notifying the parent. So it was natural to move the restored process to zombie state by hand.

In 2.6.18 do_exit() does _lots_ of work.

It seems easier to invert the logic. We introduce a new flag, PF_RESTART_EXIT, which suppresses the work that was already done when the process on the source hardware node moved to zombie state (mostly sending signals), and we use do_exit() when restoring zombie processes.

The same patch also adds checks for a few new things that cannot be migrated. This is related because the list of those things was obtained from do_exit().

diff-cpt-off-lockdep-on-sockets-20070413

Patch from Pavel Emelianov <xemul@openvz.org>:
[CPT] Fix lockdep warning on socket dump

CPT locks all the sockets it finds for dumping. This is OK, but lockdep treats it as circular locking.

It happens each time we migrate a VE with more than one socket aboard.

Bug #79164.

diff-cpt-robust-list-20070413

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] checkpointing robust lists

Otherwise we are going to have problems with migration of newer glibcs using robust lists when this is possible.

diff-fairsched-best-vcpu-20070413

Patch from Alexandr Andreev <aandreev@openvz.org>:
[SCHED] Reduce starvation of some VCPUs in case of cpu limits

Change the logic for choosing best_vcpu to schedule to. There are two potential problems:

a) If a VCPU is hot and the last physical CPU it used equals smp_processor_id(), it will always be chosen. This is not a good decision, because there is no guarantee that _all_ physical CPUs will take VCPUs from a vsched. For example, if the cpulimit for a vsched is small, this vsched may run on only one physical CPU forever.

b) Also, newer 'cold' VCPUs are currently chosen first, because active_list is scanned in forward order, i.e. from older VCPUs to newer ones, and a newer one ends up being chosen. In this case older VCPUs can starve for a long time.

Bug #79015.

diff-fairsched-find-idle-vcpu-20070403

Patch from Alexandr Andreev <aandreev@openvz.org>:
[SCHED] find_idle_vcpu() mask check fix

In find_idle_vcpu() we skip VCPUs whose IDs are not set in the physical '*cpus' mask. This is incorrect: we must skip VCPUs based on the corresponding VCPU->last_pcpu.
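The mask check can be illustrated with a small user-space sketch. Everything in it is hypothetical: struct vcpu_sketch, the 64-bit bitmap standing in for a cpumask, and the function names are stand-ins, and the real fairsched code differs. It contrasts the old test (VCPU id against a physical mask) with a test on last_pcpu, which is one reading of the fix.

```c
#include <stdbool.h>

/* Hypothetical, simplified VCPU descriptor; the real fairsched structures differ. */
struct vcpu_sketch {
    int id;         /* VCPU number inside its vsched */
    int last_pcpu;  /* physical CPU this VCPU last ran on */
    bool idle;
};

/* Physical CPU mask as a plain 64-bit bitmap (assumes cpu < 64). */
static bool pcpu_in_mask(unsigned long long mask, int cpu)
{
    return (mask >> cpu) & 1ULL;
}

/* Old check: wrongly tests the VCPU id against a mask of physical CPUs. */
static bool skip_vcpu_old(const struct vcpu_sketch *v, unsigned long long cpus)
{
    return !pcpu_in_mask(cpus, v->id);
}

/* Fixed check: tests the physical CPU the VCPU last ran on. */
static bool skip_vcpu_fixed(const struct vcpu_sketch *v, unsigned long long cpus)
{
    return !pcpu_in_mask(cpus, v->last_pcpu);
}

/* Index of the first idle VCPU that passes the fixed check, or -1. */
static int find_idle_vcpu_sketch(const struct vcpu_sketch *v, int n,
                                 unsigned long long cpus)
{
    for (int i = 0; i < n; i++)
        if (v[i].idle && !skip_vcpu_fixed(&v[i], cpus))
            return i;
    return -1;
}
```

With only physical CPU 0 in the mask, a VCPU with id 0 that last ran on CPU 3 passes the old check but is correctly skipped by the fixed one.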

diff-ms-bridge-unaligned-access-20070411

Patch from Evgeny Kravtsunov <emkravts@openvz.org>:
[BRIDGE] Unaligned access on IA64 when compare ether addr

This patch fixes an unaligned access that takes place on ia64 in compare_ether_addr(): compare_ether_addr() requires addresses to be aligned on a 2-byte boundary, while the addresses declared in bridges are only 1-byte aligned.
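A minimal sketch of an alignment-safe MAC comparison, assuming the goal is simply to avoid 16-bit loads from 1-byte-aligned addresses. compare_ether_addr_unaligned() is a hypothetical name, and copying through memcpy() is one common technique, not necessarily what this patch does.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/*
 * Hypothetical helper: returns 0 when the two 6-byte MAC addresses are
 * equal and non-zero otherwise (the same convention as compare_ether_addr()).
 * memcpy() into local uint16_t arrays avoids 16-bit loads from addresses
 * that may only be 1-byte aligned.
 */
static unsigned compare_ether_addr_unaligned(const uint8_t *addr1,
                                             const uint8_t *addr2)
{
    uint16_t a[3], b[3];

    memcpy(a, addr1, ETH_ALEN);
    memcpy(b, addr2, ETH_ALEN);
    return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
}
```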

Bug #79001.
Bug #78983.

diff-ms-fatal-signal-20070413

Patch from Denis Lunev <den@openvz.org>:
[PATCH] Fatal signal processing logic

This patch changes the fatal signal processing logic. SIGKILL should be raised for all threads in the group *EXCEPT* the recipient.

OpenVZ Bug #533.

diff-ms-stopmachine-msleep-20070411

Patch from Konstantin Khorenko <khorenko@openvz.org>:
Problem found by Vasily (vvs@) & Kirill (dev@):

A possible situation in stop_machine:

  • stopmachine_state == STOPMACHINE_WAIT;
  • STOPPER (stop_machine()) is in state SM_STOPPER_WAITING, calling yield() in a loop;
  • SLAVES (stopmachine()) also call yield() in a loop.

This leads to contention on fairsched_lock on all CPUs, and with unfair lock acquisition rules (for example, on a NUMA node) some CPUs can wait for the lock forever or for a long time, causing the node to hang. This patch replaces yield() with msleep(10).

The mainstream kernel is affected as well, though it is harder to trigger: one CPU calls yield(), repeatedly taking and releasing rq->lock, while another CPU trying to take that rq->lock (e.g. for balancing) can livelock forever.
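The change from yield() to msleep(10) amounts to replacing a busy yield loop with a sleeping poll. The user-space sketch below shows that pattern; stopmachine_state here is a hypothetical C11 atomic standing in for the kernel's shared state, and nanosleep() plays the role of msleep().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the stop_machine state shared by all CPUs. */
static atomic_int stopmachine_state;

/*
 * Wait until the shared state reaches `target`. Instead of spinning on a
 * yield-style call (which keeps hammering scheduler locks), back off with a
 * 10 ms sleep between checks. Returns the number of sleeps performed.
 */
static unsigned wait_for_state(int target)
{
    const struct timespec ten_ms = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    unsigned sleeps = 0;

    while (atomic_load(&stopmachine_state) != target) {
        nanosleep(&ten_ms, NULL);
        sleeps++;
    }
    return sleeps;
}
```

Sleeping gives up the CPU without touching the run-queue lock on every iteration, at the cost of up to 10 ms of extra latency per state transition.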

Bug #78975.

diff-rh-dlm-misc-device-fix-20070412

Patch from Andrey Mirkin <major@openvz.org>, backported from -nmw git tree:

From: Patrick Caulfield <pcaulfie@redhat.com>
Date: Wed, 21 Mar 2007 09:23:53 +0000 (+0000)
Subject: [DLM] Don't delete misc device if lockspace removal fails
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fsteve%2Fgfs2-2.6-nmw.git;a=commitdiff_plain;h=2ebea8b4cc0b8859a99d2006ec085c1da2d0758a;hp=ead2e9aa2a555bb1d94ff2e5b4c974d0e7cc6518

[DLM] Don't delete misc device if lockspace removal fails

Currently if the lockspace removal fails the misc device associated with a
lockspace is left deleted. After that there is no way to access the orphaned
lockspace from userland.

This patch recreates the misc device if dlm_release_lockspace fails. I
believe this is better than attempting to remove the lockspace first because
that leaves an unattached device lying around. The potential gap in which there
is no access to the lockspace between removing the misc device and recreating it
is acceptable ... after all the application is trying to remove it, and only new
users of the lockspace will be affected.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=232878

Bug #79107.
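The ordering the DLM commit describes (remove the device, try the release, recreate the device on failure) can be sketched as follows. All types and functions are hypothetical stand-ins for the misc device and dlm_release_lockspace(); the stub release simply fails with -EBUSY when asked to.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the misc device and lockspace state. */
struct lockspace_sketch {
    bool misc_registered;  /* is the userland access device present? */
    bool released;
};

static void misc_deregister_sketch(struct lockspace_sketch *ls) { ls->misc_registered = false; }
static void misc_register_sketch(struct lockspace_sketch *ls)   { ls->misc_registered = true; }

/* Stub for the lockspace release: fails with -EBUSY when `busy` is true. */
static int release_lockspace_sketch(struct lockspace_sketch *ls, bool busy)
{
    if (busy)
        return -EBUSY;
    ls->released = true;
    return 0;
}

/*
 * Remove the device first, try the release, and recreate the device if the
 * release fails, so the lockspace is never orphaned from userland.
 */
static int remove_lockspace_sketch(struct lockspace_sketch *ls, bool busy)
{
    int err;

    misc_deregister_sketch(ls);
    err = release_lockspace_sketch(ls, busy);
    if (err)
        misc_register_sketch(ls);
    return err;
}
```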

diff-ubc-dcache-uncharge-root-20070413

Patch from Alexey Dobriyan <adobriyan@openvz.org>:
[BC] uncharge fs root (/) from dcachesize

The "/" dentry is charged in d_alloc_root() and then charged and uncharged during filesystem activity, but that first charge was forgotten at umount time.

So uncharge "/" by hand.

Bug #77771.

diff-ubc-ioacct-context-handle-20070413

Patch from Pavel Emelianov <xemul@openvz.org>:
[IOACCT] Fix ioacct race

When a page becomes dirty, there is no time to store a context on it: the page may become clean immediately.

Thus there was a race in accounting: when a page became clean before we set a context on it, that context was lost and never freed.

Handle the context the other way around. When we are about to set a new context on a page that already has one, free the old one and account the written bytes in case the page became clean. When removing a context from a page, handle the case where the page has none because of this race. Either way, a dirty page will have a context set and a clean one will not.

Bug #79008.

diff-ubc-ioprio-clean-active-ub-20070411

Patch from Vasily Tarasov <vtaras@openvz.org>:
[IOPRIO] cleaning active beancounter

After a beancounter disappears, it can still be active. Clean it up.

Bug #78875.

diff-ubc-percpu-sign-expansion-20070410

Patch from Pavel Emelianov <xemul@openvz.org>:
[BC] Percpu counters discrepancy on 64bit arches

The operation

long += -(unsigned int);

leads to a wrong result on 64-bit architectures because the negated unsigned int is zero-extended instead of sign-extended.
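A small user-space demonstration of the problem, assuming an LP64 target such as x86_64 Linux, where long is 64 bits and unsigned int is 32 bits. The function names are illustrative; widening to long before negating is one straightforward way to avoid the bug, not necessarily the exact form used in the patch.

```c
/*
 * Buggy form: -delta is computed in unsigned int (i.e. 2^32 - delta) and
 * then zero-extended to long, so on LP64 this adds about 2^32 instead of
 * subtracting delta.
 */
static long add_negated_buggy(long counter, unsigned int delta)
{
    counter += -delta;
    return counter;
}

/* Widen to long before negating, so the arithmetic is done in signed long. */
static long add_negated_fixed(long counter, unsigned int delta)
{
    counter += -(long)delta;
    return counter;
}
```

With counter = 10 and delta = 3, add_negated_buggy() returns 4294967303 on LP64, while add_negated_fixed() returns 7. On a 32-bit target both return 7 because the addition wraps modulo 2^32, which is why the discrepancy only shows up on 64-bit arches.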

Bug #78998.

diff-ve-futex-vpid-fix-20070413

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[PATCH] pthread_mutex_lock deadlock inside VE

This patch replaces diff-ve-futex-EDEADLK-bypass-20061225 and the previous small patch repairing a user-level deadlock in pthread_mutex_lock(), which happened because the value 0 was not "virtual".

It undoes the unnatural tests for virtuality of the pid supplied by the user. Done naively, this can result in a kernel warning if the user mangles the pid, doing something like:

syscall(__NR_futex, &l, FUTEX_LOCK_PI, 0, 0);
l = getpid() | 1024;
syscall(__NR_futex, &l, FUTEX_LOCK_PI, 0, 0);

This happens due to pid aliasing introduced with vpids: the test pid == virt_pid(current) is not enough to ensure that the pid does not correspond to current.

I find it more natural to test for this condition directly.

diff-ve-kconfig-security-deps-20070411

Patch from Vasily Tarasov <vtaras@openvz.org>:
[PATCH] kconfig: security depends on !ve

Many people have CONFIG_SECURITY enabled in their configs. When they run `make oldconfig` for OpenVZ kernels with such configs, no questions about CONFIG_VE and friends appear, and they end up with OpenVZ kernels with the virtualization features disabled. Fix this by reversing the VE/SECURITY dependency.

diff-ve-venet-stoprace-20070412

Patch from Denis Lunev <den@openvz.org>:
[VENET] stop IP management before freeing venet

The device is freed before the VE<->IP mapping is cleaned.

Bug #75502.

diff-vzdq-sleep-under-inode-lock-20070412

Patch from Pavel Emelianov <xemul@openvz.org>:

The calltrace:

vzdq_aquotq_lookup
iget5_locked()
get_new_inode()
   `- spin_lock(&inode_lock);
find_inode()
 ->set() /* == vzdq_aquotq_lookset */
vdq_aquot_lookset()
user_get_super()
   `- down_read(...)

So it may sleep while holding inode_lock. Move all the sleeping operations out of the lock.

Bug #79124.

diff-i2o-timeout-20071013

Patch from Kostja:

Adds a missing error check when getting a message slot in the I2OPASSTHRU ioctl handler. The missing check caused a kernel crash when "raideng" (from raidutils) was used while the controller timed out.

Bug #79279.

diff-ubc-io-release-debug-20070416

Patch from Pavel Emelianov <xemul@openvz.org>:
[IOACCT] Debug on page release

When an IO beancounter is released from a page that is not supposed to have an IO pb, print a warning.

Might help debug bug #79427.

diff-ubc-put-beancounters-in-fork-20070416

Patch from Alexey Dobriyan <adobriyan@openvz.org>:
[BC] UB put on error path in fork()

If fork() fails after ub_task_charge(), nobody puts the three beancounters obtained there.

Bug #77231.

diff-arch-x86-ioremap-guard-page-fix-20070419

Patch from Dmitry Monakhov <dmonakhov@openvz.org>:
[4GB] change_page_attr() BUG's on strange pages

It is a really long story...

This patch restores the iounmap code as it was after:

commit 2c692eefe4aff109eab9384b6d8c7e1a8f094dad
Author: ak <ak>
Date:   Sun Jan 23 18:29:11 2005 +0000

Later this code was removed by this patch:
commit a7dd5a5f2b5db975bdf1dcaa3f3da6c289630076
Author: andrea <andrea>
Date:   Tue Mar 8 18:49:39 2005 +0000

This code was broken again after:
commit bf5421c309bb89e5106452bc840983b1b4754d61
Author: Andi Kleen <ak@suse.de>
Date:   Mon Dec 12 22:17:09 2005 -0800

The actual problem is that get_vm_area() allocates one more page than requested, as a guard page, but change_page_attr() does not take this into account.

Bug #79617.

diff-cpt-active-callback-eagain-20070423

Patch from Andrey Mirkin <major@openvz.org>:
[CPT] retry checkpointing if VE has active netlink callback

Return -EAGAIN instead of -EBUSY if a netlink socket has an active callback. In this case we will try to freeze the VE 3 times.

diff-cpt-make-zombie-b-20070420

Patch from Thorsten Schifferdecker:
[CPT] tux is missing on vanilla kernels

Compilation bug was introduced by http://git.openvz.org/?p=linux-2.6.18-openvz;a=commitdiff;h=f90c2c318467829ecde43919ad38f326527f533b

Fixes OpenVZ Bug #545.

diff-cpt-restore-netlink-socket-20070419

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] restore rcv queue on netlink sockets and unbound netlink sockets

Code restoring queues was forgotten. This fixes bug #79723.

Unbound sockets were restored incorrectly: they were autobound to some port, which prevented a subsequent bind by the application. This fixes bug #79724.

The patch overrides previous patch with subj: "[CPT] restore rcv queue on netlink sockets", which fixed only bug #79723.

diff-cpt-setup-pagein-fix-20070418

Patch from Andrey Mirkin <major@openvz.org>:
[CPT] Fix lazy migration

While porting checkpointing to the 2.6.18 kernel, the rst_setup_pagein() function was accidentally added to the code twice. This breaks lazy migration.

Bug #77921.

The creation procedure of the pgin block device was quite hairy, so almost every lazy migration produced an annoying message in the kernel log: register_blkdev: cannot get major 254 for pgin

This patch fixes the creation of the pgin block device.

diff-cpt-ubc-off-20070424

Patch from Alexandr Andreev <aandreev@openvz.org>:
[BC] compilation fix with CONFIG_USER_RESOURCE=n

Compilation fix with CONFIG_USER_RESOURCE=n

diff-cpt-zombie-threads-20070416

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] thread groups with exited leader did not migrate

The bug is simple and stupid; it is very strange that nobody noticed it.

When a thread group leader exits, its mm/files/fs/namespace are released, but the zombie process remains frozen until all the threads exit. Restore was not able to handle such a configuration.

The solution is simple: when checkpointing, save the mm/files/fs/namespace of this thread group instead of the real (NULL) ones.

diff-fairsched-best-vcpu-b-20070423

Patch from Alexandr Andreev <aandreev@openvz.org>:
[SCHED] Select some vcpu instead of idle even if all vcpus are hot

We have to use the oldest VCPU if all VCPUs are hot. In the current kernel idle_vcpu is used instead, so the CPU can stay idle rather than doing useful work.
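One possible reading of the two fairsched best-VCPU patches, sketched in user space: walk the active list from oldest to newest, take the first cold VCPU, and if every VCPU is hot fall back to the oldest one rather than to the idle VCPU. struct vcpu_entry and pick_vcpu() are hypothetical; the real selection logic also considers physical CPU placement.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VCPU entry; index 0 is the oldest on the active list. */
struct vcpu_entry {
    int id;
    bool hot;   /* still cache-hot on some other physical CPU */
};

/*
 * Prefer the oldest cold VCPU. If every VCPU is hot, return the oldest one
 * instead of NULL (which would mean "run the idle VCPU" while work is
 * pending). Returns NULL only for an empty list.
 */
static const struct vcpu_entry *pick_vcpu(const struct vcpu_entry *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!list[i].hot)
            return &list[i];
    return n ? &list[0] : NULL;
}
```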

Bug #79676.

diff-fairsched-unused-rq-20070424

Patch from Pavel Emelianov <xemul@openvz.org>:
[SCHED] cleanup: removed unused variable

Before the fairsched patch, struct rq *rq was used to compare tasks with rq->idle. With fairsched, the idle task is bound to a pcpu, not a vcpu, so struct rq *rq is simply not needed.

diff-ms-cfq-allow-merge-b-20070424

Patch from Vasily Tarasov <vtaras@openvz.org>:
[PATCH] modification of allow merge policy in cfq (mainstream)

Jens Axboe rewrote the allow-merge policy once more after we reported the problem, and fixed the issue we currently face where some tasks experience I/O starvation.

This is an incremental patch to the previous patch diff-ms-cfq-allow-merge-20070117 ported to OpenVZ.

This patch is cumulative of

Bug #79594.

diff-ms-cpufreq-centrino-20070426

Patch from Alexey Dobriyan <adobriyan@openvz.org>:

[PATCH] Make speedstep centrino cpufreq driver use wr/rdmsr_on_cpu()

speedstep-centrino cpufreq driver was using set_cpus_allowed() and checks for smp_processor_id() to confine itself to given CPU.

Switch to rdmsr_on_cpu/wrmsr_on_cpu() infrastructure.

Closes http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=420708

diff-ms-elf-retval-20070420

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[PATCH] Invalid return value of execve() resulting in oopses (mainstream)

When the ELF loader fails to map an executable (due to memory shortage or because the binary is malformed), it can return 0. Normally this is invisible, because the process is killed with SIGKILL and never returns to user space.

But if exec() is called from a kernel thread (hotplug, whatever), the consequences are more interesting and vary depending on the architecture.

i386. Nothing especially interesting, execve() just returns with "success"  :-)

x86_64. A fake zero frame is used on the way back to the caller, so RSP/RIP are loaded with zeros, ergo... double fault.

ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it oopses due to return to zero PC, sometimes it sees NaT in rXX and oopses due to NaT consumption.

This fix solves bugs #68582 (i386), #73753 (x86_64) and #79847 (ia64).

diff-ms-fib-netlink-lookup-recursion-20070425

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[PATCH] stack overflow in netlink (mainstream)

Replies to NETLINK_FIB_LOOKUP messages were misrouted back to the kernel, which resulted in infinite recursion and a stack overflow.

The bug is present in all kernel versions since the feature appeared.

(linux 2.6.13, Jun 20th, 2005,
commit 246955fe4c38bd706ae30e37c64892c94213775d,
[NETLINK]: fib_lookup() via netlink)

The patch also makes some minimal cleanup:

  1. Return something consistent (-ENOENT) when fib table is missing
  2. Do not crash when queue is empty (does not happen, but yet)
  3. Put result of lookup

Frankly, I would delete this thing instead of fixing it. It looks ugly and was used only for debugging the LC-trie.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Dave Miller <davem@davemloft.net>

diff-ms-fib-netlink-lookup-recursion-b-20070425

Patch from Sergey Vlasov <vsu@altlinux.ru>:
[NETLINK] Fix for Alexey's netlink lookup recursion fix

When CONFIG_IP_MULTIPLE_TABLES is enabled, the code in nl_fib_lookup() needs to initialize the res.r field before fib_res_put(&res) - unlike fib_lookup(), a direct call to ->tb_lookup does not set this field.
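The failure mode is an uninitialized field in an on-stack result structure. The sketch below is modeled loosely on fib_result and fib_res_put(); every type and function in it is a hypothetical stand-in, and it only shows why r must be cleared when the lookup path does not set it.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down routing structures (not the kernel's definitions). */
struct fib_rule_sketch {
    int refcnt;
};

struct fib_result_sketch {
    int type;
    struct fib_rule_sketch *r;   /* set only by the policy-routing lookup path */
};

/* Modeled on fib_res_put(): drop the rule reference if one is held. */
static void fib_res_put_sketch(struct fib_result_sketch *res)
{
    if (res->r)
        res->r->refcnt--;
}

/* A direct table lookup which, like ->tb_lookup, never writes res->r. */
static int table_lookup_sketch(struct fib_result_sketch *res)
{
    res->type = 1;
    return 0;
}

/* Netlink-style lookup: res lives on the stack, so r must be cleared first. */
static int nl_lookup_sketch(void)
{
    struct fib_result_sketch res;
    int err;

    res.r = NULL;                   /* without this, r holds stack garbage */
    err = table_lookup_sketch(&res);
    if (err == 0) {
        err = res.type;
        fib_res_put_sketch(&res);   /* safe: r is NULL */
    }
    return err;
}
```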

diff-ms-loop-dont-complete-lo-bh-done-20070418

Patch from Alexey Dobriyan <adobriyan@openvz.org>:
[PATCH] loopback: oops on loopback mount/umount (mainstream)

After a LOOP_SET_FD/LOOP_CLR_FD combination, the loop device's queue keeps a request handler, which is persistent.

After, say

mount -t iso9660 /dev/loop0 /mnt	# sic

this request handler is called directly with
a) ->lo_state being Lo_unbound
b) ->lo_pending being zero

The error path in loop_make_request() completes the ->lo_bh_done completion, which is persistent as well.

Now start the worker thread as usual. It sets ->lo_pending to 1, does not wait for the completion because it was already (wrongly) completed, and never gets out of its infinite loop because of ->lo_pending. The loop device has no bios at this point, so it triggers a BUG_ON.

So do not complete ->lo_bh_done when the loop device is not fully set up. In mainstream this was accidentally fixed by the conversion to kthreads.
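A deliberately tiny sketch of the rule "do not signal completion from the error path unless the device is bound". struct loop_sketch, its bh_done flag (standing in for the ->lo_bh_done completion) and make_request_fail() are hypothetical; the real loop driver logic is more involved.

```c
#include <stdbool.h>

enum lo_state_sketch { LO_UNBOUND, LO_BOUND };

/* Hypothetical, simplified loop device; the real struct loop_device differs. */
struct loop_sketch {
    enum lo_state_sketch state;
    bool bh_done;   /* stands in for the persistent ->lo_bh_done completion */
};

/*
 * Error path of a request handler. Signalling "done" on a device that is
 * not fully set up would leave a stale completion that a later worker
 * thread consumes by mistake, so only signal it for a bound device.
 */
static void make_request_fail(struct loop_sketch *lo)
{
    if (lo->state == LO_BOUND)
        lo->bh_done = true;
}
```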

Bug #79521.

diff-ubc-ioprio-force-disp-off-20070424

Patch from Vasily Tarasov <vtaras@openvz.org>:
[IOPRIO] forced dispatching when CONFIG_UBC_IO_PRIO off

If CONFIG_UBC_IO_PRIO is off, no beancounters are on the active list; consequently there is a bug in the forced-dispatching case.

OpenVZ Bug #528.

diff-ubc-ioprio-oops-on-virt-off-20070423

Patch from Vasily Tarasov <vtaras@openvz.org>:
[IOPRIO] Oops on IO-prioritization disabling

If I/O prioritization is suddenly turned off via /sys/block/<dev>/queue/iosched/virt_mode, the cfqq owner BC does not match the current I/O context. The right thing is to take the beancounter from the queue, not from the current I/O context.

diff-ubc-off-20070424

Patch from Alexandr Andreev <aandreev@openvz.org>:
[BC] compilation fix with CONFIG_USER_RESOURCE=n

Compilation fix with CONFIG_USER_RESOURCE=n

diff-ubc-unusedprivvm-in-zeromap-20070424

Patch from Pavel Emelianov <xemul@openvz.org>:
[BC] Leak of privvmpages on zero page maps

This was obviously forgotten. Each mmap of /dev/zero causes this leak.

Bug #80246.

diff-ve-meminfo-b-20070330

Patch from Konstantin Khorenko <khorenko@openvz.org>:
[MEMINFO] sysctl for selecting UsedMem source

Adds a sysctl to choose the base UBC parameter for memory usage inside a VE. Sets the PRIVVMPAGES beancounter to be used by default instead of OOMGUARPAGES.

Bug #78088.

diff-ve-netlink-veprintk-20070426

Patch from Pavel Emelianov <xemul@openvz.org>:
[NETLINK] VE netlink message should go into VE log

When parsing netlink arguments, the kernel may printk that some bytes were left unparsed. Make this message appear in the VE log instead of the global one.

diff-ve-tun-persist-20070424

Patch from Vasily Averin <vvs@openvz.org>:
[TUN] prohibit tun persistent mode inside VE

Prohibit tun persistent mode inside a VE until this is resolved via VE hooks.

Bug #79612.

diff-ve-xen-netback-20070426

Patch from Sergey Korshunoff <seyko2@>:

Fix Xen netback driver, since loopback_dev is defined as a macro in OVZ and is substituted.

diff-vzdq-ugbad-20070428

Patch from Konstantin Khorenko <khorenko@openvz.org>:
[VZDQ] prohibit chown of a file if owner doesn't have ugid struct

Prohibit chown of a file if its owner does not have a ugid record. This might happen if the UID/GID limit was somehow exceeded (e.g. ugidlimit was set lower than the number of users).

Bug #79553.

diff-vzstat-numa-fixes-b-20070416

Patch from Alexandr Andreev <aandreev@openvz.org>:
[VZSTAT] Sum up per node stats for pgdat's

cat /proc/vz/stats in 2.6.18 shows information about zones for each node, but the vzstat utility parses information about only one (the last) node, and so shows incorrect memory info on hosts with several nodes (NUMA hosts, for instance). The new code scans all nodes and sums up statistics for identical zones across nodes.
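Summing identical zones across nodes is a plain two-level reduction. The array layout and names below are assumptions made for illustration (one row per node, one column per zone); they are not the /proc/vz/stats format or the kernel's pgdat structures.

```c
#include <stddef.h>

#define NR_ZONES 3   /* e.g. DMA, DMA32, Normal; the count is illustrative */

/* Hypothetical: sum per-node, per-zone counters into one total per zone. */
static void sum_zones_over_nodes(const unsigned long stats[][NR_ZONES],
                                 size_t nr_nodes,
                                 unsigned long total[NR_ZONES])
{
    for (size_t z = 0; z < NR_ZONES; z++) {
        total[z] = 0;
        for (size_t n = 0; n < nr_nodes; n++)
            total[z] += stats[n][z];
    }
}
```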

Bug #77994.

diff-vzstat-zone-id-20070416

Patch from Alexandr Andreev <aandreev@openvz.org>:

[VZSTAT] Fix vzstat when DMA32 zone on x86_64 (index 1) is empty

Fix vzstat when the DMA32 zone on x86_64 (index 1) is empty. For this, show the ordinal number of a zone instead of its real index in the kernel 'zones' array. So the output will look like:

0 DMA
skipped DMA32 zone
1 Normal
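The ordinal numbering can be sketched as a loop that skips empty zones and counts only the ones it prints. The zone names, page counts and function name are hypothetical inputs; note that this sketch omits empty zones entirely instead of printing a "skipped" line.

```c
#include <stdio.h>

/*
 * Hypothetical: print populated zones with a compact ordinal number instead
 * of their index in the kernel's zones[] array, so an empty zone leaves no
 * gap in the numbering. Returns the number of zones printed.
 */
static int print_zones_ordinal(FILE *out, const char *const names[],
                               const unsigned long present_pages[], int nr_zones)
{
    int ordinal = 0;

    for (int i = 0; i < nr_zones; i++) {
        if (present_pages[i] == 0)
            continue;               /* skip empty zones */
        fprintf(out, "%d %s\n", ordinal, names[i]);
        ordinal++;
    }
    return ordinal;
}
```

With names {"DMA", "DMA32", "Normal"} and page counts {4096, 0, 100000}, it prints "0 DMA" and "1 Normal" and returns 2.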

Bug #77336.

io-accounting-menuconfig.patch

Patch from Alexey Dobriyan <adobriyan@openvz.org>:

1. Move TASK_IO_ACCOUNTING out of the EMBEDDED menu to match its placement in mainline.

2. As a side effect, the EMBEDDED menu is shown at the intended level instead of returning to the top level after the SYSCTL option.

OpenVZ Bug #550.

namespaces-utsname-xen.patch

Patch from Sergey Korshunoff <seyko2@>:

Fix utsname handling in process-xen.c, which is a copy of process.c

diff-i2o-msgleak-10070423

Patch from Vasiliy:

This patch fixes an i2o message leak. On error, we need to free both the msg itself and the i2o message in hardware.

diff-i2o-msgget-errh-10070423

Patch from Vasiliy:

This patch fixes access to memory that has not been allocated: i2o_msg_get_wait() can return errors other than I2O_QUEUE_EMPTY, but the result is checked only against this code. If it is not I2O_QUEUE_EMPTY, the error code is later dereferenced as a pointer.
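The general fix is to treat every encoded error as a failure instead of comparing against one sentinel value. The sketch below uses user-space copies of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom; msg_get_wait_sketch() and its error codes are hypothetical and do not reproduce the real i2o_msg_get_wait() interface.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline bool IS_ERR(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

struct msg_sketch { int payload; };
static struct msg_sketch the_msg;

/* Hypothetical getter: returns a message, or one of several encoded errors. */
static struct msg_sketch *msg_get_wait_sketch(int outcome)
{
    switch (outcome) {
    case 0:  return &the_msg;
    case 1:  return ERR_PTR(-EBUSY);      /* the one error a naive caller checks */
    default: return ERR_PTR(-ETIMEDOUT);  /* any other failure */
    }
}

/* Correct caller: every encoded error is a failure; never dereference it. */
static long use_msg(int outcome)
{
    struct msg_sketch *m = msg_get_wait_sketch(outcome);

    if (IS_ERR(m))
        return PTR_ERR(m);
    m->payload = 42;
    return 0;
}
```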

diff-i2o-cfg-passthru-20070423

Patch from Vasiliy:

This patch fixes a number of issues in i2o_cfg_passthru{,32}:

  • memory leaks (including i2o_message leak fixed by khorenko@sw.ru)
  • infinite loop to sg_list_cleanup in passthru32
  • bad error paths

diff-i2o-proc-perms-20060304

Patch from Vasiliy:

Reading from some i2o-related proc files can make the controller hang for unknown reasons. As a workaround, this patch makes these files accessible to root only.

linux-2.6.18-drbd-8.0.0-8.0.2.patch

DRBD update to v8.0.2.