Download/kernel/rhel4/023stab040.1/changes

15,062 bytes added, 15:03, 20 March 2008
created
== Changes ==
* Mincore security fix ({{CVE|2006-4814}})
* iptables compat mode fixes
* More than 512 IPs fix
* Modules unloading fixes
* Memory leak in kmemsize fixed
* 2 minor mainstream fixes
* Fixed tty restore on CPT
* Removed vmrss accounting
* Updated DRBD to 0.7.22.

=== Configs ===
Same as {{Kernel link|rhel4|023stab037.3}}, plus:
* +<code>CONFIG_JFS_FS=m</code>
* +<code>CONFIG_JFS_POSIX_ACL=y</code>
* +<code>CONFIG_XFS_FS=m</code>
* +<code>CONFIG_XFS_QUOTA=y</code>
* +<code>CONFIG_XFS_POSIX_ACL=y</code>
* +<code>CONFIG_LOCK_HARNESS=m</code>
* +<code>CONFIG_GFS_FS=m</code>
* +<code>CONFIG_LOCK_NOLOCK=m</code>
* +<code>CONFIG_LOCK_DLM=m</code>
* +<code>CONFIG_LOCK_GULM=m</code>
* +<code>CONFIG_CLUSTER=m</code>
* +<code>CONFIG_CLUSTER_DLM=m</code>
* +<code>CONFIG_CLUSTER_DLM_PROCLOCKS=y</code>

==== diff-ve-fibhashfree-20061221 ====
<div class="change">
Patch from Vasily:
fib_hash_free(), called from fini_ve_route(), uses the wrong size argument,
which leads to an oops in kfree() when too many IPs are assigned to a VE.
Bug #73426.
</div>

==== diff-*proc-owner-20061218 ====
diff-ve-proc-owner-20061218,<br/>
diff-cpt-proc-owner-20061218,<br/>
diff-vzdq-proc-owner-20061218
<div class="change">
Patches from Evgeny (ekravtsunov@sw.ru), modified by Kirill:

Creating proc entries from a module is dangerous.
de->owner should be set atomically, though no one in
mainstream does so. To set the owner atomically, we can protect
against the race with proc_lookup() using lock_kernel().

Bug #73019.
</div>

==== diff-sysrqkey-scancode-20061121 ====
<div class="change">
Patch from Alexandr Andreev:

This patch lets you replace the SysRq key in the Alt+SysRq+XXX combination
with any other key:

<pre># echo NEW_SCANCODE &gt; /proc/sys/kernel/sysrq-key</pre>

You can get the scancodes of your keyboard with programs like showkey or
evtest. The default Alt+SysRq combination still works after redefinition.
</div>

==== diff-jbd-unexpectdirty-20060905 ====
<div class="change">
Patch from linux mainstream, prepared by Vasily:

[http://linux.bkbits.net:8080/linux-2.6/gnupatch@431f7f0ceyo6g8tikQvG3I-cCSb7kw http://linux.bkbits.net:8080/linux-2.6/gnupatch@431f7f0ceyo6g8tikQvG3I-cCSb7kw]

"attached patch should fix the following race:
<pre>
Proc 1                              Proc 2

__flush_batch()
  ll_rw_block()
                                    do_get_write_access()
                                      lock_buffer
                                        jh is only waiting for checkpoint
                                          -&gt; b_transaction == NULL -&gt;
                                          do nothing
                                      unlock_buffer
    test_set_buffer_locked()
    test_clear_buffer_dirty()
                                      __journal_file_buffer()
                                      change the data
    submit_bh()
</pre>

and we have sent wrong data to disk... We now clean the dirty buffer flag
under buffer lock in all cases and hence we know that whenever a buffer is
starting to be journaled we either finish the pending write-out before
attaching a buffer to a transaction or we won't write the buffer until the
transaction is going to be committed.

The test in jbd_unexpected_dirty_buffer() is redundant - remove it.
Furthermore we have to clear the buffer dirty bit under the buffer lock to
prevent races with buffer write-out (and hence prevent returning a buffer
with IO happening).

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;<br/>
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;<br/>
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;"

Bug #68106.
</div>

==== diff-ve-vpid-leak ====
<div class="change">
Patch from Alexey:
[PATCH] leakage of vpid_mapping

Probably this fixes bug #62834.

The problem was that when switching to sparse VPID mappings, we could
have processes with non-virtual pids that had entered the VE. For example, it
could be some stuck process from the VE setup scripts. In this case we created
a useless mapping struct, which was never freed, because it referred
to a non-virtual pid.

I left a printk() in the code, because we definitely need confirmation
that this event really happens. It does not in my tests: so far
I have run 400000 checkpoint/restores and 20000 migrations on a VE and,
unfortunately, found no problems.

dev@: somehow was not ported from 2.6.8-022stab078.x branch
</div>

==== diff-fairsched-assert-20060602 ====
<div class="change">
Patch by Andrey (saw@):
This patch fixes assertions in fairsched to avoid printk deadlocks,
and to print more information.
</div>

==== diff-ve-root-user-20060605 ====
<div class="change">
Patch from Vasily:
In some places we should compare not with the &amp;root_user ptr (HN root),
but with the VE root.
This made su unable to change user when the ulimit
was too tight for root.

dev@: somehow was not ported from 022stab078.x branch...
</div>

==== diff-ms-mincore-locking-20061218 ====
<div class="change">
Patch from mainstream:
[SECURITY]: Deadlock in mincore (CVE-2006-4814)

Marcel Holtman reported that sys_mincore() implementation has incorrect
locking: copy_to_user() shouldn't be done under mmap_sem.

The security thread led Linus to propose rewriting
the code because it is crap, but for a start the following patches
were committed instead of the rewrite patch:

[http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2f77d107050abc14bc393b34bdb7b91cf670c250 GIT: 2f77d107050abc14bc393b34bdb7b91cf670c250]
<br/>[http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4fb23e439ce09157d64b89a21061b9fc08f2b495 GIT: 4fb23e439ce09157d64b89a21061b9fc08f2b495]
<br/>[http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=825020c3866e7312947e17a0caa9dd1a5622bafc GIT: 825020c3866e7312947e17a0caa9dd1a5622bafc]

Bug #73299.
</div>

==== diff-ms-nf-ipt-compat-offsets-20061218 ====
<div class="change">
Patch from Dmitry:
Compat offsets should be 'unsigned int', because the size of the entries
array has that type.

Bug #73201.
</div>

==== diff-fairsched-starttime-20061214 ====
<div class="change">
Patch from Alexandr Andreev:<br/>
<ul>
<li>Update vcpu-&gt;start_time if we decide to stay on the previous vcpu.
This improves performance a little if schedule() is called too
often.</li>
<li>Set the type of start_time to unsigned long so that it is on the same
scale as jiffies.</li>
</ul>
</div>

==== diff-ms-stopmachine-up-20060827 ====
<div class="change">
Patch from mainstream:
[PATCH] Remove redundant up() in stop_machine()

An up() is called in kernel/stop_machine.c on failure, and also in the
caller (unconditionally).

Signed-off-by: Zhou Yingchao &lt;yingchao.zhou@gmail.com&gt;<br/>
Cc: &lt;stable@kernel.org&gt;<br/>
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;<br/>
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;<br/>

[http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4edb9a143e31d2e191c199262226e1a5923ff8f7 GIT: 4edb9a143e31d2e191c199262226e1a5923ff8f7]<br/>
[http://linux.bkbits.net:8080/linux-2.6/gnupatch@44f1de0eQHQmbszw5F8_Z8enqg1ihw http://linux.bkbits.net:8080/linux-2.6/gnupatch@44f1de0eQHQmbszw5F8_Z8enqg1ihw]
</div>

==== diff-ubc-ia64-ptecharge-20061215 ====
<div class="change">
Patch from Alexey:

Alexey found that IA64 doesn't charge PTEs to kmemsize.
</div>

==== diff-ve-venet-stop-20061215 ====
<div class="change">
Patch from Denis:
<ul>
<li>module reference counting fixed: initialization of ve-&gt;veip
in veip_start() should take a module reference</li>
<li>unregister_netdev(venet) moved to venet_stop();
otherwise there is a race: vznet can be unloaded before vecalls
manages to unregister all net devices</li>
<li>veip_cleanup() cleans up venet structures on unloading</li>
</ul>

Bug #72973
</div>

==== diff-dput-preempt ====
<div class="change">
Patch from Alexey:
[PATCH] dput() exits with disabled preemption
Both 2.6.9 and 2.6.18, and even 2.6.8.
</div>

==== diff-cpt-check-external-mounts ====
<div class="change">
Patch from Andrey:

We can't restore external bind mounts, so during restore we should check
whether they already exist (mounted by vzctl mount scripts).
</div>

==== diff-cpt-add-extfs-20061208 ====
<div class="change">
Patch from Vasiliy:
ext2/ext3 filesystems are currently not recognized by CPT; consequently,
bind mount migration fails. This patch adds these filesystems.
</div>

==== diff-ve-cmdline-quiet-20061208 ====
<div class="change">
Patch from Alexandr Andreev:
Add "quiet" to /proc/cmdline inside VE.
Bug #54370.
</div>

==== diff-fs-symlink-err-fix ====
<div class="change">
Patch from Dmitry (dmonakhov@):

page_symlink() ignores the commit_write() return value.
page_symlink() checks only the prepare_write() return value, but ignores the
return value of commit_write(). This is not good because commit_write() may
fail too, especially in case of any delayed allocations (ext3-pgfault patches).

Bug #72993.

BTW, mainstream kernels have checked the commit_write() return value since 2.6.17-rc1:
<br/>[http://lkml.org/lkml/2006/3/12/178 http://lkml.org/lkml/2006/3/12/178]
</div>
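The difference between the two behaviours described above can be sketched in user space. This is only an illustration under stated assumptions: <code>prepare_stub()</code> and <code>commit_stub()</code> are hypothetical stand-ins that simply return the status passed to them, not the kernel's prepare_write()/commit_write(), and the two wrapper names are made up as well.

```c
#include <errno.h>

/* Hypothetical stand-ins: each returns the status it is given. */
static int prepare_stub(int rc) { return rc; }
static int commit_stub(int rc)  { return rc; }

/* Old pattern: only the prepare step is checked. */
static int symlink_write_unchecked(int prep_rc, int commit_rc)
{
    int err = prepare_stub(prep_rc);
    if (err)
        return err;
    (void)commit_stub(commit_rc);   /* failure is silently dropped */
    return 0;
}

/* Fixed pattern: a commit failure is propagated to the caller. */
static int symlink_write_checked(int prep_rc, int commit_rc)
{
    int err = prepare_stub(prep_rc);
    if (err)
        return err;
    return commit_stub(commit_rc);
}
```

With a commit step that fails with <code>-ENOSPC</code>, the unchecked variant still reports success, while the checked variant returns the error.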

==== diff-cpt-debug-printk-20061213 ====
<div class="change">
Patch from Andrey:<br/>
Some messages in CPT code should be printed only when debug is turned on.

Bug #73174.
</div>

==== diff-simfs-free-blkdev-20061127 ====
<div class="change">
Patch from Kirill:
<ul>
<li>fix simfs bdev setting</li>
<li>beautify code a bit</li>
</ul>

Bug #72938.
</div>

==== diff-ve-venet-stop-b-20061227 ====
<div class="change">
Patch from Denis:
This patch fixes memory leak introduced by diff-venet-vestop-20061213

Bug #73679.
</div>

==== diff-ms-linger-timeout ====
<div class="change">
Patch from mainstream:
[NET]: Make sure l_linger is unsigned to avoid negative timeouts

One of my x86_64 (linux 2.6.13) server log is filled with :

<pre>
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
</pre>

This is because some application does a

<pre>
struct linger li;
li.l_onoff = 1;
li.l_linger = -1;
setsockopt(sock, SOL_SOCKET, SO_LINGER, &amp;li, sizeof(li));
</pre>

And unfortunately l_linger is defined as a 'signed int' in
include/linux/socket.h:

<pre>
struct linger {
        int l_onoff;    /* Linger active */
        int l_linger;   /* How long to linger for */
};
</pre>

I don't know if it's safe to change l_linger to 'unsigned int' in the
include file (It might be defined as int in ABI specs)

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;<br/>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;

[http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9261c9b042547d01eeb206cf0e21ce72832245ec GIT: 9261c9b042547d01eeb206cf0e21ce72832245ec]<br/>
[http://linux.bkbits.net:8080/linux-2.6/cset@1.3332.271.3 http://linux.bkbits.net:8080/linux-2.6/cset@1.3332.271.3]

Bug #73688.
</div>
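The arithmetic behind the log messages above can be reproduced in user space. This is a minimal sketch, assuming HZ is 250 (which matches the logged value, since 0xffffffffffffff06 is -250), a 32-bit int and a 64-bit timeout; the helper names are invented for illustration and are not kernel functions.

```c
#include <stdint.h>

#define HZ 250 /* assumed tick rate: -1 * 250 matches the logged ...ff06 */

/* Signed handling: the negative product is sign-extended to 64 bits. */
static uint64_t lingertime_signed(int l_linger)
{
    return (uint64_t)(int64_t)(l_linger * HZ);
}

/* Unsigned handling: l_linger is converted before the multiplication. */
static uint64_t lingertime_unsigned(int l_linger)
{
    return (uint64_t)((unsigned int)l_linger * HZ);
}

/* A timeout is "wrong" when it is negative as a signed 64-bit value. */
static int timeout_is_negative(uint64_t t)
{
    return (int64_t)t < 0;
}
```

For <code>l_linger = -1</code> the signed variant yields 0xffffffffffffff06, the value from the log, while the unsigned variant yields 0xffffff06, a large but valid timeout.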

==== diff-ms-aio-nrpages-20061228 ====
<div class="change">
Patch from mainstream, prepared by Kostja:

This patch fixes the crash caused by incorrect initialization of "nr_pages" in aio.
We should not claim to have filled in the ring_pages[] array until we actually
_do_ fill it in. It will confuse the code that frees the structure if we claim
there are pages there that don't exist.

[http://linux.bkbits.net:8080/linux-2.6/gnupatch@418e67e3jfC3msWLXzcdTkI10dwtEg http://linux.bkbits.net:8080/linux-2.6/gnupatch@418e67e3jfC3msWLXzcdTkI10dwtEg]

Bug #73878.
</div>
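The rule stated above (do not claim pages are present until they really are) can be sketched in user space as follows. This is not the aio code: <code>struct ring_stub</code>, <code>ring_fill()</code>, <code>ring_free()</code> and the <code>fail_at</code> parameter, which simulates an allocation failure, are all hypothetical names used only for illustration.

```c
#include <stdlib.h>

#define RING_PAGES 8

/* Hypothetical stand-in for a ring with a page array. */
struct ring_stub {
    void *pages[RING_PAGES];
    int nr_pages;               /* number of valid entries in pages[] */
};

/* Fill up to 'want' slots; 'fail_at' simulates a failed allocation
 * at that index (-1 means no simulated failure). */
static int ring_fill(struct ring_stub *r, int want, int fail_at)
{
    r->nr_pages = 0;            /* nothing is claimed up front */
    for (int i = 0; i < want && i < RING_PAGES; i++) {
        void *p = (i == fail_at) ? NULL : malloc(64);
        if (!p)
            return -1;          /* nr_pages counts only filled slots */
        r->pages[i] = p;
        r->nr_pages = i + 1;    /* claim the slot only after filling it */
    }
    return 0;
}

/* Free exactly the slots that were filled; returns how many were freed. */
static int ring_free(struct ring_stub *r)
{
    int freed = 0;
    for (int i = 0; i < r->nr_pages; i++) {
        free(r->pages[i]);
        freed++;
    }
    r->nr_pages = 0;
    return freed;
}
```

Because the counter grows only after a slot is filled, the cleanup path never touches uninitialized entries when filling stops halfway.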

==== diff-ms-emt64-dblfault-debug-20070115 ====

<div class="change">
Patch from Denis:<br/>

This patch adds thread_info debugging to the double fault handler on x86_64.
</div>

==== diff-cpt-dentryopen-error-path ====
<div class="change">
Patch from Alexey:<br/>
[CPT] oops in an error path

When we were not able to reopen a file for reading, the ERR_PTR() value was
used as a live file pointer.

Damn, it _happened_. And I cannot recollect how and why.
</div>
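For readers unfamiliar with the convention, here is a user-space sketch of the error-pointer idiom and of the check that such an error path needs. The lower-case helpers mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() but are re-implemented here; <code>struct file_stub</code>, <code>reopen_stub()</code> and <code>use_file()</code> are hypothetical and are not CPT code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long error)   { return (void *)(intptr_t)error; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    /* Error codes live in the top MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical file object and reopen routine. */
struct file_stub { int fd; };
static struct file_stub the_file = { 3 };

static struct file_stub *reopen_stub(int allowed)
{
    return allowed ? &the_file : err_ptr(-EACCES);
}

/* The result must be tested with is_err() before it is dereferenced. */
static long use_file(int allowed)
{
    struct file_stub *f = reopen_stub(allowed);
    if (is_err(f))
        return ptr_err(f);      /* propagate the error code */
    return f->fd;               /* safe: f is a real pointer */
}
```

Without the <code>is_err()</code> check, <code>f->fd</code> would dereference an address near the top of the address space, which in the kernel shows up as an oops.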

==== diff-cpt-tty-restore-20070115 ====
<div class="change">
Patch from Andrey (amirkin@), backported from 2.6.16 patch:

The TTY_LDISC flag was omitted on restore, leading to messages like:
"kernel: init_dev but no ldisc"
After that the process hangs in D state. In some cases this can lead to a node
crash.

Bug #74039.
</div>

==== diff-ubc-vmrss-remove ====
<div class="change">
Patch from Pavel:

Per-vma RSS accounting is (was) needed only for debugging
privvmpages accounting, but it produces more
headache than help.

Privvmpages leaks are MUCH rarer and these cases can be
debugged without this accounting, so just remove it
entirely.
</div>

==== diff-ve-contextrestore-20070109 ====
<div class="change">
Patch from Denis:

Context on the task can be corrupted on memory allocation failure.
</div>

==== diff-ubc-contextrestore-20070109 ====
<div class="change">
Patch from Denis:

Context on the task can be corrupted on allocation failure.
Possible fix for bug #74031.
</div>

==== diff-ve-neigh-tbl-init-20070109 ====
<div class="change">
Patch from Vasily:
Fixes memory and ub leaks in neigh_table_init().
Corrected version: prevents access to uninitialized
hash_buckets and phash_buckets.

Bug #74067.
</div>

==== diff-ms-nf-compat-redir-20070111 ====
<div class="change">
Patch from Dmitry:
Added a compat function to ipt_REDIRECT so that it works
in 32-bit VEs on a 64-bit node.

Bug #74179.
</div>

==== diff-ve-nr-dead-atomic-20070108 ====
<div class="change">
Patch from Pavel (xemul@), ported by Kostja:

This patch eliminates the self-deadlock in __put_task_struct
caused by changing nr_dead under tasklist_lock.

Bug #74029.

The original patch was diff-ve-nr-dead-atomic-20060310 from 2.6.18;
its description:

<pre>
# Fixed Kirill's (dev@) comment not tu use obfuscated macros.
# ----------------------------
# revision 1.1
# date: 2006/03/10 16:42:04; author: xemul; state: Exp;
# Do not take task_list_lock in put_task_struct do change nr_dead counter.
# Otherwise - deadlock:
# Mar 10 18:58:18 ts13 [&lt;c0238b44&gt;] __write_lock_debug+0xc4/0xf0
# Mar 10 18:58:18 ts13 [&lt;c0238bcf&gt;] _raw_write_lock+0x5f/0xa0
# Mar 10 18:58:18 ts13 [&lt;c03da43c&gt;] _write_lock_irq+0xc/0x10
# Mar 10 18:58:18 ts13 [&lt;c011ee8d&gt;] __put_task_struct+0x9d/0x180
# Mar 10 18:58:18 ts13 [&lt;c011fefd&gt;] sighand_free_cb+0x1d/0x30
# Mar 10 18:58:18 ts13 [&lt;c013772c&gt;] rcu_do_batch+0x2c/0x70
# Mar 10 18:58:18 ts13 [&lt;c0137994&gt;] rcu_process_callbacks+0x34/0x60
# Mar 10 18:58:18 ts13 [&lt;c0129156&gt;] tasklet_action+0x66/0xd0
# Mar 10 18:58:18 ts13 [&lt;c0128da2&gt;] __do_softirq+0xa2/0x130
# Mar 10 18:58:18 ts13 [&lt;c0105aaf&gt;] do_softirq+0x4f/0x60
# Putting of task structs is performed via rcu in 2.6.16 and sometimes
# tasklist_lock is taken w/o _irq.
#
# Replaces diff-ms-tasklistlock-irq-20060310
</pre>
</div>

==== diff-ms-net-indev-init-20070105 ====
<div class="change">
Patch from Denis:
This patch corrects inet device initialization order to avoid partly
initialized device.

Bug #73995.
</div>

==== linux-2.6.9-drbd-0.7.20-0.7.22.patch ====
<div class="change">
Updated DRBD to version 0.7.22
</div>