  • Mincore security fix (CVE-2006-4814)
  • iptables compat mode fixes
  • More than 512 IPs fix
  • Modules unloading fixes
  • Memory leak in kmemsize fixed
  • 2 minor mainstream fixes
  • Fixed tty restore on CPT
  • Removed vmrss accounting
  • Updated DRBD to 0.7.22.


Same as 023stab037.3, plus:



Patch from Vasily: fib_hash_free(), called from fini_ve_route(), uses the wrong size argument, which leads to an oops in kfree() when too many IPs are assigned to a VE. Bug #73426.



Patches from Evgeny, modified by Kirill:

Creating proc entries from a module is dangerous: de->owner should be set atomically, though nothing in mainstream does so. To set the owner atomically, we can protect against the race with proc_lookup() using lock_kernel().

Bug #73019.
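The locking idea above can be illustrated with a userspace analogy. This is a sketch, not the patch: a pthread mutex stands in for lock_kernel(), lookup_owner() stands in for proc_lookup(), and all names are hypothetical. The entry and its owner are published inside the same critical section that the lookup path takes, so a lookup sees either no entry or a complete one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry; 'owner' stands in for de->owner. */
struct entry {
    const char *name;
    const void *owner;
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ lock_kernel() */
static struct entry table[16];
static size_t nr_entries;

/* Fill in the entry and set its owner in one critical section. */
static struct entry *create_entry(const char *name, const void *owner)
{
    struct entry *e = NULL;

    assert(name != NULL);
    pthread_mutex_lock(&big_lock);
    if (nr_entries < sizeof(table) / sizeof(table[0])) {
        e = &table[nr_entries];
        e->name = name;
        e->owner = owner;
        nr_entries++;   /* entry becomes visible only once complete */
    }
    pthread_mutex_unlock(&big_lock);
    return e;
}

/* ~ proc_lookup(): takes the same lock, so it never sees an unset owner. */
static const void *lookup_owner(const char *name)
{
    const void *owner = NULL;

    pthread_mutex_lock(&big_lock);
    for (size_t i = 0; i < nr_entries; i++) {
        if (strcmp(table[i].name, name) == 0) {
            owner = table[i].owner;
            break;
        }
    }
    pthread_mutex_unlock(&big_lock);
    return owner;
}
```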


Patch from Alexandr Andreev:

This patch lets you replace the SysRq key in the Alt+SysRq+XXX combination with any other key:

# echo NEW_SCANCODE > /proc/sys/kernel/sysrq-key

You can get your keyboard's scancodes with programs such as showkey or evtest. The default Alt+SysRq combination still works after redefinition.
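The echo command above can also be done from a program. A minimal sketch, assuming the file accepts a plain decimal scancode followed by a newline (the accepted format is not stated here); set_sysrq_key() is a hypothetical helper, and the path is a parameter so it can be tried on an ordinary file.

```c
#include <assert.h>
#include <stdio.h>

/* Write 'scancode' to the given control file (e.g.
 * /proc/sys/kernel/sysrq-key). Assumes a decimal text format.
 * Returns 0 on success, -1 on failure. */
static int set_sysrq_key(const char *path, unsigned int scancode)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%u\n", scancode) > 0;
    if (fclose(f) != 0)     /* the write may only fail at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

On a kernel with this patch, a call such as `set_sysrq_key("/proc/sys/kernel/sysrq-key", code)` would need root privileges, just like the echo command.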


Patch from linux mainstream, prepared by Vasily:

"attached patch should fix the following race:

  Proc 1                               Proc 2

					  jh is only waiting for checkpoint
				     -> b_transaction == NULL ->
					     do nothing
					change the data

and we have sent wrong data to disk... We now clean the dirty buffer flag under buffer lock in all cases and hence we know that whenever a buffer is starting to be journaled we either finish the pending write-out before attaching a buffer to a transaction or we won't write the buffer until the transaction is going to be committed.

The test in jbd_unexpected_dirty_buffer() is redundant - remove it. Furthermore we have to clear the buffer dirty bit under the buffer lock to prevent races with buffer write-out (and hence prevent returning a buffer with IO happening).

Signed-off-by: Jan Kara <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>"

Bug #68106.


Patch from Alexey: [PATCH] leakage of vpid_mapping

This probably fixes bug #62834.

The problem was that when switching to sparse VPID mappings, we could have processes with non-virtual pids entered into the VE. For example, it could be some stuck process from the VE setup scripts. In this case we created a useless mapping struct, which was never freed because it referred to a non-virtual pid.

I left a printk() in the code, because we definitely need confirmation that this event really happens. It does not happen in my tests: so far I have run 400000 checkpoint/restores and 20000 VE migrations and, unfortunately, found no problems.

dev@: somehow this was not ported from the 2.6.8-022stab078.x branch.


Patch by Andrey (saw@): This patch fixes assertions in fairsched to avoid printk deadlocks, and to print more information.


Patch from Vasily: in some places we should compare not with the &root_user pointer (HN root) but with the VE root. This made su unable to change user when the ulimit was too tight for root.

dev@: somehow this was not ported from the 022stab078.x branch...


Patch from mainstream: [SECURITY]: Deadlock in mincore (CVE-2006-4814)

Marcel Holtmann reported that the sys_mincore() implementation has incorrect locking: copy_to_user() shouldn't be done under mmap_sem.

The whole security thread led to Linus' idea of rewriting the code because it was crap, but for a start the following patches were committed instead of the rewrite patch:

GIT: 2f77d107050abc14bc393b34bdb7b91cf670c250
GIT: 4fb23e439ce09157d64b89a21061b9fc08f2b495
GIT: 825020c3866e7312947e17a0caa9dd1a5622bafc

Bug #73299.
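For context, this is the syscall from the userspace side: mincore() fills one byte per page into a caller-supplied vector, and that vector is the user buffer the kernel writes with copy_to_user(). A minimal Linux-only sketch; the helper name is hypothetical.

```c
#define _DEFAULT_SOURCE          /* expose mincore() and MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one anonymous page, touch it, and ask mincore() whether it is
 * resident. Returns 1 if resident, 0 if not, -1 on error. */
static int touched_page_is_resident(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char vec[1];                   /* one byte per page */
    unsigned char *p;
    int ret;

    assert(page > 0);
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 1;                               /* fault the page in */
    ret = mincore(p, (size_t)page, vec);    /* kernel fills vec */
    munmap(p, (size_t)page);
    if (ret != 0)
        return -1;
    return vec[0] & 1;                      /* bit 0: page is resident */
}
```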


Patch from Dmitry: compat offsets should be 'unsigned int', as the entries array size has this type.

Bug #73201.


Patch from Alexandr Andreev:

  • Update vcpu->start_time if we decide to stay on the previous vcpu. This improves performance a little if schedule() is called very often.
  • Set the type of start_time to unsigned long so that it is on the same scale as jiffies.
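Keeping start_time in the same unsigned long type as jiffies matters because jiffies wraps around, and unsigned arithmetic stays correct across the wrap. A minimal sketch with hypothetical helper names; ticks_after() follows the same idea as the kernel's time_after() macro (subtract as unsigned, test the sign).

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if tick value 'a' is later than 'b', even across a wrap. */
static int ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Ticks elapsed since 'start'; correct even if 'now' has wrapped past 0. */
static unsigned long ticks_since(unsigned long start, unsigned long now)
{
    return now - start;
}
```

For example, with start = ULONG_MAX - 4 and now = 5, ticks_since() returns 10 and ticks_after(now, start) is true, whereas a signed or differently sized start_time would give a wrong answer here.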


Patch from mainstream: [PATCH] Remove redundant up() in stop_machine()

An up() is called in kernel/stop_machine.c on failure, and also in the caller (unconditionally).

Signed-off-by: Zhou Yingchao <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>

GIT: 4edb9a143e31d2e191c199262226e1a5923ff8f7
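The effect of the redundant up() can be shown with POSIX semaphores, where sem_wait() plays the role of down() and sem_post() the role of up(). A sketch with hypothetical helper names: when both the helper and its caller release on the failure path, a binary semaphore ends up with a count of 2, so two holders could later enter at once.

```c
#include <assert.h>
#include <semaphore.h>

/* Mirrors the bug: releases on failure although the caller also does. */
static int do_work_buggy(sem_t *lock, int fail)
{
    if (fail) {
        sem_post(lock);         /* redundant up() */
        return -1;
    }
    return 0;
}

/* Fixed version: leaves the release to the caller. */
static int do_work_fixed(sem_t *lock, int fail)
{
    (void)lock;
    return fail ? -1 : 0;
}

/* Take the semaphore, run the helper, always release it; return the
 * semaphore value afterwards (1 means down/up were balanced). */
static int call_with_lock(int (*work)(sem_t *, int), int fail)
{
    sem_t lock;
    int value = -1;
    int rc = sem_init(&lock, 0, 1);   /* binary semaphore, initially free */

    assert(rc == 0);
    (void)rc;
    sem_wait(&lock);                  /* down() */
    (void)work(&lock, fail);
    sem_post(&lock);                  /* up() in the caller, unconditionally */
    sem_getvalue(&lock, &value);
    sem_destroy(&lock);
    return value;
}
```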


Patch from Alexey:

Alexey found that IA64 doesn't charge PTEs to kmemsize.


Patch from Denis:

  • fixed module reference counting: initialization of ve->veip in veip_start() should take a module reference
  • moved unregister_netdev(venet) to venet_stop(); otherwise there is a race: vznet can be unloaded before vecalls manages to unregister all net devices
  • veip_cleanup() cleans up venet structures on unloading

Bug #72973


Patch from Alexey: [PATCH] dput() exits with disabled preemption. Affects both 2.6.9 and 2.6.18, and even 2.6.8.


Patch from Andrey:

We can't restore external bind mounts, so during restore we should check whether they already exist (mounted by vzctl mount scripts).


Patch from Vasiliy: ext2/ext3 filesystems are currently not recognized by CPT, so bind mount migration fails. This patch adds support for these filesystems.


Patch from Alexandr Andreev: Add "quiet" to /proc/cmdline inside VE. Bug #54370.


Patch from Dmitry (dmonakhov@):

page_symlink() ignores the commit_write() return value: it checks only the return value of prepare_write() and ignores the one from commit_write(). This is wrong because commit_write() may fail too, especially with delayed allocation (ext3-pgfault patches).

Bug #72993.
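The rule above (check the result of both phases) can be sketched in userspace. The write_ops interface, the step functions, and write_symlink_target() are hypothetical stand-ins modeled on prepare_write()/commit_write(), not the kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical two-phase write interface: both steps can fail. */
struct write_ops {
    int (*prepare)(char *buf, size_t len);
    int (*commit)(char *buf, size_t len);   /* may fail, e.g. -ENOSPC */
};

/* Example steps for trying the helper out. */
static int step_ok(char *buf, size_t len) { (void)buf; (void)len; return 0; }
static int step_enospc(char *buf, size_t len) { (void)buf; (void)len; return -ENOSPC; }

/* Store a symlink target in 'page', checking BOTH phases. */
static int write_symlink_target(const struct write_ops *ops, char *page,
                                size_t page_size, const char *target)
{
    size_t len;
    int err;

    assert(ops != NULL && page != NULL && target != NULL);
    len = strlen(target) + 1;               /* include the NUL */
    if (len > page_size)
        return -ENAMETOOLONG;
    err = ops->prepare(page, len);
    if (err)
        return err;
    memcpy(page, target, len);
    err = ops->commit(page, len);           /* the check that was missing */
    if (err)
        return err;
    return 0;
}
```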

BTW, recent kernels have checked the commit_write() return value since 2.6.17-rc1:


Patch from Andrey:

Some messages in the CPT code should be printed only when debugging is turned on.

Bug #73174.


Patch from Kirill:

  • fix simfs bdev setting
  • beautify code a bit

Bug #72938.


Patch from Denis: This patch fixes a memory leak introduced by diff-venet-vestop-20061213.

Bug #73679.


Patch from mainstream: [NET]: Make sure l_linger is unsigned to avoid negative timeouts

The log of one of my x86_64 (Linux 2.6.13) servers is filled with:

schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca

This is because some application does:

struct linger li;
li.l_onoff = 1;
li.l_linger = -1;
setsockopt(sock, SOL_SOCKET, SO_LINGER, &li, sizeof(li));

And unfortunately l_linger is defined as a 'signed int' in include/linux/socket.h:

struct linger {
	int l_onoff;        /* Linger active                */
	int l_linger;       /* How long to linger for       */
};

I don't know if it's safe to change l_linger to 'unsigned int' in the include file (it might be defined as int in ABI specs).

Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>

GIT: 9261c9b042547d01eeb206cf0e21ce72832245ec

Bug #73688.
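The logged value can be reproduced with a little arithmetic. A sketch, assuming HZ=250 (with l_linger = -1 this gives -250, i.e. the ffffffffffffff06 in the log) and using LONG_MAX as the "wait forever" timeout. linger_timeout_unsigned() only illustrates the idea in the fix's title, treating l_linger as unsigned and clamping; it is not the code of the actual patch.

```c
#include <assert.h>
#include <limits.h>

#define HZ 250                            /* assumption: a HZ=250 kernel */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX     /* "wait forever" */

/* Old behaviour: the signed value goes straight into the timeout. */
static long linger_timeout_signed(int l_linger)
{
    return (long)l_linger * HZ;           /* -1 becomes -250 */
}

/* Idea of the fix: interpret l_linger as unsigned and clamp, so the
 * resulting timeout can never be negative. */
static long linger_timeout_unsigned(int l_linger)
{
    unsigned long secs = (unsigned int)l_linger;

    if (secs >= (unsigned long)(MAX_SCHEDULE_TIMEOUT / HZ))
        return MAX_SCHEDULE_TIMEOUT;
    return (long)(secs * HZ);
}
```

On a 64-bit machine, printing `(unsigned long)linger_timeout_signed(-1)` in hex gives ffffffffffffff06, the value in the log above.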


Patch from mainstream, prepared by Kostja:

This patch fixes the crash caused by incorrect initialization of "nr_pages" in aio. We should not claim to have filled in the ring_pages[] array until we actually _do_ fill it in. It will confuse the code that frees the structure if we claim there are pages there that don't exist.

Bug #73878.
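The rule "don't claim pages you haven't filled in" can be sketched in userspace. The ring structure, the failure injection, and the helper names are hypothetical; the point is that nr_pages is incremented only after each page is actually stored, so the free path, which trusts nr_pages, never frees a slot that was never filled.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_RING_PAGES 8

/* Hypothetical ring: nr_pages counts only slots that were really filled. */
struct ring {
    void *pages[MAX_RING_PAGES];
    int nr_pages;
};

static void ring_free(struct ring *r)
{
    for (int i = 0; i < r->nr_pages; i++)   /* trusts nr_pages */
        free(r->pages[i]);
    r->nr_pages = 0;
}

/* Fill 'want' pages; 'fail_at' (or -1 for none) simulates an allocation
 * failure at that index. nr_pages is never set up front. */
static int ring_setup(struct ring *r, int want, int fail_at)
{
    assert(r != NULL);
    r->nr_pages = 0;
    if (want < 0 || want > MAX_RING_PAGES)
        return -1;
    for (int i = 0; i < want; i++) {
        void *p = (i == fail_at) ? NULL : malloc(64);
        if (p == NULL) {
            ring_free(r);       /* frees exactly the pages that exist */
            return -1;
        }
        r->pages[r->nr_pages++] = p;
    }
    return 0;
}
```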


Patch from Denis:

This patch adds thread_info debugging to the double fault handler on x86_64.


Patch from Alexey:
[CPT] oops in an error path

When we were unable to reopen a file for reading, the ERR_PTR() value was used as a live file pointer.

Damn, it _happened_. And I cannot recollect how and why.
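The error path above comes down to the ERR_PTR()/IS_ERR() convention: a function returns either a valid pointer or a small negative errno encoded as a pointer, and every caller must test IS_ERR() before using it. A userspace re-creation for illustration; the MAX_ERRNO limit of 4095 follows current kernels, and reopen_for_read() and read_first_byte() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Modeled on the kernel's err.h helpers: errno values live in the
 * topmost MAX_ERRNO addresses, which no valid object occupies. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical "reopen the file for reading". */
static FILE *reopen_for_read(const char *path)
{
    FILE *f = fopen(path, "r");
    return f ? f : (FILE *)ERR_PTR(-errno);
}

/* The caller must check IS_ERR() before treating f as a live file. */
static long read_first_byte(const char *path)
{
    FILE *f;
    int c;

    assert(path != NULL);
    f = reopen_for_read(path);
    if (IS_ERR(f))
        return PTR_ERR(f);      /* propagate the error, never touch f */
    c = fgetc(f);
    fclose(f);
    return c;
}
```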


Patch from Andrey (amirkin@), backported from the 2.6.16 patch:

The TTY_LDISC flag was omitted on restore, leading to the message "kernel: init_dev but no ldisc". After that the process hangs in D state. In some cases this can lead to a node crash.

Bug #74039.


Patch from Pavel:

Per-vma RSS accounting is (was) needed only for debugging privvmpages accounting, but it causes more headache than help.

Privvmpages leaks happen MUCH more rarely, and these cases can be debugged without this accounting, so just remove it altogether.


Patch from Denis:

The context set on the task can be corrupted on memory allocation failure.


Patch from Denis:

The context set on the task can be corrupted on allocation failure. Possible fix for bug #74031.


Patch from Vasily: fixes memory and ub leaks in neigh_table_init(). Corrected version: prevents access to uninitialized hash_buckets and phash_buckets.

Bug #74067.


Patch from Dmitry: added a compat function to ipt_REDIRECT so that it works in 32-bit VEs on a 64-bit node.

Bug #74179.


Patch from Pavel (xemul@), ported by Kostja:

This patch eliminates the self-deadlock in __put_task_struct() caused by changing nr_dead under tasklist_lock.

Bug #74029.

The original patch was diff-ve-nr-dead-atomic-20060310 from 2.6.18; its description:

# Fixed per Kirill's (dev@) comment not to use obfuscated macros.
# ----------------------------
# revision 1.1
# date: 2006/03/10 16:42:04;  author: xemul;  state: Exp;
# Do not take task_list_lock in put_task_struct to change nr_dead counter.
# Otherwise - deadlock:
# Mar 10 18:58:18 ts13  [<c0238b44>] __write_lock_debug+0xc4/0xf0
# Mar 10 18:58:18 ts13  [<c0238bcf>] _raw_write_lock+0x5f/0xa0
# Mar 10 18:58:18 ts13  [<c03da43c>] _write_lock_irq+0xc/0x10
# Mar 10 18:58:18 ts13  [<c011ee8d>] __put_task_struct+0x9d/0x180
# Mar 10 18:58:18 ts13  [<c011fefd>] sighand_free_cb+0x1d/0x30
# Mar 10 18:58:18 ts13  [<c013772c>] rcu_do_batch+0x2c/0x70
# Mar 10 18:58:18 ts13  [<c0137994>] rcu_process_callbacks+0x34/0x60
# Mar 10 18:58:18 ts13  [<c0129156>] tasklet_action+0x66/0xd0
# Mar 10 18:58:18 ts13  [<c0128da2>] __do_softirq+0xa2/0x130
# Mar 10 18:58:18 ts13  [<c0105aaf>] do_softirq+0x4f/0x60
# Putting of task structs is performed via rcu in 2.6.16 and sometimes
# tasklist_lock is taken w/o _irq.
# Replaces diff-ms-tasklistlock-irq-20060310
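As the patch name suggests, the fix makes nr_dead an atomic counter, so it can be updated from any context (including an RCU callback in softirq) without taking tasklist_lock. A userspace sketch in which C11 atomics stand in for the kernel's atomic_t and the function names are hypothetical: several threads bump the counter concurrently with no lock and the total still comes out exact.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int nr_dead;             /* stands in for an atomic_t counter */

/* Hypothetical stand-in for the nr_dead update in __put_task_struct(). */
static void put_task_sketch(void)
{
    atomic_fetch_add_explicit(&nr_dead, 1, memory_order_relaxed);
}

static void *worker(void *arg)
{
    int n = *(const int *)arg;

    for (int i = 0; i < n; i++)
        put_task_sketch();
    return NULL;
}

/* Run 'threads' workers, each dropping 'per_thread' tasks; return total. */
static int count_dead(int threads, int per_thread)
{
    pthread_t tid[8];

    assert(threads > 0 && threads <= 8);
    atomic_store(&nr_dead, 0);
    for (int i = 0; i < threads; i++)
        pthread_create(&tid[i], NULL, worker, &per_thread);
    for (int i = 0; i < threads; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&nr_dead);
}
```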


Patch from Denis: This patch corrects the inet device initialization order to avoid a partly initialized device.

Bug #73995.


Updated DRBD to version 0.7.22