Changes
- Mincore security fix (CVE-2006-4814)
- iptables compat mode fixes
- More than 512 IPs fix
- Modules unloading fixes
- Memory leak in kmemsize fixed
- 2 minor mainstream fixes
- Fixed tty restore on CPT
- Removed vmrss accounting
- Updated DRBD to 0.7.22.
Configs
Same as 023stab037.3, plus:
- +CONFIG_JFS_FS=m
- +CONFIG_JFS_POSIX_ACL=y
- +CONFIG_XFS_FS=m
- +CONFIG_XFS_QUOTA=y
- +CONFIG_XFS_POSIX_ACL=y
- +CONFIG_LOCK_HARNESS=m
- +CONFIG_GFS_FS=m
- +CONFIG_LOCK_NOLOCK=m
- +CONFIG_LOCK_DLM=m
- +CONFIG_LOCK_GULM=m
- +CONFIG_CLUSTER=m
- +CONFIG_CLUSTER_DLM=m
- +CONFIG_CLUSTER_DLM_PROCLOCKS=y
Patches
diff-ve-fibhashfree-20061221
Patch from Vasily: fib_hash_free(), called from fini_ve_route(), uses the wrong size argument, which leads to an oops in kfree() when too many IPs were assigned to a VE. Bug #73426.
diff-*proc-owner-20061218
diff-ve-proc-owner-20061218,
diff-cpt-proc-owner-20061218,
diff-vzdq-proc-owner-20061218
Patches from Evgeny (ekravtsunov@sw.ru), modified by Kirill:
Creating proc entries from a module is dangerous: de->owner should be set atomically, though no one in mainstream does so. To set the owner atomically, we can protect against the race with proc_lookup() using lock_kernel().
Bug #73019.
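The locking idea can be sketched in user space, with a pthread mutex standing in for lock_kernel() and a small array standing in for the proc tree. This is a hypothetical analog, not the patch itself: `proc_entry`, `create_entry_owned()` and `lookup_entry()` are made-up names. The point it shows is that the entry becomes visible and gets its owner inside the same critical section that lookup uses, so a lookup can never observe a half-initialized entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a proc dir entry and its owning module. */
struct proc_entry {
    const char *name;
    const void *owner;   /* analog of de->owner */
};

#define MAX_ENTRIES 16

static struct proc_entry table[MAX_ENTRIES];
static size_t nr_entries;
/* Plays the role of lock_kernel(): taken by both creation and lookup. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Publish the entry and set its owner in one critical section. */
int create_entry_owned(const char *name, const void *owner)
{
    int ret = -1;

    assert(owner != NULL);   /* this sketch requires an owning module */
    pthread_mutex_lock(&big_lock);
    if (nr_entries < MAX_ENTRIES) {
        table[nr_entries].name = name;
        table[nr_entries].owner = owner;
        nr_entries++;        /* the entry becomes visible only here */
        ret = 0;
    }
    pthread_mutex_unlock(&big_lock);
    return ret;
}

/* Lookup takes the same lock, so it never sees an entry without an owner. */
const void *lookup_entry(const char *name)
{
    const void *owner = NULL;

    pthread_mutex_lock(&big_lock);
    for (size_t i = 0; i < nr_entries; i++) {
        if (strcmp(table[i].name, name) == 0) {
            owner = table[i].owner;
            break;
        }
    }
    pthread_mutex_unlock(&big_lock);
    return owner;
}
```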
diff-sysrqkey-scancode-20061121
Patch from Alexandr Andreev:
This patch lets you replace the SysRq key in the Alt+SysRq+XXX combination with any other key:
# echo NEW_SCANCODE > /proc/sys/kernel/sysrq-key
You can get the scancodes of your keyboard with programs like showkey or evtest. The default Alt+SysRq combination still works after redefinition.
diff-jbd-unexpectdirty-20060905
Patch from linux mainstream, prepared by Vasily:
http://linux.bkbits.net:8080/linux-2.6/gnupatch@431f7f0ceyo6g8tikQvG3I-cCSb7kw
"attached patch should fix the following race:
  Proc 1                        Proc 2
  __flush_batch()
    ll_rw_block()
                                do_get_write_access()
                                  lock_buffer
                                    jh is only waiting for checkpoint
                                    -> b_transaction == NULL ->
                                       do nothing
                                  unlock_buffer
    test_set_buffer_locked()
    test_clear_buffer_dirty()
                                  __journal_file_buffer()
                                  change the data
    submit_bh()
and we have sent wrong data to disk... We now clean the dirty buffer flag under buffer lock in all cases and hence we know that whenever a buffer is starting to be journaled we either finish the pending write-out before attaching a buffer to a transaction or we won't write the buffer until the transaction is going to be committed.
The test in jbd_unexpected_dirty_buffer() is redundant - remove it. Furthermore we have to clear the buffer dirty bit under the buffer lock to prevent races with buffer write-out (and hence prevent returning a buffer with IO happening).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>"
Bug #68106.
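The rule the fix enforces (clear the dirty flag and start write-out under the buffer lock, and modify the data only under that same lock) can be modelled in user space. This is a hypothetical analog: `struct buffer`, `flush_buffer()` and `modify_buffer()` are made-up names, the mutex stands in for lock_buffer()/unlock_buffer(), and copying into `out` stands in for submit_bh().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 8

/* Hypothetical user-space model of a buffer_head. */
struct buffer {
    pthread_mutex_t lock;    /* analog of the buffer lock */
    bool dirty;
    char data[BUF_SIZE];
};

/*
 * Write-out side: test-and-clear the dirty flag and capture the data
 * while holding the buffer lock, so nobody can change the contents
 * between the two steps. Returns true if a write was "submitted".
 */
bool flush_buffer(struct buffer *bh, char out[BUF_SIZE])
{
    bool submitted = false;

    pthread_mutex_lock(&bh->lock);
    if (bh->dirty) {
        bh->dirty = false;
        memcpy(out, bh->data, BUF_SIZE);   /* stands in for submit_bh() */
        submitted = true;
    }
    pthread_mutex_unlock(&bh->lock);
    return submitted;
}

/* Journaling side: change the data only under the same lock. */
void modify_buffer(struct buffer *bh, const char src[BUF_SIZE])
{
    pthread_mutex_lock(&bh->lock);
    memcpy(bh->data, src, BUF_SIZE);
    bh->dirty = true;
    pthread_mutex_unlock(&bh->lock);
}
```

In the real kernel the buffer lock stays held until the I/O completes; the sketch collapses that into a single copy.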
diff-ve-vpid-leak
Patch from Alexey: [PATCH] leakage of vpid_mapping
Probably this fixes bug #62834.
The problem was that, when switching to sparse VPID mappings, processes with non-virtual pids could be entered into a VE; for example, a stuck process from the VE setup scripts. In this case we created a useless mapping struct, which was never freed because it referred to a non-virtual pid.
I left a printk() in the code, because we definitely need confirmation that this event really happens. It does not happen in my tests: so far I have run 400000 checkpoint/restores and 20000 VE migrations and found no problems, unfortunately.
dev@: for some reason this was not ported from the 2.6.8-022stab078.x branch
diff-fairsched-assert-20060602
Patch by Andrey (saw@): This patch fixes assertions in fairsched to avoid printk deadlocks, and to print more information.
diff-ve-root-user-20060605
Patch from Vasily: in some places we should compare not with the &root_user pointer (HN root) but with the VE root. This made su unable to change user when the ulimit was too tight for root.
dev@: for some reason this was not ported from the 022stab078.x branch...
diff-ms-mincore-locking-20061218
Patch from mainstream: [SECURITY]: Deadlock in mincore (CVE-2006-4814)
Marcel Holtman reported that sys_mincore() implementation has incorrect locking: copy_to_user() shouldn't be done under mmap_sem.
The whole security thread led Linus to propose rewriting the code (as it was crap), but for now the following patches were committed instead of the rewrite:
GIT: 2f77d107050abc14bc393b34bdb7b91cf670c250
GIT: 4fb23e439ce09157d64b89a21061b9fc08f2b495
GIT: 825020c3866e7312947e17a0caa9dd1a5622bafc
Bug #73299.
diff-ms-nf-ipt-compat-offsets-20061218
Patch from Dmitry: compat offsets should be 'unsigned int' as entries array size has this dimension.
Bug #73201.
diff-fairsched-starttime-20061214
Patch from Alexandr Andreev:
- Update vcpu->start_time if we decide to stay on the previous vcpu. This improves performance slightly if schedule() is called very often.
- Make start_time an unsigned long so that it is on the same scale as jiffies.
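One likely benefit of keeping start_time in the same unsigned long type as jiffies is that elapsed-time arithmetic stays correct when the counter wraps around. A minimal sketch, assuming plain tick counters; `ticks_elapsed()` and `ticks_after()` are hypothetical names, the latter mirroring the kernel's time_after(a, b) comparison.

```c
#include <assert.h>
#include <limits.h>

/*
 * Elapsed ticks between two unsigned long timestamps. Unsigned
 * subtraction is modulo 2^N, so the result stays correct across a
 * single wraparound of the counter.
 */
unsigned long ticks_elapsed(unsigned long start, unsigned long now)
{
    return now - start;
}

/* Same idea as the kernel's time_after(a, b): true if a is later than b. */
int ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}
```

For example, a start time of ULONG_MAX - 5 and a current value of 10 give an elapsed time of 16 ticks, even though the counter wrapped in between.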
diff-ms-stopmachine-up-20060827
Patch from mainstream: [PATCH] Remove redundant up() in stop_machine()
An up() is called in kernel/stop_machine.c on failure, and also in the caller (unconditionally).
Signed-off-by: Zhou Yingchao <yingchao.zhou@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GIT: 4edb9a143e31d2e191c199262226e1a5923ff8f7
http://linux.bkbits.net:8080/linux-2.6/gnupatch@44f1de0eQHQmbszw5F8_Z8enqg1ihw
diff-ubc-ia64-ptecharge-20061215
Patch from Alexey:
Alexey found that IA64 doesn't charge PTEs to kmemsize.
diff-ve-venet-stop-20061215
Patch from Denis:
- module reference counting fixed: initialization of ve->veip in veip_start() should take a module reference
- unregister_netdev(venet) moved to venet_stop(); otherwise there is a race: vznet can be unloaded before vecalls manages to unregister all net devices
- veip_cleanup() cleans up venet structures on unloading
Bug #72973.
diff-dput-preempt
Patch from Alexey: [PATCH] dput() exits with preemption disabled. This affects both 2.6.9 and 2.6.18, and even 2.6.8.
diff-cpt-check-external-mounts
Patch from Andrey:
We can't restore external bind mounts, so during restore we should check whether they already exist (mounted by the vzctl mount scripts).
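Checking whether a mount point already exists can be sketched with the standard getmntent() interface from <mntent.h>. This is a hypothetical helper, not the CPT code: `is_mounted_in()` is a made-up name, and the caller is assumed to pass a stream such as the one returned by setmntent("/proc/mounts", "r").

```c
#define _DEFAULT_SOURCE          /* expose getmntent() on glibc */
#include <assert.h>
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if 'dir' appears as a mount point in the mount table
 * read from 'table'. The stream is consumed; rewind it to reuse it.
 */
bool is_mounted_in(FILE *table, const char *dir)
{
    struct mntent *m;

    assert(table != NULL && dir != NULL);
    while ((m = getmntent(table)) != NULL) {
        if (strcmp(m->mnt_dir, dir) == 0)
            return true;
    }
    return false;
}
```

A restore path could call this before handling an external bind mount; what the real CPT code does when the mount is missing is not shown here.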
diff-cpt-add-extfs-20061208
Patch from Vasiliy: CPT currently does not recognize ext2/ext3 filesystems, so migration of bind mounts fails. This patch adds these filesystems.
diff-ve-cmdline-quiet-20061208
Patch from Alexandr Andreev: Add "quiet" to /proc/cmdline inside VE. Bug #54370.
diff-fs-symlink-err-fix
Patch from Dmitry (dmonakhov@):
page_symlink() ignores the commit_write() return value: it checks only the return value of prepare_write(). This is not good, because commit_write() may fail too, especially with delayed allocation (ext3-pgfault patches).
Bug #72993.
BTW, mainstream kernels check the commit_write() return value since 2.6.17-rc1:
http://lkml.org/lkml/2006/3/12/178
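The shape of the fix (propagate an error from either step of a two-step write) can be sketched with hypothetical hooks. `struct write_ops`, `step_ok()`, `step_enospc()` and `symlink_write_sketch()` are made-up names with simplified signatures; they are not the kernel's address_space_operations.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hooks modelled on prepare_write()/commit_write(). */
struct write_ops {
    int (*prepare_write)(void *page, size_t from, size_t to);
    int (*commit_write)(void *page, size_t from, size_t to);
};

/* Example hooks for illustration: one succeeds, one reports -ENOSPC. */
static int step_ok(void *page, size_t from, size_t to)
{
    (void)page; (void)from; (void)to;
    return 0;
}

static int step_enospc(void *page, size_t from, size_t to)
{
    (void)page; (void)from; (void)to;
    return -ENOSPC;   /* e.g. a delayed allocation failing at commit time */
}

/* Two-step write that propagates a failure from either step. */
int symlink_write_sketch(const struct write_ops *ops, void *page, size_t len)
{
    int err;

    assert(ops != NULL);
    err = ops->prepare_write(page, 0, len);
    if (err)
        return err;
    /* ... copy the symlink target into the page here ... */
    err = ops->commit_write(page, 0, len);
    return err;   /* the buggy version ignored this result */
}
```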
diff-cpt-debug-printk-20061213
Patch from Andrey:
Some messages in CPT code should be printed only when debug is turned on.
Bug #73174.
diff-simfs-free-blkdev-20061127
Patch from Kirill:
- fix simfs bdev setting
- beautify code a bit
Bug #72938.
diff-ve-venet-stop-b-20061227
Patch from Denis: This patch fixes a memory leak introduced by diff-venet-vestop-20061213.
Bug #73679.
diff-ms-linger-timeout
Patch from mainstream: [NET]: Make sure l_linger is unsigned to avoid negative timeouts
The log of one of my x86_64 servers (Linux 2.6.13) is filled with:
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
This is because some application does a
struct linger li;
li.l_onoff = 1;
li.l_linger = -1;
setsockopt(sock, SOL_SOCKET, SO_LINGER, &li, sizeof(li));
And unfortunately l_linger is defined as a 'signed int' in include/linux/socket.h:
struct linger {
	int l_onoff;        /* Linger active                */
	int l_linger;       /* How long to linger for       */
};
I don't know if it's safe to change l_linger to 'unsigned int' in the include file (It might be defined as int in ABI specs)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GIT: 9261c9b042547d01eeb206cf0e21ce72832245ec
http://linux.bkbits.net:8080/linux-2.6/cset@1.3332.271.3
Bug #73688.
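A sketch of the idea behind the fix: treat the signed l_linger as unsigned when converting it to a timeout, and clamp very large values to MAX_SCHEDULE_TIMEOUT, so -1 becomes "linger as long as possible" instead of a negative timeout. `linger_to_ticks()` is a hypothetical name and HZ = 1000 is an assumption; the real mainstream change lives in sock_setsockopt() and may differ in detail (for example, clamping only where it is needed on 32-bit).

```c
#include <assert.h>
#include <limits.h>

#define HZ 1000                          /* assumed tick rate */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX    /* same value as in the kernel */

/*
 * Convert a user-supplied l_linger (seconds, a signed int in struct
 * linger) to a timeout in ticks. Reading it as unsigned turns -1 into
 * a very large value, which is then clamped instead of going negative.
 */
long linger_to_ticks(int l_linger)
{
    unsigned int secs = (unsigned int)l_linger;

    if (secs >= MAX_SCHEDULE_TIMEOUT / HZ)
        return MAX_SCHEDULE_TIMEOUT;
    return (long)secs * HZ;
}
```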
diff-ms-aio-nrpages-20061228
Patch from mainstream, prepared by Kostja:
This patch fixes the crash caused by incorrect initialization of "nr_pages" in aio. We should not claim to have filled in the ring_pages[] array until we actually _do_ fill it in. It will confuse the code that frees the structure if we claim there are pages there that don't exist.
http://linux.bkbits.net:8080/linux-2.6/gnupatch@418e67e3jfC3msWLXzcdTkI10dwtEg
Bug #73878.
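The rule "don't claim pages you haven't filled in yet" can be shown with a hypothetical ring structure. `struct ring`, `ring_setup()`, `ring_free()` and the `fail_at` parameter (which simulates an allocation failure) are made-up for illustration; the real aio code is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_RING_PAGES 8

/* Hypothetical model of the aio ring's page bookkeeping. */
struct ring {
    void *pages[MAX_RING_PAGES];
    int nr_pages;    /* number of slots in pages[] that are really filled */
};

/* Release exactly the pages the ring claims to own. */
void ring_free(struct ring *r)
{
    for (int i = 0; i < r->nr_pages; i++)
        free(r->pages[i]);
    r->nr_pages = 0;
}

/*
 * Fill the ring with 'want' pages. nr_pages grows only after each page
 * exists, so on failure ring_free() never touches an unfilled slot.
 * 'fail_at' simulates an allocation failure at that index (-1 = none).
 */
int ring_setup(struct ring *r, int want, int fail_at)
{
    assert(want <= MAX_RING_PAGES);
    r->nr_pages = 0;
    for (int i = 0; i < want; i++) {
        void *p = (i == fail_at) ? NULL : malloc(64);
        if (!p) {
            ring_free(r);
            return -1;
        }
        r->pages[r->nr_pages++] = p;
    }
    return 0;
}
```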
diff-ms-emt64-dblfault-debug-20070115
Patch from Denis:
This patch adds thread_info debugging to the double fault handler on x86_64.
diff-cpt-dentryopen-error-path
Patch from Alexey:
[CPT] oops in an error path
When we failed to reopen a file for reading, the ERR_PTR() value was used as a live file pointer.
Damn, it _happened_. And I cannot recollect how and why.
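For readers unfamiliar with the convention, here is a user-space re-creation of ERR_PTR()/IS_ERR()/PTR_ERR() and the check the error path needs. The MAX_ERRNO bound of 4095 follows later mainline kernels (the exact bound differs between versions), and `reopen_for_read()`, `restore_file()` and `dummy_file` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space re-creation of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO values of the address space. */
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical reopen helper: fails with -EACCES when 'deny' is set. */
static int dummy_file;
void *reopen_for_read(bool deny)
{
    return deny ? ERR_PTR(-EACCES) : &dummy_file;
}

/* The caller must check IS_ERR() before treating the result as a file. */
long restore_file(bool deny)
{
    void *file = reopen_for_read(deny);

    if (IS_ERR(file))
        return PTR_ERR(file);   /* error path: never dereference 'file' */
    return 0;                   /* ... use 'file' normally ... */
}
```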
diff-cpt-tty-restore-20070115
Patch from Andrey (amirkin@), backported from 2.6.16 patch:
The TTY_LDISC flag was omitted on restore, which led to the message "kernel: init_dev but no ldisc", after which the process hung in D state. In some cases this could lead to a node crash.
Bug #74039.
diff-ubc-vmrss-remove
Patch from Pavel:
Per-VMA RSS accounting is (was) needed only for debugging privvmpages accounting, but it causes more headache than help.
Privvmpages leaks are MUCH rarer, and they can be debugged without this accounting, so just remove it altogether.
diff-ve-contextrestore-20070109
Patch from Denis:
The task's context could be corrupted on memory allocation failure.
diff-ubc-contextrestore-20070109
Patch from Denis:
The task's context could be corrupted on allocation failure. Possible fix for bug #74031.
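The general pattern behind both context-restore patches (switch to another context, and restore the caller's context on every exit path, including allocation failure) can be sketched as follows. This is an assumption about the shape of the bug, not the patch code: `current_ctx`, `set_ctx()` and `alloc_in_ctx()` are hypothetical, and `fail` simulates an allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-task "current context", e.g. the active VE or UB. */
static int current_ctx = 0;

static int set_ctx(int ctx)
{
    int old = current_ctx;
    current_ctx = ctx;
    return old;
}

/*
 * Allocate on behalf of 'ctx'. Every exit path, including the
 * allocation-failure one, goes through a single restore point.
 */
int alloc_in_ctx(int ctx, int fail, void **out)
{
    int old;
    int err = 0;

    assert(out != NULL);
    old = set_ctx(ctx);
    *out = fail ? NULL : malloc(32);
    if (!*out) {
        err = -1;
        goto out_restore;   /* returning directly would leave 'ctx' installed */
    }
    /* ... further setup done in the context of 'ctx' ... */
out_restore:
    set_ctx(old);
    return err;
}
```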
diff-ve-neigh-tbl-init-20070109
Patch from Vasily: fixes memory and UB leaks in neigh_table_init(). Corrected version: prevents access to uninitialized hash_buckets and phash_buckets.
Bug #74067.
diff-ms-nf-compat-redir-20070111
Patch from Dmitry: added a compat function to ipt_REDIRECT so that it works in 32-bit VEs on a 64-bit node.
Bug #74179.
diff-ve-nr-dead-atomic-20070108
Patch from Pavel (xemul@), ported by Kostja:
This patch eliminates the self-deadlock in __put_task_struct() caused by changing nr_dead under tasklist_lock.
Bug #74029.
original patch was diff-ve-nr-dead-atomic-20060310 from 2.6.18, its description:
# Fixed Kirill's (dev@) comment not to use obfuscated macros.
# ----------------------------
# revision 1.1
# date: 2006/03/10 16:42:04; author: xemul; state: Exp;
# Do not take task_list_lock in put_task_struct to change nr_dead counter.
# Otherwise - deadlock:
# Mar 10 18:58:18 ts13 [<c0238b44>] __write_lock_debug+0xc4/0xf0
# Mar 10 18:58:18 ts13 [<c0238bcf>] _raw_write_lock+0x5f/0xa0
# Mar 10 18:58:18 ts13 [<c03da43c>] _write_lock_irq+0xc/0x10
# Mar 10 18:58:18 ts13 [<c011ee8d>] __put_task_struct+0x9d/0x180
# Mar 10 18:58:18 ts13 [<c011fefd>] sighand_free_cb+0x1d/0x30
# Mar 10 18:58:18 ts13 [<c013772c>] rcu_do_batch+0x2c/0x70
# Mar 10 18:58:18 ts13 [<c0137994>] rcu_process_callbacks+0x34/0x60
# Mar 10 18:58:18 ts13 [<c0129156>] tasklet_action+0x66/0xd0
# Mar 10 18:58:18 ts13 [<c0128da2>] __do_softirq+0xa2/0x130
# Mar 10 18:58:18 ts13 [<c0105aaf>] do_softirq+0x4f/0x60
# Putting of task structs is performed via rcu in 2.6.16 and sometimes
# tasklist_lock is taken w/o _irq.
#
# Replaces diff-ms-tasklistlock-irq-20060310
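The approach described above (update nr_dead without taking tasklist_lock) amounts to making the counter atomic. A minimal sketch using C11 atomics in place of the kernel's atomic_t; `struct ve_counters`, `task_died_sketch()` and `put_task_sketch()` are hypothetical names, and the assumption that the counter is decremented when the task struct is finally put is labeled as such.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-VE counter of dead tasks, updated without any lock. */
struct ve_counters {
    atomic_int nr_dead;
};

/* Assumed: a task is counted as dead when it exits... */
void task_died_sketch(struct ve_counters *ve)
{
    atomic_fetch_add(&ve->nr_dead, 1);
}

/*
 * ...and uncounted when its task struct is finally put, possibly from
 * an RCU callback in softirq context. An atomic decrement needs no
 * lock, so it cannot deadlock against a tasklist_lock holder.
 */
void put_task_sketch(struct ve_counters *ve)
{
    int old = atomic_fetch_sub(&ve->nr_dead, 1);
    assert(old > 0);
}
```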
diff-ms-net-indev-init-20070105
Patch from Denis: This patch corrects inet device initialization order to avoid partly initialized device.
Bug #73995.
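A generic form of "avoid exposing a partly initialized device" is to fill in every field first and only then publish the pointer readers use. This sketch uses C11 release/acquire atomics; `struct in_dev`, `published`, `inetdev_publish()` and `inetdev_lookup()` are hypothetical, and the actual patch may simply reorder initialization steps rather than add barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, much-simplified inet device. */
struct in_dev {
    int ifindex;
    int mtu;
};

/* Readers find the device only through this pointer. */
static _Atomic(struct in_dev *) published;

/*
 * Initialize every field first, then publish with release semantics,
 * so a reader that sees the pointer also sees a complete structure.
 */
void inetdev_publish(struct in_dev *d, int ifindex, int mtu)
{
    assert(d != NULL);
    d->ifindex = ifindex;
    d->mtu = mtu;
    atomic_store_explicit(&published, d, memory_order_release);
}

struct in_dev *inetdev_lookup(void)
{
    return atomic_load_explicit(&published, memory_order_acquire);
}
```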
linux-2.6.9-drbd-0.7.20-0.7.22.patch
Updated DRBD to version 0.7.22