Changes
- Rebase to RHEL5 8.1.4 kernel.
- Mainstream security fixes.
- Improvements, optimizations, fixes in most subsystems.
- DRBD update to 8.0.3.
- Xen/OpenVZ fixes to run RHEL5 in Dom0/U.
- Fixes for SPARC and PPC.
Config changes
Added:
- CONFIG_LEGACY_PTYS=y
- CONFIG_LEGACY_PTY_COUNT=256
- CONFIG_GFS_FS=m
- CONFIG_PREEMPT_VOLUNTARY=y
- CONFIG_PREEMPT_BKL=y
Removed:
- CONFIG_UBC_DEBUG_KMEM
Patches
diff-arch-4gb-ldt-irqs-20070515
Patch from Kirill Korotaev <dev@openvz.org>
4GB split LDT reload fix from RHEL4u5
diff-cpt-features-known-mask-20070514
Patch from Andrey Mirkin <major@openvz.org>
[CPT] 2.6.9 <-> 2.6.18 features mask compatibility issue
Use VE_FEATURES_OLD mask for old (< 2.6.18 kernel) CPT images.
Bug #81468
diff-cpt-futex-eintr-20070510
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] too aggressive sys_futex() restart
Checkpointing used to force a restart of sys_futex() even when it returned -EINTR, as a workaround for the broken return value of FUTEX_WAIT. Of course, this is wrong (e.g. it means that a timed FUTEX_WAIT is restarted with its original timeout :-(), but we do not have much choice if we do not want to break everything. At least one case can be relaxed: if a signal is pending when we restore, we must not restart, because this pending signal would interrupt FUTEX_WAIT in any case. This fixes sem_wait().
diff-cpt-kill-external-processes-b-20070515
Patch from Andrey Mirkin <major@openvz.org>
We have a problem with external processes. If someone enters a VE, forks, and does some work without exec, the process is not considered external (its pids are virtual), but some of its files (e.g. libs) can be from the HN, i.e. external. Temporary and quick fix for this bug: on suspend, kill processes which have mm->vps_dumpable == 0.
Bug #81722
diff-cpt-prevent-vm-changes-20070510
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] prevent changes of VM after VE was checkpointed
It is possible that a process's VM is changed after the VE has been checkpointed and killed. At the moment this happens when a process has set clear_parent_tid or robust list pointers. It was not considered a problem, because the VM is about to be destroyed in any case. But one case was missed: the corresponding VM areas could be mapped to a file. If the file is not deleted, the change will reach the file system and migrate. Oops. E.g. a shared locked futex will be unlocked after migration (glibc tst-robust8 test).
diff-cpt-suspend-cleanup-20070510
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] VE suspend cleanups
The patch fixes one bug. Sometimes a process sleeps in an uninterruptible state waiting for an event that depends on another process, which may already be suspended. I know three such cases:
1. A process did vfork() and waits for the child to exec().
2. A thread did exec() and waits for its siblings to die.
3. A thread makes a coredump and waits for its siblings to stop.
We detected case #1 directly by looking at tsk->vfork_done. In the other cases suspend timed out and failed, which is obviously incorrect. It is possible to handle cases #2 and #3 like we did with vfork, but it is not necessary. The patch suggests a universal solution: we split suspend into several shorter rounds. The first round tries to suspend for 200 msec; if it fails, the VE is unfrozen and suspend is retried after some time. We repeat the attempts with increasing timeout until the VE is frozen or the major timeout (10 sec) expires. Besides that, the patch reorders the suspend code, so that it becomes more or less readable.
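The round-based suspend described above can be sketched in user-space Python. This is only an illustration of the retry scheme, not the kernel code: the callbacks try_freeze and unfreeze, the doubling growth factor and the pause between rounds are assumptions. The entry only states a 200 msec first round, an increasing timeout and a 10 sec overall limit.

```python
import time


def suspend_in_rounds(try_freeze, unfreeze, first_timeout=0.2,
                      major_timeout=10.0, growth=2.0, pause=0.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Retry try_freeze(timeout) in rounds with a growing timeout.

    try_freeze and unfreeze are hypothetical callbacks. Returns True as
    soon as one round succeeds, False once major_timeout has expired.
    """
    deadline = clock() + major_timeout
    timeout = first_timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False              # major timeout expired
        # Never let a single round run past the overall deadline.
        if try_freeze(min(timeout, remaining)):
            return True               # everything is frozen
        unfreeze()                    # let blocked tasks make progress
        sleep(pause)                  # "retried after some time"
        timeout *= growth             # assumed growth policy: doubling
```

Unfreezing between rounds is what lets a task stuck behind an already frozen sibling finish its wait before the next attempt.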
diff-fairsched-dyn-vcpu-timeslice-20070518
Patch from Alexandr Andreev <aandreev@openvz.org>
[SCHED] optimization: dynamic vcpu_timeslice
vcpu_timeslice == -1 now has a special meaning (and -1 is now the default value). In this case, the actual vcpu_timeslice value depends on the number of VCPUs ready to run. Assume N = ready_vcpus / nr_pcpus:
  N <= 1:     vcpu_timeslice = 8
  1 < N <= 2: vcpu_timeslice = 4
  2 < N <= 3: vcpu_timeslice = 2
  3 < N <= 4: vcpu_timeslice = 1
  N > 4:      vcpu_timeslice = 0
This patch significantly improves performance of the 'context switch' test from unixbench-4.1.0-wht-1 when several instances of this test are running. On a host with 16 CPUs:
  # cd unixbench-4.1.0-wht-1
  # echo 0 > /proc/sys/kernel/vcpu_timeslice
  # ./Run context1 16 108.4
  # echo -1 > /proc/sys/kernel/vcpu_timeslice
  # ./Run context1 16 435.3
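The mapping from load to timeslice given above can be written as a small function. This is a sketch in Python, not the scheduler code: the function name is hypothetical, and it assumes N is computed with real (non-integer) division, which the entry does not specify.

```python
def dynamic_vcpu_timeslice(ready_vcpus, nr_pcpus):
    """Return the timeslice used when vcpu_timeslice == -1.

    N = ready_vcpus / nr_pcpus, computed here as a real number
    (an assumption; the kernel may use integer arithmetic).
    """
    n = ready_vcpus / nr_pcpus
    if n <= 1:
        return 8
    if n <= 2:
        return 4
    if n <= 3:
        return 2
    if n <= 4:
        return 1
    return 0
```

The more runnable VCPUs compete for each physical CPU, the shorter the slice each VCPU gets.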
diff-fairsched-vcpuoff-comp-20070426
Patch from Alexandr Andreev <aandreev@openvz.org>
[SCHED] compilation fix in case CONFIG_SCHED_VCPU=n
This patch fixes compilation of the OVZ kernel with CONFIG_SCHED_VCPU=n. Note: a VE can't be started in any case, because the fairsched syscalls return ENOSYS, but I fixed fairsched and checked that a VE can be started/stopped - it looks like it works )).
diff-ms-cfq-allow-merge-c-20070507
Patch from Vasily Tarasov <vtaras@openvz.org>
[PATCH] merging of async requests was a bit incorrectly backported
Patch diff-ms-cfq-allow-merge-b-20070424 was ported slightly incorrectly, which resulted in wrong merging of async requests.
Bug #80857
diff-ms-dcache-fix-quadratic-shrink
Patch from Alexey Dobriyan <adobriyan@openvz.org>
Backport of commit d52b908646b88cb1952ab8c9b2d4423908a23f11
Author: Miklos Szeredi <mszeredi@suse.cz>
Date: Tue May 8 00:23:46 2007 -0700
fix quadratic behavior of shrink_dcache_parent()
The time shrink_dcache_parent() takes, grows quadratically with the depth of the tree under 'parent'. This starts to get noticeable at about 10,000.
These kinds of depths don't occur normally, and filesystems which invoke shrink_dcache_parent() via d_invalidate() seem to have other depth dependent timings, so it's not even easy to expose this problem.
However with FUSE it's easy to create a deep tree and d_invalidate() will also get called. This can make a syscall hang for a very long time.
This is the original discovery of the problem by Russ Cox: http://article.gmane.org/gmane.comp.file-systems.fuse.devel/3826
The following patch fixes the quadratic behavior, by optionally allowing prune_dcache() to prune ancestors of a dentry in one go, instead of doing it one at a time.
Common code in dput() and prune_one_dentry() is extracted into a new helper function d_kill().
shrink_dcache_parent() as well as shrink_dcache_sb() are converted to use the ancestry-pruner option. Only for shrink_dcache_memory() is this behavior not desirable, so it keeps using the old algorithm.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Additionally merged:
commit 24c32d733dd44dbc5b9dcd0b8de58e16fdbeac7
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 8 May 2007 07:23:49 +0000 (-0700)
Subject: mm: shrink parent dentries when shrinking slab
X-Git-Tag: v2.6.22-rc1~799
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=24c32d733dd44dbc5b9dcd0b8de58e16fdbeac76
mm: shrink parent dentries when shrinking slab
Teach the dentry slab shrinker to aggressively shrink parent dentries when shrinking the dentry cache.
This is done to attempt to improve the situation where the dentry slab cache gets a lot of internal fragmentation due to pages containing directory dentries. It is expected that this change will cause some of those dentries to be reaped earlier, and with less scanning.
Needs careful testing.
Cc: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Typical numbers after mkdir("foo")/chdir("foo") done N times and immediate "time vzctl stop":
N=32768
Before:
  real  1m14.529s  1m16.602s  1m16.143s
  user  0m0.009s   0m0.014s   0m0.007s
  sys   1m4.569s   1m6.638s   1m7.187s
After:
  real  0m10.078s  0m10.080s  0m10.079s
  user  0m0.007s   0m0.012s   0m0.012s
  sys   0m0.055s   0m0.053s   0m0.054s
A less easy case for this patch is the following configuration:
  *--*--*--* ...
   \  \  \  \
    *  *  *  *
The speedup for this case is less rosy but significant anyway:
  L      before      after
  4096   11.40s      9.75s
  8192   24.00s      16.80s
  65536  15m39.897s  5m29.738s
Bug #73640
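The quadratic growth described in this entry can be illustrated with a toy cost model in Python. It is not the dcache code. It assumes a single chain of directories of the given depth and counts one unit per dentry visited or pruned; both function names are hypothetical.

```python
def prune_one_leaf_per_pass(depth):
    """Toy model of the old behaviour on a chain of `depth` dentries.

    Only the deepest dentry is unused, so every pass walks the whole
    remaining chain, prunes that one leaf and starts over.
    """
    visited = 0
    remaining = depth
    while remaining > 0:
        visited += remaining   # walk from 'parent' down to the leaf
        remaining -= 1         # prune the leaf; its parent is now a leaf
    return visited


def prune_ancestors_in_one_go(depth):
    """Toy model of the new behaviour: walk down once, then prune the
    leaf and every ancestor that becomes unused on the way back up."""
    visited = depth            # single walk down to the leaf
    remaining = depth
    while remaining > 0:
        visited += 1           # prune this dentry and move to its parent
        remaining -= 1
    return visited
```

In this model the first function costs depth * (depth + 1) / 2 steps and the second costs 2 * depth, which matches the quadratic versus linear behaviour the commit message describes.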
diff-ms-futex-locking-bug-20070510
Patch from Ingo Molnar <mingo@elte.hu>
[PATCH] futex: PI state locking fix
commit 21778867b1c8e0feb567addb6dc0a7e2ca6ecdec
Author: Ingo Molnar <mingo@elte.hu>
Date: Fri Mar 16 13:38:31 2007 -0800
[PATCH] futex: PI state locking fix
Testing of -rt by IBM uncovered a locking bug in wake_futex_pi(): the PI state needs to be locked before we access it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff-ms-futex-oops-20070510
Patch from Alexey Kuznetsov <alexey@openvz.org>
[PATCH] PI futex oops (mainstream)
Serialization in PI futexes is severely broken: lots of bugs, lots. But only one is known to crash the kernel. It is possible that a new pi state is added to pi_state_list after the task has already done its exit cleanup, so that when the task struct is released, pi_state_list remains in a corrupted state. Locally exploitable.
diff-ms-nfs-rm-warn-20070515
Patch from Neil Brown <neilb@suse.de>
[NFS] Remove warning: VFS is out of sync with lock manager
But keep it as a dprintk.
The message can be generated in a quite normal situation: if a 'lock' request is interrupted, then the lock client needs to record that the server has the lock, in case it does. When we come to unlock, the server might say it doesn't, even though we think it does (or might), and this generates the message.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=46bae1a9a767f3ae8e636d96f9b95703df34b398
diff-ms-slab-numa-bind-20070514
Patch from Alexandr Andreev <aandreev@openvz.org>
[SLAB] cache_reap() function must be bound, or take vcpus into account
It must pass the actual (up-to-date) NUMA node id to drain_cache() after cond_resched().
Bug #81234
diff-sysrq-debug-b-20070511
Patch from Alexandr Andreev <aandreev@openvz.org>
[SYSRQ] show correct sysrq help message.
Fix sysrq-h help message which was broken by SysRq-debugger patch.
Bug #81612
diff-ubc-cleanup-20070517
Patch from Denis Lunev <den@openvz.org>
[BC] cleanup: remove unused functions
Cleanup: remove a bit of unused code
diff-ubc-dentry-acentry-20070510
Patch from Alexandr Andreev <aandreev@openvz.org>
[BC] dcache: new style of array_cache entries access
cosmetics: new style of array_cache entries access
diff-ubc-dentry-free_alien-20070510
Patch from Alexandr Andreev <aandreev@openvz.org>
[BC] dcache: drain alien caches on nodes on dcache acct on
Drain the alien caches on each node while walking through the dentry slab lists when dcache accounting is turned on or off.
Bug #81116
diff-ubc-dentry-free_block-20070510
Patch from Alexandr Andreev <aandreev@openvz.org>
[BC] dcache: pass correct node number to kmem_cache_free_block()
Fix warning in slab_put_obj() in debug kernels due to incorrect node number passed to kmem_cache_free_block().
Bug #81116
diff-ubc-move-might-sleep-20070517
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[PATCH] dcache: move might_sleep() from under preempt_disable()
dput() had might_sleep() check in the very beginning. After renaming to dput_recursive() and adding preempt_disable() call this check ended up in region with disabled preemption, so with CONFIG_DEBUG_SPINLOCK_SLEEP=y and preemption on dmesg gets heavily spammed. So, do might_sleep() check earlier.
diff-ubc-proc-rework-b-20070515
Patch from Pavel Emelianov <xemul@openvz.org>
[BC] rework /proc/bc code
Proc files creation suffered from two disadvantages:
1. It was racy with respect to bc remove/create.
2. It couldn't correctly show hierarchical beancounters.
Rework showing BC info by overriding the readdir and lookup methods for inodes under /proc/bc. The entry layout itself is kept unchanged. Plus, create new entries named /proc/bc/<id>/debug to show the BC's id, parent credentials and some in-kernel memory pointers purely for debugging.
Signed-off-by: Pavel Emelianov <xemul@sw.ru>
diff-ubc-top-beancounter-20070515
Patch from Pavel Emelianov <xemul@openvz.org>
[BC] cleanup: introduce top_beancounter helper
Many resources are accounted to the top beancounter only. Introduce a helper to make the code look nicer. When saving the mapped page's ub, we need to save the top beancounter, as is done for non-mapped writes.
Bug #81224
Signed-off-by: Pavel Emelianov <xemul@sw.ru>
Signed-off-by: Kirill Korotaev <dev@sw.ru>
diff-ve-cap-bset-20070504
Patch from Vasily Tarasov <vtaras@openvz.org>
[PATCH] sysctl: add VE capability boundary set sysctl
The user wishes to have a virtualized kernel.cap-bound sysctl in order to use the lcap tool to observe the capabilities allowed in a VE. We have the cap_default field in ve_struct, which can be modified to be used as a virtualized cap_bset. This patch:
- renames cap_default -> ve_cap_bset
- virtualizes cap_bset
- introduces new proc and strategy routines to handle the appropriate sysctl
diff-ve-init-signals-20070514
Patch from Denis Lunev <den@openvz.org>
[VE] VE init signal delivery reworked to be similar to host
Prevent VE init from receiving unexpected signals sent from VE including *fatal* ones. Signals sent from VE0 are still allowed, e.g. for fast VE stop. Fix for sys_reboot called from VE to force VE death (SIGKILL is sent in the context of VE).
diff-ve-net-bridge-via-phys-dev2-20070514
Patch from Dmitry Mishin <dim@openvz.org>
[BRIDGE] bridge deliver to original eth0 device
- now packets are input to the local system as they are coming from phys device only;
- fixed a bunch of bugs with VE <-> HN communications.
diff-ve-nf-ipt6-aliasing-20070515
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[VE] unalias IPv6 iptables bit mask from IPv4
The VE_IP_MANGLE flag is used as a mask for both IPv4 and IPv6 modules, which is a no-no, because ip6table_mangle can be loaded after VE start. Split VE_IP_IPTABLES into VE_IP_IPTABLES and VE_IP_IPTABLES6. Same for VE_IP_FILTER and VE_IP_MANGLE. Choose numbers that do not conflict with the vzctl header. Temporarily mirror VE_IP_IPTABLES into the IPv6 mask. When vzctl starts doing the right thing, this mirroring can be dropped.
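The split into separate IPv4 and IPv6 bits, plus the temporary mirroring, might look roughly like the Python sketch below. All bit values are hypothetical, since the real numbers are chosen in the patch to avoid clashing with the vzctl header. The names VE_IP_FILTER6 and VE_IP_MANGLE6 and the helper function are assumptions inferred from "same for VE_IP_FILTER and VE_IP_MANGLE".

```python
# Hypothetical bit assignments, for illustration only.
VE_IP_IPTABLES = 1 << 0
VE_IP_FILTER = 1 << 1
VE_IP_MANGLE = 1 << 2
VE_IP_IPTABLES6 = 1 << 8    # separate bits for the IPv6 modules
VE_IP_FILTER6 = 1 << 9      # assumed name
VE_IP_MANGLE6 = 1 << 10     # assumed name


def mirror_ipv4_into_ipv6(mask):
    """Temporarily mirror VE_IP_IPTABLES into the IPv6 mask.

    Only the bit named in the entry is mirrored; this step can be
    dropped once userspace sets the IPv6 bits itself.
    """
    if mask & VE_IP_IPTABLES:
        mask |= VE_IP_IPTABLES6
    return mask
```

With distinct bits, loading an IPv6 module after VE start no longer shares state with its IPv4 counterpart.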
diff-ve-nfs-abortset-20070504
Patch from Denis Lunev <den@openvz.org>
[NFS] fix NFS auto-umount timeout sysctl
This patch:
- changes the timeout units from jiffies to seconds
- fixes assignment from userspace (was impossible, since UINT_MAX was treated as negative)
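The "UINT_MAX was treated as negative" problem is the usual signed/unsigned reinterpretation of a 32-bit value. The Python sketch below shows it with ctypes and also shows a seconds-to-jiffies conversion. The helper names and the HZ value used in the example are assumptions and are not taken from the patch.

```python
import ctypes

UINT_MAX = 0xFFFFFFFF


def as_signed_int(value):
    """Reinterpret the low 32 bits of value as a C int."""
    return ctypes.c_int(value).value


def as_unsigned_int(value):
    """Reinterpret the low 32 bits of value as a C unsigned int."""
    return ctypes.c_uint(value).value


def seconds_to_jiffies(seconds, hz):
    """Convert a timeout in seconds to jiffies for a given tick rate."""
    return seconds * hz
```

For example, as_signed_int(UINT_MAX) is -1, so a handler that parses the sysctl value as a signed int sees a negative number, while as_unsigned_int(UINT_MAX) keeps it at 4294967295.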
diff-ve-nfs-hostcache-20070510
Patch from Denis Lunev <den@openvz.org>
[NFS] virtualize NLM hosts cache
This patch virtualizes NLM hosts cache
Bug #74374
diff-ve-nfs-stop-20070502
Patch from Denis Lunev <den@openvz.org>
[NFS] shut down NFS properly if hung
This patch properly shuts down NFS if it is stalled.
diff-ve-nfs-stop-b-20070502
Patch from Kirill Korotaev <dev@openvz.org>
[NFS] Compilation fix for diff-ve-nfs-stop-20070502 when built as module
diff-ve-nfs-stop-20070502 requires some symbols to be exported.
Signed-off-by: Kirill Korotaev <dev@sw.ru>
diff-ve-pid-nr-fix-20070503
Patch from Pavel Emelianov <xemul@openvz.org>
When setting an explicit vpid in the VE's pidmap, we need to decrement the nr_free counter by one. This does not fix any BUG; it just makes the pidmap information consistent and helps to work faster when the pidmap is full.
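Why a consistent free counter helps when the map is full can be seen in a toy bitmap. This Python sketch is not the kernel pidmap. The class and method names are hypothetical, and only nr_free comes from the entry.

```python
class PidMap:
    """Toy pid bitmap with a free-slot counter."""

    def __init__(self, size):
        self.used = [False] * size
        self.nr_free = size

    def set_explicit(self, vpid):
        """Mark a caller-chosen vpid as used and keep nr_free in sync."""
        if self.used[vpid]:
            return False
        self.used[vpid] = True
        self.nr_free -= 1      # the decrement the entry describes
        return True

    def alloc(self):
        """Return the lowest free vpid, or None when the map is full."""
        if self.nr_free == 0:
            return None        # accurate counter: no scan needed
        vpid = self.used.index(False)
        self.used[vpid] = True
        self.nr_free -= 1
        return vpid
```

If set_explicit() skipped the decrement, nr_free would stay above zero on a full map and every allocation attempt would scan the whole bitmap before failing.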
diff-ve-proc-hash-pid-dentries-20070516
Patch from Pavel Emelianov <xemul@openvz.org>
[PATCH] proc: don't hash task dentries in VE0
When a task dies, its proc dentries, which may be hashed, are shrunk with shrink_dcache_parent(). The problem is that this routine doesn't guarantee that all the entries will be flushed, and thus a pid may still be referenced from the corresponding inode. When such dentries in VE0 hold pids from a VE, this leads to pid leakage and inability to release the beancounter after VE stop. So don't hash such dentries - remove them immediately.
Bug #80025
Signed-off-by: Pavel Emelianov <xemul@sw.ru>
diff-ve-showmem-locking-20070414
Patch from Denis Lunev <den@openvz.org>
[PATCH] vmalloc info during OOM locking
vmlist_lock can't be held under any spin_lock which is held with IRQ. This assumption is always broken for __alloc_pages. Modified by Kirill: drop vprintstat() from show_mem() altogether.
Bug #81199
diff-vzdq-ppc32-comp-20070518
Patch from Vasily Tarasov <vtaras@openvz.org>
[PATCH] vzdquota: compilation fix for ppc32
While compiling on ppc32 the following error appears:
  Building modules, stage 2.
  MODPOST
  WARNING: "__cmpdi2" [fs/vzdquota.ko] undefined!
  make[2]: *** [__modpost] Error 1
The problem is that switch((long_long_var)) is not a primitive for ppc32 gcc: libgcc.a is needed, which is outside the kernel. The problem was noticed by forum user mbaranzak, who also found the cause.
diff-vzdq-putsuper-20070518
Patch from Denis Lunev <den@openvz.org>
[VZDQ] sb->put_super can be NULL in valid cases
put_super() superblock operation was not checked for NULL in vzquota leading to NULL dereference.
OpenVZ Bug #541
Bug #81936
diff-i2o-procread-20070509
Patch from Vasiliy (vvs@):
Fixes oops on read from some i2o proc files.
Minor issue, because the i2o_proc module is not currently used.
diff-arch-4gb-xen-20070523
Patch from Evgeny Kravtsunov <emkravts@openvz.org>
[4GB-SPLIT] Fixes required for Xen kernel compilation
diff-cpt-mm-deadlock-20070523
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] Fix possible deadlock in checkpointing of mm
Learned it wrong once and did not relearn: the anon_vma lock cannot be taken under the page table lock. It is taken, and should be taken, in the reverse order; cpt_mm even has a special hack due to the wrong understanding: look at the chunk converting the ugly spin_trylock to spin_lock. Difference from the previous version: in one case (which does not happen normally, but still), the page table lock could remain locked.
Bug #82785
diff-cpt-wait-fix-20070518
Patch from Andrey Mirkin <major@openvz.org>
While checkpointing, CPT processes can be killed due to memory shortage, and tmpfs will not be saved. During restore we will see errors such as:
  CPT ERR: e0000002ef9c5000,111 :-2 mounting /dev/pts devpts 40000000
  CPT ERR: e0000002ef9c5000,111 :rst_namespace: -2
This happens because /dev is tmpfs now and its content was not saved during checkpointing. We need to check the exit status of tar and iptables-save to be sure that they exited normally.
Changes from v1:
- return -EINVAL in case of error
Bug #79854
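Checking a helper's exit status and mapping any abnormal exit to -EINVAL can be sketched in user space as follows. This is a Python illustration, not the CPT code: the function name is hypothetical, and discarding the helper's output is an assumption made to keep the example quiet.

```python
import errno
import subprocess


def run_helper(argv):
    """Run a helper program and return 0 on success, -EINVAL otherwise.

    A non-zero exit code, death by signal (negative returncode) and a
    failure to start the program are all treated as errors.
    """
    try:
        result = subprocess.run(argv,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        return -errno.EINVAL       # helper could not be executed
    if result.returncode != 0:
        return -errno.EINVAL       # helper failed or was killed
    return 0
```

A caller would abort the dump as soon as run_helper() returns a negative value instead of silently continuing with an incomplete image.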
diff-cpt-xen-ldt-20070523
Patch from Evgeny Kravtsunov <emkravts@openvz.org>
[XEN] Fix LDT handling - There is one chunk LDT data only
diff-ms-cfq-rm-redundant-find-next-req-20070509
Patch from Vasily Tarasov <vtaras@openvz.org>
[PATCH] cfq: remove redundant cfq_find_next_crq() function call
Mainstream fix. cfq_find_next_crq() will be called later. This fix is incorporated in http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=21183b07ee4be405362af8454f3647781c77df1b
diff-ms-fuse-bug-control-fs-20070521
Patch from mainstream, prepared by Dmitry Monakhov <dmonakhov@openvz.org>
[PATCH 5/6] fuse: fix bug in control filesystem mount
The BUG in fuse_ctl_add_dentry() could be triggered if the control filesystem was unmounted and mounted again while one or more fuse filesystems were present. The fix is to reset the dentry counter in fuse_ctl_kill_sb(). Bug reported by Florent Mertens.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff-ms-fuse-dentry-parent-20070521
Patch from mainstream, prepared by Dmitry Monakhov <dmonakhov@openvz.org>
[PATCH 3/6] fuse: fix dereferencing dentry parent
There's no locking for ->d_revalidate, so fuse_dentry_revalidate() should use dget_parent() instead of simply dereferencing ->d_parent. Due to topology changes in the directory tree the parent could become negative or be destroyed while being used. There hasn't been any reports about this yet.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff-ms-fuse-mknod-of-regular-file-20070521
Patch from mainstream, prepared by Dmitry Monakhov <dmonakhov@openvz.org>
[PATCH] fuse: fix mknod of regular file
The wrong lookup flag was tested in ->create() causing havoc (error or Oops) when a regular file was created with mknod() in a fuse filesystem. Thanks to J. Cameijo Cerdeira for the report. Kernels 2.6.18 onward are affected. Please apply to -stable as well.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
diff-ms-fuse-nlookup-20070521
Patch from mainstream, prepared by Dmitry Monakhov <dmonakhov@openvz.org>
[PATCH 1/6] fuse: locking fix for nlookup
An inode could be returned by independent parallel lookups, in this case an update of the lookup counter could be lost resulting in a memory leak in userspace.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff-ms-fuse-oops-in-lookup-20070521
Patch from mainstream, prepared by Dmitry Monakhov <dmonakhov@openvz.org>
[PATCH 4/6] fuse: fix Oops in lookup
Fix bug in certain error paths of lookup routines. The request object was reused for sending FORGET, which is illegal. This bug could cause an Oops in 2.6.18. In earlier versions it might silently corrupt memory, but this is very unlikely. These error paths are never triggered by libfuse, so this wasn't noticed even with the 2.6.18 kernel, only with a filesystem using the raw kernel interface. Thanks to Russ Cox for the bug report and test filesystem.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff-ms-fuse-spurious-bug-20070521
Patch from mainstream, prepared by Dmitry Monakhov <dmonakhov@openvz.org>
[PATCH 2/6] fuse: fix spurious BUG
Fix a spurious BUG in an unlikely race, where at least three parallel lookups return the same inode, but with different file type. This has not yet been observed in real life. Allowing unlimited retries could delay fuse_iget() indefinitely, but this is really for the broken userspace filesystem to worry about.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff-ms-fuse-validate-rootmode-20070521
Patch from mainstream, prepared by Dmitry Monakhov <dmonakhov@openvz.org>
[PATCH 6/6] [PATCH] fuse: validate rootmode mount option
If rootmode isn't valid, we hit the BUG() in fuse_init_inode. Now EINVAL is returned.
Signed-off-by: Timo Savola <tsavola@movial.fi>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff-ms-powerpc-compilation-20070523
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] Fix powerpc compilation which was broken in 2.6.18.8
Christian Kaiser reported broken powerpc compilation due to 2.6.18.8 fix: http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.18.y.git;a=commitdiff;h=f102c840f7f72492a83c93fa65396fe0edcf1df6
  In file included from drivers/media/video/pwc/pwc-uncompress.c:29:
  include/asm/current.h: In function get_current:
  include/asm/current.h:23: warning: implicit declaration of function offsetof
  include/asm/current.h:23: error: expected expression before struct
  make[4]: *** [drivers/media/video/pwc/pwc-uncompress.o] Error 1
  make[3]: *** [drivers/media/video/pwc] Error 2
  make[2]: *** [drivers/media/video] Error 2
  make[1]: *** [drivers/media] Error 2
  make: *** [drivers] Error 2
diff-ms-sparc64-compilation-20070523
Patch from Kirill Korotaev <dev@openvz.org>
Fix SBUS compilation on SPARC64.
asm/unistd.h should be included for _syscallX() define usage.
diff-ubc-dentry-free_alien-b-20070521
Patch from Alexandr Andreev <aandreev@openvz.org>
[BC] dentry_cache: drain alien caches only when CONFIG_NUMA is set
Fix oops introduced by previous patch diff-ubc-dentry-free_alien-20070510: We should deal with NUMA only when CONFIG_NUMA is set.
Bug #82721
diff-ubc-ioprio-on-dispatch-check-20070523
Patch from Vasily Tarasov <vtaras@openvz.org>
[BC] ioprio: account for requests in driver
Previously we didn't take into account beancounter's requests that are already in driver. Now we consider beancounter as empty (no requests in it) only if there are no requests in CFQ _and_ in driver. This patch improves fairness dramatically and fixes bug #81508
Bug #81508
diff-ubc-ioprio-range-prio-20070623
Patch from Vasily Tarasov <vtaras@openvz.org>
[BC] ioprio: range of ioprios was checked incorrectly
Fix ioprio range check: ioprio range is 0..7
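A range check for the 0..7 interval stated above can be as small as the following Python sketch. The constant and function names are hypothetical and both bounds are inclusive.

```python
IOPRIO_MIN = 0
IOPRIO_MAX = 7


def ioprio_in_range(prio):
    """Return True when prio lies in the inclusive range 0..7."""
    return IOPRIO_MIN <= prio <= IOPRIO_MAX
```

Off-by-one mistakes in such checks typically reject 7 or accept 8, so both boundaries are worth testing explicitly.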
diff-ubc-ioprio-slice-scaling-20070623
Patch from Vasily Tarasov <vtaras@openvz.org>
[BC] ioprio: BC slice scaling fix
The CFQ BC timeslice is now scaled from X to 2*X ms.
diff-ve-net-veth-addr-macro-20070514
Patch from Evgeny Kravtsunov <emkravts@openvz.org>
[VE] rename local macro ADDR() to avoid conflict with Xen
Fix redefinition of ADDR in veth.c. A macro named ADDR is also used by Xen in include/asm/mach-xen/asm/synch_bitops.h.
diff-ve-nfs-abortcorrupt-20070523
Patch from Denis Lunev <den@openvz.org>
Unfortunately, the counter on the RPC client counts only clones, not real usage. The cl_dead flag leads to client destruction in rpc_release_client while the client is really in use. So, we have to introduce a new flag with a meaning similar to cl_dead in all places except rpc_release_client. True recounting is too expensive. Based on an idea from Dmitry Monakhov.
Bug #82764
Bug #82875
diff-ve-nfs-tcpabort-20070522
Patch from Denis Lunev <den@openvz.org>
Add NFS timeout handle for TCP transport
diff-ve-xen-blktapmain-20070514
Patch from Evgeny Kravtsunov <emkravts@openvz.org>
Fixes compilation error in Xen driver
diff-vzdq-aquota-group-len-20070521
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[VZDQ] Trivial fix for vzaquota.group lookup
"vzaquota.group" was memcmp'd for only 11 symbols, allowing names like "vzaquota.grouX" to be looked up successfully.
diff-vzdq-atomic-in-buildmntlist-20070521
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[VZDQ] Fix GFP_KERNEL allocation under spin lock in vzdq_aquot_buildmntlist
Switch allocation from GFP_KERNEL to GFP_ATOMIC under vfsmount_lock in vzdq_aquot_buildmntlist()
diff-arch-4gb-xen-cleanup-20070528
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] Cleanup 4GB split/Xen modification regarding init_tss
diff-cpt-aux-task-list-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] additional protected VE task list
Nobody wants this. But no other ideas have come up so far, which could mean it is just impossible to do painlessly. Each task is put on one more list (vetask_auxlist), which is protected by tasklist_lock; this is its only difference from the normal VE task list, which is accessed by RCU rules.
diff-cpt-check-iptables-modules-20070604
Patch from Andrey Mirkin <major@openvz.org>
[PATCH] CPT: check if ip_tables are enabled before dumping them
iptables-save returns error if module ip_tables is not loaded. So, we just do not need to dump iptables at all if this module is not loaded in VE. Don't try to dump iptables if they are not enabled in VE.
diff-cpt-compilation-warning-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] fix compilation warning due to cast u64 -> pointer
diff-cpt-compile-warning-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] one more compiler warning
diff-cpt-deleted-ref-20070502
Patch from Andrey Mirkin <major@openvz.org>
[PATCH] CPT: checkpoint inodes with deleted reference
Consider the following scenario:
1. Create a file (file1) and a hard link to it (file2)
2. Open file2
3. Unlink file2
After that, checkpointing fails with the following error:
  Can not dump VE: Device or resource busy
  deleted reference to existing inode, checkpointing is impossible
The inode in question is not deleted, but it cannot be found from inside the checkpointed process group and is not easy to find on disk :/ So we try to find another dentry with the same inode in 2 common places:
1. In the inode->i_dentry alias list
2. In the directory in which the deleted dentry itself was located
Bug #72540
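The scenario and the second lookup strategy (searching the directory of the deleted name) can be reproduced from user space. The Python sketch below assumes a POSIX system, where an open file can be unlinked. The function names are hypothetical, and it only mirrors the idea, not the kernel's dentry walk.

```python
import os
import tempfile


def find_other_name(directory, open_fd):
    """Scan directory for an entry with the same inode as open_fd."""
    target = os.fstat(open_fd)
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                return entry.name
    return None


def demo():
    """Create file1, hard-link it as file2, open and unlink file2."""
    with tempfile.TemporaryDirectory() as d:
        file1 = os.path.join(d, "file1")
        file2 = os.path.join(d, "file2")
        with open(file1, "w") as f:
            f.write("data")
        os.link(file1, file2)                 # step 1: hard link
        fd = os.open(file2, os.O_RDONLY)      # step 2: open file2
        try:
            os.unlink(file2)                  # step 3: unlink file2
            nlink = os.fstat(fd).st_nlink     # inode is still linked
            return nlink, find_other_name(d, fd)
        finally:
            os.close(fd)
```

After the unlink the open descriptor refers to a name that no longer exists, yet the link count is still non-zero and the directory scan recovers the surviving name file1.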
diff-cpt-ia64-nat-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT][IA64] save/restore NaT values
This patch closes two remaining holes in the IA64 cpt implementation. Both are of no immediate practical value, but nearly impossible to fix after we freeze the layout of the cpt image structs.
1. Migration between hosts with different layouts of struct thread_info, which is possible if some new bits are added to thread_info in newer kernels.
2. NaT bits are accurately saved and restored. This is required only when an application uses control speculative loads; current compilers are not able to do this, but this can change.
diff-cpt-ia64-unaligned-suppress-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT][IA64] some prctl flags are forgotten
Some apps do lots of unaligned accesses, know about this and use prctl(). We did not save/restore those flags and got a flood of warnings after checkpointing.
diff-cpt-improve-align-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] cosmetic fix to match 2.6.9 text
In 2.6.9 it was a critical bug: the image would be corrupted because of broken alignment if we did not do this. In 2.6.18 it is just nice.
diff-cpt-inotify-core-b-20070529
Patch from Vasily Tarasov <vtaras@openvz.org>
Rework inotify changes, so that old API is still available
Rework inotify changes, so that old API is still available for 3rd party code like aufs.
diff-cpt-namespace-deadlock-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] namespace semaphore possible deadlock
Old bug. To my shame I knew about this, but ignored it. Deadlock is possible in two cases:
1. tar is not a tar, but something maliciously doing mount/unmount.
2. tar is a good tar, but it takes the namespace semaphore for read, e.g. to read /proc/mounts. If someone in the system does mount/unmount and is blocked taking the write semaphore, the second read semaphore deadlocks.
The fix is to drop the namespace semaphore. We do mntget() on the current mnt, so that it will not disappear from under us. It can still be unmounted with MNT_DETACH; in this case we cannot proceed with scanning the mnt list, and we must not: unmounting something inside a VE while checkpointing is an obviously good reason to fail.
diff-cpt-pids-leak-20070525
Patch from Pavel Emelianov <xemul@openvz.org>
[PATCH] CPT: fix potential pid leak
When restoring the VE restore_one_signal_struct() can occasionally leak some pids.
Bug #82895
diff-cpt-restore-deleted-files-20070502
Patch from Andrey Mirkin <major@openvz.org>
[PATCH] CPT: restore deleted files (hardlink case)
The bug was here all the time, but it was never triggered as we never entered the following path on checkpointing:
    if (!IS_ROOT(d) && d_unhashed(d)) {
        struct file *parent;
        parent = iobj->o_parent;
        if (!parent ||
            (!IS_ROOT(parent->f_dentry) && d_unhashed(parent->f_dentry))) {
            /* Inode is not deleted, but it does not
             * have references from inside checkpointed
             * process group. We have options:
             * A. Fail, abort checkpointing
             * B. Proceed. File will be cloned.
             * A is correct, B is more complicated */
            /* Just as a hint where to create deleted file */
            if (ino->i_nlink != 0) {
                eprintk_ctx("deleted reference to existing inode, checkpointing is impossible\n");
                return -EBUSY;
            }
        } else { <<< HERE
            /* Refer to _another_ file name. */
            err = cpt_dump_filename(parent, 0, ctx);
            if (err)
                return err;
            if (S_ISREG(ino->i_mode) || S_ISDIR(ino->i_mode))
                dump_it = 0;
        }
    }
So, in image file for deleted file we always had its content and never a reference to another file. The fix is straightforward: check the type of the object in the image and restore file content if needed.
diff-cpt-restore-lastpid-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] restore last pid
Sigh. I hoped it would not be necessary. It is. bash goes insane when its children get non-monotonic pids. The place where we store the saved last pid is unusual. Violating tradition, I extend one of the cpt image structs. This should be OK: migration to older kernels will be prohibited, migration from old to new ones is OK.
diff-cpt-restore-packet-socket-20070606
Patch from Alexey Kuznetsov <alexey@openvz.org>
[CPT] restore packet socket
Binding of packet sockets was skipped. One tricky bit: getsockname returns the "real" sockaddr length, but bind() does not accept the real name; it wants sizeof(struct sockaddr_ll).
Missing bits:
- multicast list, including promisc status
- statistics are not restored
Enough for a beginning; the rest requires surgery in core.
diff-cpt-rst-file-error-msg-20070601
Patch from Andrey Mirkin <major@openvz.org>
[PATCH] CPT: print file name when fail to open it
Print the file name if we failed to open it. This information will be useful for resolving problems.
Bug #83180
diff-cpt-ubc-adjust-on-restore-c-20070601
Patch from Andrey Mirkin <major@openvz.org>
[PATCH] CPT: adjust UBC limits before restoring processes
Move UBC limit adjustments to a more appropriate place, where they are actually needed.
diff-cpt-wait-fix-c-20070601
Patch from Andrey Mirkin <major@openvz.org>
[PATCH] CPT: fix kernel_thread error code checks
Some versions of tar return a non-zero exit code if they cannot write a warning message to stderr, so we need to open /dev/null for it. During restore we face another problem: /dev is stored on tmpfs, so we are not able to open /dev/null and need to create it. There is also another bug, which came into the CPT code from the mainstream kernel_thread helper: if our function returns an error (e.g. exec failed), it doesn't place the correct exit code into the edi register before calling do_exit.
Bug #83183
diff-fairsched-boot-cpu-20070530
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] sched: boot CPU can have non-zero ID (sparc)
I was blindly assuming that boot processor ID is always 0, which was not true on SUN4U machine where boot CPU has ID 1 and 2nd CPU has ID 0. Strange, but it is. So replace 0 with real processor id in the code.
diff-fairsched-iowait-fix-20070624
Patch from Vasily Tarasov <vtaras@openvz.org>
[PATCH] Fix iowait stats in VE0
2.6.18 OVZ kernels don't account iowait time; this value is always displayed as zero:
    $ cat /proc/stat | grep cpu
    cpu 1700 5 1818 11790110 0 60 204 0
    cpu0 893 4 1020 5894818 0 56 174 0
    cpu1 807 1 797 5895291 0 4 29 0
This happens since calculations usually happen in idle context. Actually there is no good definition of iowait for the global VE0 context. And the whole iowait concept is arguable, but still, let's try to account as well as possible.
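For readers checking the fix from user space: iowait is the fifth counter on each cpu line of /proc/stat (after user, nice, system and idle). The following is a minimal sketch for pulling it out of one line; the helper name parse_cpu_iowait is made up for this illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): extract the iowait counter,
 * i.e. the fifth number, from one "cpu..." line of /proc/stat.
 * Returns 0 on success, -1 if the line is not a cpu line.
 */
int parse_cpu_iowait(const char *line, unsigned long long *iowait)
{
    char label[16];
    unsigned long long user, nice, sys, idle, io;

    assert(line != NULL && iowait != NULL);

    /* label plus the first five counters must all be present */
    if (sscanf(line, "%15s %llu %llu %llu %llu %llu",
               label, &user, &nice, &sys, &idle, &io) != 6)
        return -1;

    /* accept "cpu", "cpu0", "cpu1", ... and reject other /proc/stat rows */
    if (label[0] != 'c' || label[1] != 'p' || label[2] != 'u')
        return -1;

    *iowait = io;
    return 0;
}
```

Fed the first sample line above, the helper reports an iowait of 0, which is exactly the symptom this patch addresses.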
diff-fairsched-ppc-syscalls-fix3-20070525
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] ppc: fix screwed OVZ syscall numbers
Fix enumeration of OVZ syscall numbers on powerpc. Thanks to Christian Kaiser for noticing this.
diff-fairsched-preempt-20070529
Patch from Vasily Tarasov <vtaras@openvz.org>
[PATCH] Fix debug messages when CONFIG_DEBUG_PREEMPT is used
If CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT are turned on, OpenVZ kernels produce a lot of similar messages:
    BUG: using smp_processor_id() in preemptible [00000001] code: <process>/<pid>
    caller is io_schedule+0x22/0x53
    Call Trace: ...
There are two reasons for these messages:
1) we call smp_processor_id() from io_schedule/io_schedule_timeout without preemption disabled. This is minor; raw_smp_processor_id() should be used.
2) task_struct->cpus_allowed holds a mask of vcpus instead of pcpus. Therefore the debug_smp_processor_id() function fails to check that the process can run only on the one current cpu.
The patch fixes both issues.
diff-fairsched-rename-vcpu-info-20070517
Patch from Evgeny Kravtsunov <emkravts@openvz.org>
[PATCH] rename vcpu_info to vcpu_struct due to conflict with Xen
Rename vcpu_info to vcpu_struct due to a conflict with Xen, which uses the same name for its data structure (sigh... globally...). Thanks to seyko2 for testing the OVZ-Xen kernel.
diff-fairsched-tickduration-20070528
Patch from Alexandr Andreev <aandreev@openvz.org>
[PATCH] sched: fix up fairsched tick duration according to VCPU timeslice
With the latest 'vcpu dynamic timeslice' patch we broke the fairsched scheduler logic a bit: it assumed that fairsched_schedule() must be called on each timer tick. A new, bigger fairsched timeslice was introduced; this value must always be >= the vcpu timeslice.
Bug #82969
diff-fairsched-x8664-show-regs-20070528
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] fairsched: fix VCPU info in show regs on x86-64
diff-ms-entropy-fix-a-20070530
Patch from Matt Mackall <mpm@selenic.com>
Add data from zero-entropy random_writes directly to output pools to
Add data from zero-entropy random_writes directly to output pools to avoid accounting difficulties on machines without entropy sources. Tested on lguest with all entropy sources disabled. Signed-off-by: Matt Mackall <mpm@selenic.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu>
diff-ms-entropy-fix-b-20070530
Patch from Matt Mackall <mpm@selenic.com>
random: fix seeding with zero entropy
Add data from zero-entropy random_writes directly to output pools to avoid accounting difficulties on machines without entropy sources. Tested on lguest with all entropy sources disabled. Signed-off-by: Matt Mackall <mpm@selenic.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=7f397dcdb78d699a20d96bfcfb595a2411a5bbd2
diff-ms-ext3-iread-brelse-20070601
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] ext3: lost brelse in ext3_read_inode()
One of the error paths in ext3_read_inode() leaks bh, since brelse is forgotten. Move brelse under the bad_inode label, so that it is freed. Signed-Off-By: Kirill Korotaev <dev@openvz.org>
diff-ms-ext3-orhpan-list-corruption-20070531
Patch from Vasily Averin <vvs@openvz.org>
[PATCH] ext3: orphan list corruption on bad inodes
This patch fixes ext3 orphan list corruption due to bad inodes created in ext3_read_inode(). Trying to catch orphan list corruption in OpenVZ, we found the following debug messages in the logs:
    May 30 10:39:38 df-rs-l24 kernel: EXT3-fs warning (device sda6): ext3_unlink: Deleting nonexistent file (37901290), 0
    Inode 00000101a15b7840: orphan list check failed!
    00000773 6f665f00 74616d72 00000573 65725f00 06737270 66000000 616d726f
    ...
    Call Trace:
    [<ffffffff80211ea9>] ext3_destroy_inode+0x79/0x90
    [<ffffffff801a2b16>] sys_unlink+0x126/0x1a0
    [<ffffffff80111479>] error_exit+0x0/0x81
    [<ffffffff80110aba>] system_call+0x7e/0x83
The first message says that the unlinked inode has i_nlink=0; ext3_unlink() then adds this inode to the orphan list. The second message means that this inode has not been removed from the orphan list, and the inode dump shows that i_fop = &bad_file_ops. Then I discovered that bad_file_ops can be set only in make_bad_inode(). ext3_read_inode() can call make_bad_inode() without any error/warning messages in the following case:
    ...
    if (inode->i_nlink == 0) {
        if (inode->i_mode == 0 ||
            !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ORPHAN_FS)) {
            /* this inode is deleted */
            brelse (bh);
            goto bad_inode;
    ...
i.e. when inode->i_nlink == 0 and !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ORPHAN_FS). A bad inode can live for some time, and ext3_unlink() can then add it to the orphan list, but ext3_delete_inode() doesn't delete this inode from the orphan list, since the inode is bad. As a result we have orphan list corruption detected in ext3_destroy_inode(). This issue is present in rhel4/rhel5/mainstream kernels too.
Bug #83419
diff-ms-ext3-orhpan-list-corruption-b-20070603
Patch from Vasily Averin <vvs@openvz.org>
[PATCH] ext3: orphan list corruption on bad inodes (v2)
Changes from the previous patch: instead of fixing ext3_unlink(), it is better to fix all the paths where a bad inode can be found and used, i.e. lookup() and get_parent().
After the ext3 orphan list check was added to ext3_destroy_inode() (please see my previous patch), the following situation was detected:
    EXT3-fs warning (device sda6): ext3_unlink: Deleting nonexistent file (37901290), 0
    Inode 00000101a15b7840: orphan list check failed!
    00000773 6f665f00 74616d72 00000573 65725f00 06737270 66000000 616d726f
    ..
    Call Trace:
    [<ffffffff80211ea9>] ext3_destroy_inode+0x79/0x90
    [<ffffffff801a2b16>] sys_unlink+0x126/0x1a0
    [<ffffffff80111479>] error_exit+0x0/0x81
    [<ffffffff80110aba>] system_call+0x7e/0x83
The first message says that the unlinked inode has i_nlink=0; ext3_unlink() then adds this inode to the orphan list. The second message means that this inode has not been removed from the orphan list. The inode dump showed that i_fop = &bad_file_ops, which can be set only in make_bad_inode(). Then I found that ext3_read_inode() can call make_bad_inode() without any error/warning messages, for example in the following case:
    ..
    if (inode->i_nlink == 0) {
        if (inode->i_mode == 0 ||
            !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ORPHAN_FS)) {
            /* this inode is deleted */
            brelse (bh);
            goto bad_inode;
    ..
A bad inode can live for some time, and ext3_unlink() can add it to the orphan list, but ext3_delete_inode() does not delete this inode from the orphan list. As a result we can have orphan list corruption detected in ext3_destroy_inode(). However, it is not clear to me how to fix this issue correctly. As far as I can see, is_bad_inode() is called after iget() in all places except ext3_lookup() and ext3_get_parent(). I believe it makes sense to add a bad inode check to these functions too and call iput() if a bad inode is detected.
Signed-off-by: Vasily Averin <vvs@sw.ru>
diff-ms-ia64-nat-ptrace-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[IA64] ptrace returns garbage for NaT bits
An old bug. Nobody needed those NaT bits, so it was not noticed.
diff-ms-net-ipv6-privacy-msg-20070605
Patch from Vasily Averin <vvs@openvz.org>
[PATCH] disable "Disabled Privacy Extensions" messages
Hide annoying, useless messages about disabled IPv6 privacy extensions, which are always triggered by loopback: "lo: Disabled Privacy Extensions"
Bug #83651
diff-ms-net-settimeout-20070525
Patch from Vasily Averin <vvs@openvz.org>
[NET]: "wrong timeout value" in sk_wait_data() v2
sys_setsockopt() does not properly check timeout values for SO_RCVTIMEO/SO_SNDTIMEO; for example, it is possible to set negative timeout values. POSIX does not define the behaviour of sys_setsockopt for negative timeouts, but requires that setsockopt() shall fail with -EDOM if the send and receive timeout values are too big to fit into the timeout fields in the socket structure. In the current implementation a negative timeout can lead to error messages like "schedule_timeout: wrong timeout value".
Proposed patch:
- checks tv_usec and returns -EDOM if it is wrong
- does not allow negative timeout values to be set (sets 0 instead) and outputs a ratelimited information message about such attempts.
Signed-off-By: Vasily Averin <vvs@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
X-Git-Tag: v2.6.22-rc3
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=ba78073e6f70cd9c64a478a9bd901d7c8736cfbc;hp=c883f215a23a9352097b8d17fb8dae22ff134a14
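The two checks listed in this entry can be modelled in user space roughly as follows. This is only a sketch of the described behaviour: the function name check_sock_timeout is an assumption for illustration, and the rate-limited message that the patch prints is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/time.h>

#define USEC_PER_SEC 1000000L

/*
 * Illustrative model, not the kernel code.
 * Returns -EDOM when tv_usec is out of range, 0 otherwise.
 * A negative tv_sec is not treated as an error: the timeout is set to 0.
 */
int check_sock_timeout(struct timeval *tv)
{
    assert(tv != NULL);

    if (tv->tv_usec < 0 || tv->tv_usec >= USEC_PER_SEC)
        return -EDOM;

    if (tv->tv_sec < 0) {
        /* the patch also logs a rate-limited message at this point */
        tv->tv_sec = 0;
        tv->tv_usec = 0;
    }
    return 0;
}
```

Clamping rather than failing keeps callers that pass a negative timeout working, while still keeping a negative value away from schedule_timeout().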
diff-ms-nfs-launder-20070530
Patch from Alexandr Andreev <aandreev@openvz.org>
[PATCH] NFS: Fix race in nfs_release_page()
invalidate_inode_pages2() may find the dirty bit has been set on a page owing to the fact that the page may still be mapped after it was locked. Only after the call to unmap_mapping_range() are we sure that the page can no longer be dirtied. In order to fix this, NFS has hooked the releasepage() method and tries to write the page out between the call to unmap_mapping_range() and the call to remove_mapping(). This, however leads to deadlocks in the page reclaim code, where the page may be locked without holding a reference to the inode or dentry.
Fix is to add a new address_space_operation, launder_page(), which will attempt to write out a dirty page without releasing the page lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Also, the bare SetPageDirty() can skew all sort of accounting leading to other nasties.
[akpm@osdl.org: cleanup]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From Alexandr: This 'new' invalidate/release logic also fixes our problem with mmap/write/read data corruption when several processes use the same mmaped file on NFS
Bug #81896
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e3db7691e9f3dff3289f64e3d98583e28afe03db
diff-ms-nfs-odirect-20070529
Patch from Denis Lunev <den@openvz.org>
[PATCH] nfs: oops during LTP over NFS (direct io)
Problem reported by Denis Lunev and QA; the fix is from mainstream. An incorrect comparison of "int" and "unsigned int" variables is fixed in nfs_direct_read_schedule and nfs_direct_write_schedule.
Bug #81589
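The class of bug behind this fix is easy to reproduce in isolation. In C, comparing an int with an unsigned int converts the int to unsigned first, so a negative value (for example an error code) turns into a very large positive one. The sketch below uses made-up function names and is not the NFS code; it only shows the pitfall and one possible way around it.

```c
#include <assert.h>

/*
 * What "a < b" does implicitly when a is int and b is unsigned int:
 * a is converted to unsigned before the comparison.
 */
int naive_less(int a, unsigned int b)
{
    return (unsigned int)a < b;
}

/* One possible fix: deal with negative values before converting. */
int safe_less(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}

/* Usage: both calls ask whether -1 is less than 1. */
void compare_demo(void)
{
    assert(naive_less(-1, 1u) == 0); /* -1 became UINT_MAX: wrong answer */
    assert(safe_less(-1, 1u) == 1);  /* expected answer */
}
```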
diff-ms-nfs-schedlock-20070530
Patch from Denis Lunev <den@openvz.org>
[PATCH] nfs: AB-BA deadlock on rpc_sched_lock/queue->lock locks
This patch fixes a possible AB-BA deadlock on rpc_sched_lock/queue->lock in rpc_run_child(). The normal sequence is the one in rpc_set_active:
- rpc_sched_lock goes first
- queue->lock is nested.
Bug #82518
diff-ms-nfs-umount-refcnt-leak-20070530
Patch from Trond Myklebust <Trond.Myklebust@netapp.com>
[PATCH] nfs: fix req refcnt leak preventing umount
Original analysis by Denis Lunev:
- nfs_direct_req_alloc creates dreq with dreq->kref->refcount == 2
- on the success path kref_put is called in nfs_direct_read_schedule -> nfs_direct_complete and in nfs_direct_wait
- on the error path only the first put occurred
The same problem occurs on the direct_write path.
Mainstream patch version from Trond Myklebust <Trond.Myklebust@netapp.com>: The current code is leaking a reference to dreq->kref when the calls to nfs_direct_read_schedule() and nfs_direct_write_schedule() return an error. Thanks to Denis V. Lunev for spotting the bug and proposing the original fix. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
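The leak pattern can be shown with a toy reference counter. Everything in this sketch (toy_req, toy_direct_io, live_reqs) is hypothetical and heavily simplified compared to the NFS direct I/O code; it only illustrates that an object created with two references needs both of them dropped on the error path as well.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a direct I/O request; not the kernel structure. */
struct toy_req {
    int refcount;
};

int live_reqs; /* number of requests allocated and not yet freed */

struct toy_req *toy_req_alloc(void)
{
    struct toy_req *req = malloc(sizeof(*req));

    if (!req)
        return NULL;
    req->refcount = 2; /* one for the I/O path, one for the waiter */
    live_reqs++;
    return req;
}

void toy_req_put(struct toy_req *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0) {
        free(req);
        live_reqs--;
    }
}

/*
 * schedule_ok == 0 models the scheduling step failing.
 * Returns 0 on success and -1 on failure; neither path leaks the request.
 */
int toy_direct_io(int schedule_ok)
{
    struct toy_req *req = toy_req_alloc();

    if (!req)
        return -1;
    if (!schedule_ok) {
        toy_req_put(req); /* reference the completion would have dropped */
        toy_req_put(req); /* reference the waiter would have dropped */
        return -1;
    }
    toy_req_put(req); /* completion path */
    toy_req_put(req); /* wait path */
    return 0;
}
```

Dropping only one reference in the failure branch would leave live_reqs at 1 after a failed call, which is the toy equivalent of the request that kept the filesystem from being unmounted.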
diff-ms-security-cpuset-20070605
Patch from Akinobu Mita <akinobu.mita@gmail.com>
use simple_read_from_buffer in kernel/
Cleanup using simple_read_from_buffer() for /dev/cpuset/tasks and /proc/config.gz. Cc: Paul Jackson <pj@sgi.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> X-Git-Tag: v2.6.22-rc1 X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=85badbdf5120d246ce2bb3f1a7689a805f9c9006
diff-ms-security-sctp-20070604
Patch from Patrick McHardy <kaber@trash.net>
[NETFILTER]: {ip,nf}_conntrack_sctp: fix remotely triggerable NULL ptr dereference
When creating a new connection by sending an unknown chunk type, we don't transition to a valid state, causing a NULL pointer dereference in sctp_packet when accessing sctp_timeouts[SCTP_CONNTRACK_NONE]. Fix by not creating a new conntrack entry if the initial state is invalid. Noticed by Vilmos Nebehaj <vilmos.nebehaj@ramsys.hu> CC: Kiran Kumar Immidi <immidi_kiran@yahoo.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
diff-ms-seqfile-seek-20070601
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[PATCH] seqfile: bash can hang in a loop reading from proc file
Original problem: in some circumstances the seq_file interface can present an infinite proc file to the following script when normally said proc file is finite:
    while read line; do
        [do something with $line]
    done </proc/$FILE
bash, to implement such a loop, essentially does
    read(0, buf, 128);
    [find \n]
    lseek(0, -difference, SEEK_CUR);
Consider a proc file that prints a list of objects, each consisting of many lines, each line shorter than 128 bytes. There are two objects in the list, with ->index'es being 0 and 1. The current one is 1, as bash prints the second object line by line. Imagine the first object being removed right before lseek(). traverse() will be called, because there is a negative offset. traverse() will reset ->index to 0 (!). traverse() will call ->next() and get NULL in any usual iterate-over-list code using list_for_each_entry_continue() and such. There is one object in the list now, after all... traverse() will return 0, and lseek() will update the file position and pretend everything is OK.
So, what we have now: ->f_pos points to the place where the second object would be printed, but ->index is 0. seq_read(), instead of returning EOF, will start printing the first line of the first object every time it is called, until enough objects are added for ->f_pos to return in bounds.
The fix is to update ->index only after we're sure we saw enough objects down the road.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Bug #82819
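The read-then-seek-back pattern attributed to bash in this entry is what makes lseek() with a negative offset so frequent on proc files. Below is a minimal user-space sketch of that pattern, assuming a seekable file descriptor; the function name and the CHUNK constant are chosen for this illustration and are not taken from bash or the kernel.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 128

/*
 * Read one line the way the entry describes: read a chunk, find the
 * newline, then seek back over the bytes that were read past it.
 * `out` must have room for CHUNK + 1 bytes.
 * Returns the line length without '\n', or -1 on EOF or error.
 */
ssize_t read_line_with_seek_back(int fd, char *out)
{
    char buf[CHUNK];
    ssize_t n, len;
    char *nl;

    assert(out != NULL);

    n = read(fd, buf, sizeof(buf));
    if (n <= 0)
        return -1;

    nl = memchr(buf, '\n', (size_t)n);
    len = nl ? (ssize_t)(nl - buf) : n;

    /* give back everything after the newline; this is the negative lseek */
    if (nl && lseek(fd, -(off_t)(n - len - 1), SEEK_CUR) == (off_t)-1)
        return -1;

    memcpy(out, buf, (size_t)len);
    out[len] = '\0';
    return len;
}
```

Each call that reads past the newline ends in a backwards seek, and on a seq_file that is the step that invokes traverse().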
diff-ubc-proc-rework-c-20070604
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] ubc: fix compilation with CONFIG_UBC_DEBUG_IO=y
During rework of UBC /proc compilation with UBC_DEBUG_IO was broken a bit.
diff-ubc-unix-exports-20070604
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] ubc: export ubc helpers for case CONFIG_UNIX=m
Export ub_sock_getwres_other, since unix sockets can call it from the module (unix.ko) when CONFIG_UNIX=m. Thanks to Rafael Isturiz for having a non-standard config :) and reporting this.
diff-ve-cpustats-20070528
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] VE cpu stats should be exported to user space in clocks
VE cpu stats should be exported to user space in clocks instead of jiffies.
diff-ve-ip-nat-aliasing-20070605
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[PATCH] Unalias VE_IP_NAT for ip_nat and iptable_nat modules
If the ip_nat and ip_tables modules are loaded before VE start, and iptable_nat after VE start, then on VE stop the kernel will crash in ipt_unregister_table() attempting to unregister a NULL table. Split the VE_IP_NAT flag, which was responsible for two modules.
diff-ve-net-arp-set-perms-20070625
Patch from Vasily Tarasov <vtaras@openvz.org>
[PATCH] arp: allow set arp cache entries from VE
It is secure, since later we use the __dev_get_by_name() function, which is aware of the current context. http://forum.openvz.org/index.php?t=tree&th=2570&mid=13209&&rev=&reveal=
diff-ve-net-veth-filtering-b-20070605
Patch from Andrey Mirkin <major@openvz.org>
[PATCH] veth: rework VE traffic filtering
MAC filtering in veth_xmit() was a bit incorrect: broadcasts and multicasts were allowed from VE. Rearrange the code, making it clearer and asymmetric :/
diff-ve-net-veth-multicast-20070604
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] veth: multicasts should be forwarded as well
Right now veth_xmit passes broadcasts only. It is a bug. Multicasts should be allowed as well. Thanks to Daniel Pittman for noticing this.
diff-ve-oom-adjust-20070604
Patch from Denis Lunev <den@openvz.org>
[PATCH] disable OOM_DISABLE inside VE
Prevent disabling of OOM from inside VE. Basically, it is safe to allow priority changes inside VE, since in the normal case we first select a UB and then a process inside that UB.
diff-ve-reparent-threaded-init-20070604
Patch from Alexey Kuznetsov <alexey@openvz.org>
[PATCH] VE: reparent threaded init correctly
If init is multithreaded (yes, imagine, this happens :-)), its threads are reparented to VE init, so that we get parents in the same thread group. Nothing especially bad happens, only checkpointing cannot restore such a sick configuration.
diff-ve-setattr-proc-20070524
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[PATCH 1/2] VE: allow proc setattr on local proc entries
If PDE is local to VE, there is no reason to not allow setattr on it -- changes won't affect corresponding global PDE and other VEs.
diff-ve-setattr-proc-b-20070604
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[PATCH] proc: brown paper bag bug in proc's ->setattr
->setattr is called for something innocent like mtime updates, so outright banning of ->setattr on global proc entries was sadistic. Check if ->setattr is called with mask indicating MODE, UID, GID change and check for globalness only in this case.
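The refined check amounts to a bitmask test on the attribute mask. The sketch below uses local DEMO_ATTR_* constants that mirror the kernel's ATTR_MODE, ATTR_UID, ATTR_GID and ATTR_MTIME bits; the constants and the function name are defined here only for illustration and are not taken from the patch.

```c
#include <assert.h>

/* Local copies of attribute mask bits, for illustration only. */
#define DEMO_ATTR_MODE  (1u << 0)
#define DEMO_ATTR_UID   (1u << 1)
#define DEMO_ATTR_GID   (1u << 2)
#define DEMO_ATTR_MTIME (1u << 5)

/*
 * Hypothetical predicate: returns 1 if the setattr may proceed, 0 if it
 * must be refused. Only mode/owner changes on global entries are refused;
 * innocent updates such as mtime always pass.
 */
int demo_setattr_allowed(unsigned int ia_valid, int entry_is_global)
{
    unsigned int guarded = DEMO_ATTR_MODE | DEMO_ATTR_UID | DEMO_ATTR_GID;

    assert(entry_is_global == 0 || entry_is_global == 1);

    if ((ia_valid & guarded) && entry_is_global)
        return 0;
    return 1;
}
```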
diff-ve-setattr-proc-kmsg-20070524
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[PATCH] VE: make /proc/kmsg to be VE local
Some people are used to doing "chmod g+r /proc/kmsg". Make the PDE corresponding to /proc/kmsg local to VE, so that it is possible to setattr it.
diff-ve-syslog-20070601
Patch from Vitaliy Gusev <vgusev@openvz.org>
Fix LTP test failure in syslog test.
The LTP failure is minor and simple: the test calls syslog(2) with wrong arguments and expects an error, but syslog() returns 0 since a VE doesn't have a real console and console loglevel. Thanks to Christian Kaiser2 <CKAISER2@de.ibm.com> for noticing this.
diff-ve-vpsdumpable-early-20070604
Patch from Kirill Korotaev <dev@openvz.org>
[PATCH] init vps_dumpable early on exec
Since CPT uses vps_dumpable flag now for determining external processes on checkpointing, we need to initialize it earlier on mm creation on exec. Otherwise it can race.
diff-vzdq-restore-symlinks-under-sem-20070524
Patch from Alexey Dobriyan <adobriyan@openvz.org>
[PATCH] VZDQ: Fix lockdep warning about s_umount dependency
Lockdep learns a false dependency due to vz_restore_symlink() and later complains about possible circular locking when quotaon is done. Temporarily up the ->s_umount semaphore to work around this.
diff-xen-subarch-changes-20070528
Patch from Evgeny Kravtsunov <emkravts@openvz.org>
[PATCH] Fixes for Xen arch compilation / work
diff-ve-prepare-ve0-tasks-20070608
Patch from Alexandr Andreev <aandreev@openvz.org>
[PATCH] VE: ve0 processes initialization
VE0 processes were initialized twice:
- in copy_process()
- in prepare_ve0_process() from init_ve_system()
This is redundant and leads to a wrong ve0.pcounter.