Download/kernel/rhel4/023stab030.1/changes

From OpenVZ Virtuozzo Containers Wiki

Changes

  • Rebased on 2.6.9-42.0.3EL
  • Driver updates, configs synchronization with RHEL
  • CPT fixes
  • Mainstream updates
  • A lot of virtualization enhancements (sysctls, sysfs, stats)
  • OOM-killer fixes
  • VE cleanup speedup.

Config changes

A lot of changes due to:

  • an attempt to bring the configs closer to Red Hat's
  • driver updates to match the HCL (hardware compatibility list) of the 2.6.8 branch

Driver updates

diff-drv-adp94xx-freeze-20060906

Patch from Kostja:
This patch fixes kernel compilation with CONFIG_SCSI_ADP94XX=y by removing uses of the PF_FREEZE flag in the adp94xx driver.

diff-wrn-implicit-funcs-20060906

Patch from Kostja:
This patch removes "implicit declaration of function" warnings during compilation on the x86 and x86_64 arches.

linux-2.6.9-e1000-7.2.7.patch

Patch ported by Kostja (khorenko@):
e1000 driver updated to version 7.2.7; sources were taken from sourceforge.net/projects/e1000

Bug #19952.

linux-2.6.9-r8169-2.2.patch-1

Patch ported by Kostja (khorenko@):
r8169 driver updated to version 2.2; sources were taken from the 2.6.8-022stab078.20 vz kernel.

Bug #19950.

linux-2.6.9-sk98lin-8.31.2.3.patch

Patch ported by Kostja (khorenko@):

sk98lin driver updated to version 8.31.2.3; sources were taken from skd.de

Bug #28918.

linux-2.6.9-sky2-1.4.patch

Patch ported by Kostja (khorenko@):
sky2 driver updated to version 1.4; sources were taken from the 2.6.8-022stab078.20 vz kernel.

Bug #19950.

linux-2.6.9-qla4xxx-5.00.02.patch

Patch ported by Kostja (khorenko@):
qla4xxx driver updated to version 5.00.02; sources from QLogic's site.

Bug #27641.

linux-2.6.9-arcmsr-1.20.0X.12.patch

Patch ported by Kostja (khorenko@):
Areca driver v1.20.0X.12 added; sources are from the Areca site.

Bug #59933.

linux-2.6.9-dell_rbu-0.9.patch

Patch ported by Kostja (khorenko@):
dell_rbu driver updated to version 0.9; sources from Dell's site.

Bug #55618.

linux-2.6.9-aoe-14.patch

Patch ported by Kostja (khorenko@):
AoE driver version 14 added; sources from site

Bug #51009.

linux-2.6.9-dpt_i2o-2.5.0-2426.patch

Patch ported by Kostja (khorenko@):

Added an alternative driver for I2O hardware, version 2.5.8, build 2426; sources taken from Mark Salyzyn.

!!! Obsoletes diff-drv-dpt-entropy-20040525 !!!

Bug #68066.

linux-2.6.9-i2o-1.325.patch

Patch ported by Kostja (khorenko@):
Updates the i2o layer; backported from the 2.6.17 mainstream Linux kernel by Vasily (vvs@).

Bug #68066.

diff-scsi-megaraid-dma64-20060621

Patch from Vasily:

This patch prevents enabling 64-bit DMA on the MegaRAID SATA 150-4 controller, because that controller does not support 64-bit DMA.

Bug #52530.

linux-2.6.9-qla2xxx-8.01.05.patch

Patch prepared by Kostja:
QLogic qla2xxx driver updated to version 8.01.05.

Sources from site

Bug #27641.

Patches

diff-cpt-annoying-printk

Patch from Alexey:
[CPT] remove annoying printk

In 2.6.9, printk("=") in refrigerator() is commented out, so we should remove printk(">\n") in cpt. The code is not deleted but commented out, as a reminder that it has to come back if the printk in refrigerator() is ever uncommented.

diff-cpt-asmlinkage

Patch from Alexey:
[CPT] asmlinkage attribute was forgotten. This fixes CPT when compiled with CONFIG_REGPARAM.

diff-cpt-check-syscall-cap-20060814

Patch from Andrey:

This patch adds a check for the syscall CPU capability, which the vsyscall page needs on x86_64, just like the sysenter capability that is already checked.

diff-cpt-check-vsyscall-20060817

Patch from Andrey:

On x86_64, this patch checks during suspend whether a 64-bit task is currently inside the vsyscall page. If it is, we can try to suspend again. The check for the vsyscall page on x86_64 in dump_one_mm() is removed.

diff-cpt-ifindex-renumber-20060815

Patch from Andrey:

This patch adds renumbering of netdev->ifindex values on restore. We can do this because the network is suspended. All manipulations are protected with rtnl_lock(). If the required index is already taken, the ifindex of the device in question is swapped with that of the device holding the index.
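The swap rule can be illustrated with a minimal sketch. It assumes a plain array stands in for the kernel's device list; `struct fake_dev` and `restore_ifindex()` are hypothetical names, and the real code manipulates struct net_device under rtnl_lock().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: only the fields we need. */
struct fake_dev {
    const char *name;
    int ifindex;
};

/*
 * Give devs[target] the ifindex it had at checkpoint time ("wanted").
 * If another device already holds that index, it receives the target's
 * current index in exchange, so indices stay unique.
 */
static void restore_ifindex(struct fake_dev *devs, size_t n,
                            size_t target, int wanted)
{
    for (size_t i = 0; i < n; i++) {
        if (i != target && devs[i].ifindex == wanted) {
            devs[i].ifindex = devs[target].ifindex; /* holder gets ours */
            break;
        }
    }
    devs[target].ifindex = wanted;
}
```

Swapping, rather than picking a fresh index for the displaced device, keeps the set of indices in use unchanged.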

diff-cpt-lic-forkret-20060810-2

Patch from Dmitry:
Fixed an oops caused by diff-cpt-vzent-ovz-20060804, added the module license, and removed the use of syscall_exit.

Bug #66511.

diff-cpt-mm-eagain-20060817

Patch from Andrey:
In tests we can see the message "mm_struct is referenced outside", after which the checkpoint fails.

This situation seems to be legal, so the checkpoint can be restarted. We therefore return -EAGAIN to make restarting the checkpoint possible.
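A caller-side sketch of what returning -EAGAIN enables. It assumes a hypothetical `fake_checkpoint()` that fails transiently a few times and a retry limit chosen by the caller; neither the names nor the limit come from the CPT code.

```c
#include <errno.h>

/* Hypothetical checkpoint step: 0 on success or a negative errno.
 * It simulates "mm_struct is referenced outside" for the first few calls. */
static int fake_checkpoint(int *transient_failures)
{
    if (*transient_failures > 0) {
        (*transient_failures)--;
        return -EAGAIN;
    }
    return 0;
}

/* Retry while the checkpoint reports a transient condition (-EAGAIN),
 * up to max_tries attempts; any other result is returned immediately. */
static int checkpoint_with_retry(int *transient_failures, int max_tries,
                                 int *tries_used)
{
    int ret = -EAGAIN;
    int i;

    for (i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = fake_checkpoint(transient_failures);
    *tries_used = i;
    return ret;
}
```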

diff-cpt-skb-pcount

Patch from Alexey:
[CPT] save/restore tcp_skb_pcount()

Backported from 2.6.16; 2.6.9 has this feature too.

diff-cpt-suid-dumpable

Patch from Alexey:
[CPT] restore mm->dumpable correctly

mm->dumpable is not boolean in >=2.6.9, but tri-state. Just save and restore raw value.
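Why the raw value matters, as a small sketch: assuming the three states are encoded as 0, 1 and 2, squeezing the value through a boolean collapses 2 into 1. The function names are hypothetical.

```c
/*
 * Assumed encoding of the tri-state mm->dumpable:
 * 0 = not dumpable, 1 = dumpable, 2 = dumpable, readable by root only.
 */
static int save_dumpable_as_bool(int dumpable)
{
    return dumpable ? 1 : 0; /* lossy: 2 comes back as 1 */
}

static int save_dumpable_raw(int dumpable)
{
    return dumpable;         /* lossless: the image keeps 0, 1 or 2 */
}
```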

diff-cpt-test-caps-fix-20060815

Patch from Andrey:

This patch fixes the old test-capabilities code. We cannot use the context in this code, because it is not yet initialized. It was broken by diff-cpt-checks-20060808.

diff-cpt-ve-features-20060815

Patch from Andrey:

The feature set was not saved in CPT, so VPSes based on the SUSE template could fail after restore (VE_FEATURE_SYSFS was lost). The feature set is now saved in previously unused fields of the image header (cpt_os_version and cpt_os_features).

diff-cpt-vsyscall-page-20060814

Patch from Andrey:
Changes:

  • checks for errors are added
  • externs are moved to .h file
  • current_thread_info()->sysenter_return is set to the right value on both arches

diff-cpt-x86_64-debuginfo

Patch from Alexey:
[CPT] fix compilation with CONFIG_DEBUG_INFO. Just #undef it.

diff-ms-dcache-shrink-sb

Patch from Kirill:

Introduce a per-sb list of dcache entries to speed up shrink_dcache_sb() and shrink_dcache_parent(). This should eliminate a customer problem where, on VE stop, umount takes an hour to complete while holding the s_umount semaphore.

diff-ms-nf-ipt-compat-20060814

Patch from Dmitry:

Remove extra checks from the compat_copy_* functions; previously they led to an extra module put on the error path.

Bug #66569.

diff-ms-retranscollapse

Patch from mainstream, prepared by Denis:
[TCP]: Do not try to collapse multi-packet SKB

Signed-off-by: David S. Miller <davem@davemloft.net>

diff-ms-smp-send-stop-irqs-fix-20060726

Patch from Pavel:

Do not rely on smp_call_function() to notify other CPUs that they must stop. Just send the IPI after setting call_data accordingly.

smp_call_function() operates on a global static call_data_struct under a lock, to be sure it stays valid during the call.

smp_send_stop() sends its IPI without synchronisation with the ones from smp_call_function(), but this is OK if the handler ACKs both of them.

Bug #65573.

diff-ms-vsyscall-page-20060814

Patch from Andrey:

Changes:

  • new entry sysenter_return is added to thread_info structure on x86_64 arch, ia32entry.S code changed accordingly
  • constants are changed to defined values
  • now we have a hole between IA32_STACK_TOP and vsyscall page
  • VSYSCALL32_SYSEXIT value must match SYSENTER_RETURN_OFFSET value to be able to migrate vsyscall-sysenter page from x86_64 to i386

Now we are able to migrate int80 and sysenter vsyscall pages from i386 to x86_64 and back.

diff-ms-vsyscall-sysenter-align-20060814

Patch from A. Mirkin:

>There is one unexpected place:
>>
>> #define VSYSCALL32_SYSEXIT (VSYSCALL32_BASE + 0x41A)
>>
>> If we cannot avoid this, I am afraid it would be better just
>> to add alignment to 0x420 in vsyscall-sysenter both in i386
>> and in ia32/x86_64 and to undo that code mimicking i386 mmap.
>>
>> If we need to know 0x41A explicitly, that trick loses sense completely.
>>
>> But this can be done later.
>>
>> Alexey

I have added the necessary alignment in both arches and removed redundant code from the x86_64 sysenter page. The return offset is now at 0x420.
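The move from 0x41A to 0x420 is plain round-up alignment. The sketch below assumes a 32-byte (0x20) boundary, which the mail does not state; an 8- or 16-byte boundary would also map 0x41A to 0x420. `align_up()` is a hypothetical helper name.

```c
#include <stdint.h>

/* Round x up to the next multiple of align; align must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t align)
{
    return (x + align - 1) & ~(align - 1);
}
```

For example, (0x41A + 0x1F) & ~0x1F = 0x439 & ~0x1F = 0x420, while an offset that is already aligned is left unchanged.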

diff-ms-netlinkcb-1

Patch from mainstream for the netlink memory corruption:
Bug #66596.

[NETLINK]: Fix sk_rmem_alloc assertion failure.

In netlink_dump we're operating on sk after dropping the cb lock. This is racy because the owner of the socket could close it after we drop the cb lock.

This is possible because netlink_dump isn't always called from the context of the process that owns the socket. For instance, if there is contention on rtnl then rtnetlink requests will be processed by the process that owns the rtnl.

The solution is to hold a ref count on the socket before we drop the cb lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff-cpt-x8664-setpriority

Patch from Alexey:
[CPT] process priority was restored incorrectly on x86_64

Ugly type-casting bug: a u32 was implicitly cast to long, so on 64-bit arches negative nice values were rejected as huge positive ones.
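A user-space sketch of the bug, assuming the nice value travels through the image as a u32 and is range-checked against -20..19 after widening; the function names are hypothetical. On an LP64 arch the zero-extended value fails the check.

```c
#include <stdint.h>

/* Buggy: a u32 widened to long is zero-extended on 64-bit arches,
 * so a saved nice of -5 comes back as 4294967291. */
static long restore_nice_buggy(uint32_t saved)
{
    return (long)saved;
}

/* Fixed: reinterpret as a signed 32-bit value first, then widen. */
static long restore_nice_fixed(uint32_t saved)
{
    return (long)(int32_t)saved;
}

/* Nice values outside -20..19 are rejected. */
static int nice_is_valid(long nice)
{
    return nice >= -20 && nice <= 19;
}
```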

diff-ms-dcache-shrink-sb-fix

Patch from Pavel:
Newly added s_dentry_unused list must be initialized...

Bug #66944. Bug #66923.

diff-ve-net-dev-sysctl-20060821

Patch from Dmitry:
This patch allows VE owner to use net.ipv4.conf.<net_device>.* sysctls.

Bug #66842.

diff-ve-sysfs-root-20060818

Patch from Vasily Tarasov:
Fix of sysfs tree visibility in VPS.

The sysfs_root variable must be virtualized, so that a VPS sees only the class subsystem and class net.

Bug #66581.

diff-arch-4gb-mce-20060824

Patch by Vasily (vvs@):

This patch fixes a 4Gb-split-related issue: access to kernel-space memory (machine_check_vector) before the context switch.

Bug #67271.

diff-cpt-restore-mnt-flags-20060831

Patch from Andrey:

Mount points' mnt_flags (noexec, nosuid, nodev) were omitted and therefore not restored correctly. This patch should be applied together with the previous patch (diff-ms-bind-mount-flags-20060816); otherwise we would have to do the following:

  1. Remove the check for bind mounts in the do_remount() function
  2. Change the procedure for restoring bind mounts as follows:
        do_mount(bind);
        do_remount(mnt_flags).

diff-cpt-rst-dir

Patch from Alexey:
[CPT] do not keep open cwd while restore

From the viewpoint of CPT, cwd/root are very similar to an open file: each is just a dentry/mnt pair. Normally, when opening a file, we store it and its inode in a special object cache, to resolve opening of the same inode when some of its aliases (dentries) are deleted.

But this is useless for directories, which can never be hard-linked. It also consumes the numfile UBC, so restore can fail easily. So, do not store the cwd/root file unless it is deleted. This does not solve the problem of a restored VE hitting its numfile limit, but it relieves it a lot.

Now we can temporarily increase the numfile limit by 2 during cpt/rst, and everything should be OK.

Bug #62876.

diff-cpt-rst-sigdfl-20060830

Patch from Alexey:
[CPT] save/restore even SIG_DFL handlers

Linux has a funny feature: when an SA_ONESHOT signal resets its handler, the flags are not reset to their defaults. And LTP tests verify this pathology.

diff-cpt-tcp-bind-bug-20060831

Patch from Alexey:
[CPT] tcp sockets were bind()ed incorrectly during restore

This case was totally missed. Fortunately, this happens rarely.

If checkpoint happens after some listening socket was closed, but it left behind some children (including timewait buckets), restore fails to bind them, unless the service used SO_REUSEADDR.

Stress checkpointing of LTP tests did not catch this earlier only because... I repaired the tests not to fail upon exhaustion of port space some time ago. Before that they failed with obvious and harmless diagnosis long before the first binding conflict happened.

diff-cpt-vsyscall-checks-20060817

Patch from Andrey:

This patch adds a check for vsyscall CPU capabilities in compat mode on x86_64. We need it to be sure that migration of processes using vsyscall will succeed.

diff-ms-bind-mount-flags-20060816

Patch from Andrey:

This patch adds support for 3 mount flags (nodev, noexec, nosuid) to --bind mounts. Now we can do bind mounts with the noexec, nosuid and nodev options without having to remount. This patch is also required for diff-cpt-restore-mnt-flags-20060831

diff-ms-emt64-entry-bad-iret

Patch from Andrey, backported from two mainstream patches:

[PATCH] x86_64: Don't call do_exit with interrupts disabled after IRET exception

This caused a sigreturn with bad argument on a preemptible kernel to complain with

Debug: sleeping function called from invalid context
at /home/lsrc/quilt/linux/include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1

Call Trace: {__might_sleep+190} {profile_task_exit+21}
{__do_exit+34} {do_wait+0}

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Git: 2391c4b594eb28abd58102de8f4e5d7a4fa39f4c

[PATCH] x86_64: Report SIGSEGV for IRET faults

tcsh is not happy with the -9999 error code.

Suggested by Ernie Petrides

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Git: 3076a492a5e8dd624f237886646b35d12193502d

Bug #67257.

diff-ms-ext3-commit

Patch from mainstream:
[PATCH] jbd: fix BUG in journal_commit_transaction()

Fix possible assertion failure in journal_commit_transaction() on jh->b_next_transaction == NULL (when we are processing BJ_Forget list and buffer is not jbddirty).

!jbddirty buffers can be placed on BJ_Forget list for example by journal_forget() or by __dispose_buffer() - generally such buffer means that it has been freed by this transaction.

Freed buffers should not be reallocated until the transaction has committed (that's why we have the assertion there) but they *can* be reallocated when the transaction has already been committed to disk and we are just processing the BJ_Forget list (as soon as we remove b_committed_data from the bitmap bh, ext3 will be able to reallocate buffers freed by the committing transaction). So we have to also count with the case that the buffer has been reallocated and b_next_transaction has been already set.

And one more subtle point: it can happen that we manage to reallocate the buffer and also mark it jbddirty. Then we also add the freed buffer to the checkpoint list of the committing transaction. But that should do no harm.

Non-jbddirty buffers should be filed to BJ_Reserved and not BJ_Metadata list. It can actually happen that we refile such buffers during the commit phase when we reallocate in the running transaction blocks deleted in committing transaction (and that can happen if the committing transaction already wrote all the data and is just cleaning up BJ_Forget list).

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

GIT: 9ada7340987aa24395809570840c7c6847044f52

Bug #67362.

diff-ms-fib-info-leak-20060829

[PATCH] one more memory leak in fib_semantics

This is the last of a sequence of three patches curing memory leaks. This closes bug #67568.

This is a mainstream bug specific to 2.6.9; it has been fixed upstream:

commit b7656e7f2944984befa3ab99a5b99f99a23b302b
Author: David S. Miller <davem@davemloft.net>
Date:   Fri Aug 5 04:12:48 2005 -0700

[IPV4]: Fix memory leak during fib_info hash expansion.

When we grow the tables, we forget to free the old ones up.

Noticed by Yan Zheng.

Signed-off-by: David S. Miller <davem@davemloft.net>
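The leak pattern is easy to reproduce in miniature. The sketch below uses a hypothetical chained hash table (not the fib_info hash in fib_semantics.c) and shows the step the upstream commit describes as missing: freeing the old bucket array once every node has been relinked.

```c
#include <stdlib.h>

struct node {
    unsigned key;
    struct node *next;
};

struct table {
    struct node **buckets;
    size_t size;
};

/*
 * Grow t to new_size buckets: allocate the new bucket array, relink every
 * node into it, then free the old array. Leaving out the free() call gives
 * exactly the kind of leak described above.
 */
static int table_grow(struct table *t, size_t new_size)
{
    struct node **nb = calloc(new_size, sizeof(*nb));
    if (nb == NULL)
        return -1;
    for (size_t i = 0; i < t->size; i++) {
        struct node *n = t->buckets[i];
        while (n != NULL) {
            struct node *next = n->next;
            size_t h = n->key % new_size;
            n->next = nb[h];
            nb[h] = n;
            n = next;
        }
    }
    free(t->buckets); /* release the old bucket array */
    t->buckets = nb;
    t->size = new_size;
    return 0;
}
```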

diff-ms-neigh-table-memleak

Patch from Alexey:
[PATCH] memory leak in neigh table destructor

It leaks one or two size-64 objects (size-128 on x86_64) per VE destruction. I do not see any more leaks in 2.6.16 on i386.

2.6.9 (or x86_64) still leaks a little in size-64 and size-128, and probably in size-32.

diff-ms-pit-cpukhz

Patch from mainstream:
pit timer doesn't initialize cpu_khz

Bug #66955.

diff-ubc-net-wait-mem-fix-20060823

Patch from Pavel Emelianov <xemul@openvz.org>:

Restore the original sk_stream_wait_memory() prototype so that the InfiniBand driver (and any other caller) compiles. Places that use the new version call __sk_stream_wait_memory().

diff-ubc-oomwake-20060823

Patch from Denis:

This patch wakes up an OOM-killed process if it is stuck in a 5-second uninterruptible sleep in oom_kill.

diff-ve-memleak-fib-hash-20060828

Patch from Alexey:
[PATCH] memory leakage in fib_hash

FIB hash tables and zone structs were never freed. Each time, when VE is stopped, they leak. All the kernels are affected.

It is surprising that this was not detected earlier; it says something about the quality of testing. Obviously, vzctl chkpnt/restore tests were never run: they would bring down a system with 4G of RAM quite soon. Of course, vzctl start/stop is not fast enough to bring down a system with a decent amount of RAM, but hundreds of thousands of slab entries are still clearly visible.

The patch solves leakage in size-128 and most of leakage in size-64.

We still leak two objects in size-64 and 6 entries in size-32.

diff-ve-multi-cleanup-20060824

Patch from Pavel Emelianov:

Try to clean up each VE in a separate thread. This allows many VEs to be stopped simultaneously.

Bug #60673.

diff-ve-net-fib-leak-fix-20060830

Patch from Pavel Emelianov:
Fix a memory leak in the CONFIG_VE_NETDEV=n case: do not create fib rules if we are not going to use them.

diff-ve-net-loop-stat-20060821

Patch from Dmitry:
Virtualized loopback_stats
Bug #66571.

diff-ve-net-mtu-20060828

Patch from Dmitry:

  • removed mtu restore logic for moved devices
  • added the possibility to set mtu > 1500 for veth devices

Bug #66836.

diff-ve-portrange-b-20060829

Patch from Denis:
This patch fixes virtualization of ip_local_port_range sysctl

diff-ms-nf-compat-err-fix2-20060908-2

Patch from Dmitry (dim@), found by Vasiliy:
Add a flush of offsets on the error path; without it, the table may be corrupted on the next compat_do_replace.
Bug #65826.

diff-ms-nf-security-checks2-20060904

Patch from Dmitry:
A lot of changes to unify the compat checks with the regular ones. Fixed bugs where some iptables targets and matches were unavailable in 32-bit VEs running on 64-bit kernels.
Bug #68017.
Bug #68042.
Bug #68043.

diff-cpt-clone-zombie-3

Patch from Alexey:
[CPT] restoring threads with tsk->fs==NULL, bug#65219

If an NPTL thread is ptraced, it does not die immediately, and we can arrive at the following state:

  parent
      |
  main_thread    -----> thread1 [ptraced]
  in TASK_ZOMBIE        in TASK_ZOMBIE

To restore such a configuration, we do kernel_thread(CLONE_SIGNAL) in the context of main_thread. But if main_thread has exited, it has tsk->fs == NULL and the kernel oopses.

The suggested fix is very simple: we just attach a temporary fs_struct from the init task of the VE. We also have to delay initialization of tsk->group_exit, otherwise the kernel will not allow us to clone.

This fix is pragmatic.

A better fix would be to restructure restore so that zombification is delayed until its last stage, i.e. we could restore the whole tree of alive processes with all the attributes of an alive task (fs, mm etc). After that is complete, we could make one more pass to collect garbage, killing zombie tasks and clearing fs, mm etc. It would be cleaner and safer, but requires too many changes.

Bug #65219.

diff-ve-sysfs-ptmx-20060907

Patch from Umka:
This patch adds the /sys/class/tty/ptmx device. It is necessary because otherwise udev does not create /dev/ptmx.

OpenVZ Bug #243.

diff-ve-nf-ipt-owner-20060907

Patch from Dmitry:
ipt_owner match is virtualized.

Bug #68090.

diff-ms-tifmemdie-20060907

Patch from Denis:

  • replaces PF_MEMDIE with TIF_MEMDIE
  • fixes OOM kill counter, which is required for correct OOM generation calculations

Bug #68248.

diff-ubc-oomdebug-20060907

Patch from Denis:
OOM generation/kill counter printing on OOM reports

diff-cpt-ns-to-jiffies

Patch from Alexey:
[CPT] arithmetic bug in _ns_to_jiffies().

Trivial.

But it took lots of time to find. The only visible effect of this bug was so funny that it is worth describing: sometimes sshd (the main daemon, which must never die) died after checkpointing.

sshd resets SIGALRM handler to SIG_DFL in signal handler. The bug resulted in incorrect calculation of it_real_incr and alarm was occasionally restarted. And that killed sshd.  :-)
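The changelog does not say what the arithmetic error was, so the following is only a generic sketch of a nanosecond-to-jiffies conversion. It assumes HZ=1000 and a round-up policy so that a restored timer never fires early; `ns_to_jiffies_sketch()` and the `SKETCH_*` constants are hypothetical names, not the CPT code.

```c
#include <stdint.h>

#define SKETCH_HZ           1000ULL        /* assumed tick rate */
#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Convert nanoseconds to jiffies, rounding up any partial tick.
 * Dividing first (quotient plus remainder test) avoids overflowing ns * HZ. */
static uint64_t ns_to_jiffies_sketch(uint64_t ns)
{
    const uint64_t ns_per_jiffy = SKETCH_NSEC_PER_SEC / SKETCH_HZ;

    return ns / ns_per_jiffy + (ns % ns_per_jiffy != 0);
}
```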

diff-cpt-mlockall

Patch from Alexey:
[CPT] mlockall() prevents restore

If a program on newer Red Hat systems ever called mlockall(), we get configurations with unreadable mlocked VMAs that are not actually in core. This is sort of a Linux feature.

The reason is that mlock*() set VM_LOCKED even if they cannot bring in pages.

The fix is to ignore -EFAULT, returned by mlock().

diff-vzwdog-irq-b-20060905

Patch from Vasiliy:
The /proc/interrupts file should be closed if kernel_thread() fails.

diff-ms-modpost-unresolved-20060904

Patch from Andrey:
Unresolved symbols should abort build.

Bug #67875.

diff-vsched-boot-rollback-20060904

Patch from Andrey:
We need to roll back idle vcpu initialization if CPU initialization fails. Otherwise the idle vcpu will be initialized a second time and we will get a panic in init_idle().

Bug #67506.

diff-cpt-mod-refcnt-leak

Patch from Alexey:
[CPT] massive module refcnt leakage during restore

Bug: passed FDs are sometimes detached directly in af_unix.c, bypassing the skb destructor, so we leak the module refcnt grabbed when we attached our private destructor.

diff-cpt-resume-oops

Patch from Alexey:
[CPT] crash in cpt_resume().

This is actually a known bug; it was fixed in a hurry and the fix did not cover all the places. A task can have tsk->sighand == NULL if it has already been released.

diff-emt64-better-calltraces

Patch from mainstream:
Make emt64 print more friendly call traces.

gnupatch@44a999d2XW45NdW2kIwsOgamozNOXA

diff-cpt-af-unix-deleted

Patch from Alexey:
[CPT] another bug in restoring deleted af_unix sockets

One case was missed. We assumed that if path_lookup() fails with -ENOENT, it means that we can bind to this name. But directory can be deleted!

So, instead, we now try to bind() to the name, and if that fails, bind() to a temporary name.
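A user-space analogue of the new strategy, using the standard bind(2) call on an AF_UNIX socket. The helper names and the fallback path are assumptions for illustration; the changelog does not say how the temporary name is chosen.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h> /* close(), unlink() for callers */

/* Fill an AF_UNIX address; the path is truncated to fit sun_path. */
static void fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
}

/*
 * Try to bind fd to its original name first. If that fails for any reason
 * (for example ENOENT because the parent directory was deleted), bind to
 * the fallback name instead.
 * Returns 0 if the original name was used, 1 for the fallback, -1 on error.
 */
static int bind_with_fallback(int fd, const char *path, const char *fallback)
{
    struct sockaddr_un addr;

    fill_unix_addr(&addr, path);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return 0;

    fill_unix_addr(&addr, fallback);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return 1;
    return -1;
}
```

Attempting the bind itself, rather than pre-checking the path with a lookup, covers every reason the original name might be unusable, including a deleted directory.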

diff-cpt-vsyscall-dump-20060911

Patch from Andrey:

The vsyscall object was not written correctly to the image file, so the output of the imagedump utility was garbled. Fixed.

diff-cpt-kill-freeze-clear

Patch from Alexey:
[CPT] clearing TIF_FREEZE was not removed

The code got a little messed up while being split into two independent patches (diff-cpt-suspend-cleanup and diff-cpt-ve-suspend). As a result, TIF_FREEZE was still cleared in wake_ve(), although removing that was the main goal of diff-cpt-ve-suspend.

diff-ve-sysfs-ptmx-b-20060907

Patch from Dmitry:
virtualized simple_dev_list. Required for recently added tty_class virtualization.

Bug #68652.
Bug #68654.

diff-ve-nf-ipt-slab-20060927

Patch from Dmitry:
Fixed slab corruption on debug kernels

Bug #68880.

diff-ms-stopmachine-yield

Patch from mainstream:
[PATCH] Fix occasional stop_machine() lockup with > 2 CPUs

Stephen Rothwell noted a case where one CPU was sitting in userspace, one in stop_machine() waiting for everyone to enter stopmachine(). This can happen if migration occurs at exactly the wrong time with more than 2 CPUS.

Say we have 4 CPUS:

  1. stop_machine() on CPU 0 creates stopmachine() threads for CPUS 1, 2 and 3, and yields waiting for them to migrate to their CPUs and ack.
  2. stopmachine(2) gets rebalanced (probably on exec) to CPU 1.
  3. stopmachine(2) calls set_cpus_allowed on CPU 1, sleeps awaiting migration thread.
  4. stopmachine(1) calls set_cpus_allowed on CPU 0, moves onto CPU1 and starts spinning.

Now the migration thread never runs, and we deadlock. The simplest solution is for stopmachine() to yield until they are all in place.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff-vzdq-isize-b-20060927

Patch from Denis ported from lunc@ code:
always fail dentry revalidate check for /proc quotafile

diff-vzdq-isize-20060913

Patch from Vasily (vtaras@) fixed by Denis:
This patch sets correct size on /proc/vz/aquota/*/aquota.*

Bug #59920.

diff-ms-remount-flags-20060912

Patch from Andrey:

In our kernel, remounting of bind mounts was prohibited. This patch changes that logic: now remounting a bind mount is prohibited only if superblock flags are changed.

diff-vzdq-symlink-20060913

Patch from Denis:
On vz_quota_on, this patch restores the quota symlink in /proc that was corrupted by quotacheck.

Bug #66949.

diff-dbg-stop-machine-20060922

Patch from Pavel:

This patch tracks info about all tasks involved in "stop machine" procedure.

Should help to fix bug #68813 and probably bug #67369.

diff-ve-veth-perf-20060926

Patch from Denis:
TX for the veth device does not require device queue locking

diff-debug-busy-inodes-b-20060914

Patch from Dmitry:
fixed printed debug info in case of "busy inodes"

Bug #68575.

diff-venet-perf-20060925

Patch from Denis:
TX for venet device does not require device queue locking

diff-ve-vpid-init-20060322

Patch from Vasily:
This patch removes the VE init pid + 1024 (virtual init pid). Its presence is detected by chkrootkit, and Maik blames us for this.

Bug #68754.

diff-ve-neightable-warn-20060920

Patch from Denis:

The problem:

  • VE should have exactly one ARP entry
  • it is allocated from the UBC slab, and the allocation fails due to UBC limits
  • ENOBUFS is returned to the caller

The cure: return -ENOMEM in such a case.

Bug #65836.

diff-ve-veth-proc-20060919

Patch from Andrey:

It is a bug that the veth proc entry (/proc/vz/veth) exists in both VE0 and VPSes. This patch fixes it.

OpenVZ Bug #271.

diff-ve-nf-ct-proc-20060919

Patch from Dmitry:
Fix /proc entries for conntracks.

OpenVZ Bug #267.

diff-ubc-oomkill-fixes-20060918

Patch from Pavel:
This patch fixes locking and ub refcounting in oom killer.

  • oom_kill() drops oom_generation_lock, so after returning from it no need to do it again;
  • if no bad processes were found in the selected ub, then the ub must be put;
  • comment about locking before oom_select_and_kill_sc.

Bug #68721.

diff-ms-nf-compat-err-fix-20060907

Patch from Dmitry:
This patch fixes the translate_compat_table() error path.

Bug #68286.

diff-ubc-magic-checks-20061011

Patch from Pavel:

When the ub BUG on a bad page's ub/pb triggers, it is hard to find out what happened without some memory dumps.

This patch makes such dumps and does not BUG the machine. Instead, page_ub(page) is set to NULL in case of an error in kmem accounting. For page beancounters, all pbcs that refer to the bad page are removed.

This patch can help to solve Bug #70105 and some others...

  • don't free page beancounters
  • print some additional page info (taken from bad_page())
  • do graceful recovery

diff-ms-xmit-bh-20061009

Patch from Konstantin Khorenko/mainstream:
[NET]: Fix unbalanced local_bh_enable() in dev_queue_xmit()

Many thanks to Maik Broemme who helped debugging this.

gnupatch@4186e5bfgUOMBbA6xFaY0_z84kaURw
cset@1.1938.295.30

Bug #70107.

diff-ms-fs-preparewrite-eh-20061005

Patch from Vasiliy/mainstream:

CVE-2006-4813: Information leak in __block_prepare_write() Dmitriy Monakhov from the SWsoft Virtuozzo/OpenVZ Linux Team has noticed an information leak in __block_prepare_write() which affects RHEL4 kernels: __block_prepare_write() does not properly clear the data buffers during error recovery, and therefore the contents of previously unlinked files are accessible.

It is known issue and it is fixed in mainstream by following patch:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=152becd26e0563aefdbc4fd1fe491928efe92d1f
RedHat Bug #207463
Bug #69778.
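A simplified model of the fix, not the fs/buffer.c code: assume a page split into blocks, with a hypothetical `fresh[]` array marking blocks that were newly allocated and never read from disk. On the error path those blocks are zeroed, so whatever stale data they held cannot become readable through the file.

```c
#include <stddef.h>
#include <string.h>

/* Zero every newly allocated ("fresh") block of the page; blocks that
 * hold valid file data are left untouched. */
static void zero_fresh_on_error(unsigned char *page, size_t block_size,
                                const int *fresh, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++)
        if (fresh[i])
            memset(page + i * block_size, 0, block_size);
}
```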

diff-cpt-check-vsyscall-20061004

Patch from Andrey:
This patch adds checking for vsyscall presence on i386 arch.

Also we demand syscall capability only for 64-bit processes.

diff-ve-veth-in-ve0-20061009

Patch from Andrey Mirkin:
This patch enables creation of veth pair in VE0.

  • ve0.is_running is initialized to 1
  • ve0.op_sem is initialized in vecalls_init()

diff-vzdq-noquota-20061004

Patch from Alexey:
[VZQUOTA] add virtinfo notifier call to disable vzquota on an inode

Comparing to previous version:

  1. vzquota_dentry_off() is replaced with vzquota_inode_off()
  2. all the operations with S_NOQUOTA are moved under inode_qmblk_lock(). A few exceptions (in the standard dquot.h header) rely on the fact that S_NOQUOTA is never cleared.
  3. Two patches are merged together, because S_NOQUOTA handling is essentially trivialized.

diff-cpt-execve-20061006

Patch from Andrey:

  • replaced execve call with function like in 2.6.16 kernel.
  • we should check return code of execve to be -ENOENT, not ENOENT.
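The second bullet reflects the kernel convention of returning errors as negative errno values. A minimal sketch with a hypothetical `fake_kernel_execve()`: comparing the result against positive ENOENT would never match.

```c
#include <errno.h>

/* Hypothetical in-kernel-style helper: 0 on success, negative errno on failure. */
static int fake_kernel_execve(int binary_exists)
{
    return binary_exists ? 0 : -ENOENT;
}

/* Correct check: kernel-internal callers see -ENOENT, never +ENOENT. */
static int exec_not_found(int ret)
{
    return ret == -ENOENT;
}
```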

diff-ms-dcahe-aliasing-20061009

Patch from Dmitry Monakhov:
A couple of flush_dcache_page()s are missing on the I/O-error paths.

Acked-By: David Miller
committed in -mm: d-cache-aliasing-issue-in-__block_prepare_write.patch

diff-cpt-vsyscall-checks-b-20061004

Patch from Andrey:
We should check if task->mm is not NULL before checking task->mm->context.vdso value.

Bug #69680.
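A minimal sketch of the guard, using hypothetical stand-ins for task_struct and mm_struct. Kernel threads, and tasks that have already released their address space, have mm == NULL, so mm must be tested before mm->context is read.

```c
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct fake_mm {
    struct {
        void *vdso;
    } context;
};

struct fake_task {
    struct fake_mm *mm;
};

static int task_has_vdso(const struct fake_task *t)
{
    /* Short-circuit: never dereference a NULL mm. */
    return t->mm != NULL && t->mm->context.vdso != NULL;
}
```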

diff-ms-qdisc-lookup-sync-20061004

Patch from Denis(den@) and Vasily:
[PATCH] add synchronization while lookup qdiscs.

this patch is a part of patch@1.1938.331.16

OpenVZ Bug #278.

diff-ve-netlink-perm-20061004

Patch from Dmitry (dim@) and Vasily:
cap_netlink_recv should check for both CAP_NET_ADMIN and CAP_VE_NET_ADMIN. Now zebra in VE0 running as a standard user should work.

http://forum.openvz.org/index.php?t=tree&goto=6283&#msg_6283

diff-ms-ext2-errorbehaviour-20061006

Patch from Vasiliy:
EXT2_ERRORS_CONTINUE should be read from the superblock as default value for error behaviour.

parse_option() should clean the alternative options and should not change default value taken from the superblock.

Signed-off-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@openvz.org>

diff-ms-ext3-errorbehaviour-b-20061006

Patch from Dmitry Mishin:

EXT3_ERRORS_CONTINUE should be taken from the superblock as default value for error behaviour.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@openvz.org>

diff-ms-ext3-errorbehaviour-20060902

Patch from Vasiliy:
SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error behavior was broken in linux kernels since 2.5.x versions by the following patch:

2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
Default mount options from superblock for ext2/3 filesystems

gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

If an ext3 filesystem is mounted with errors=continue (EXT3_ERRORS_CONTINUE), errors should be ignored when possible. At present, however, on any error the kernel aborts the journal and remounts the filesystem read-only. This behavior was hit a number of times and noted to differ from that of 2.4.x kernels.

This patch fixes this:

  • do nothing in case of EXT3_ERRORS_CONTINUE,
  • set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
  • panic() should be called after ext3_commit_super() to save sb marked as EXT3_ERROR_FS

Signed-off-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@sw.ru>

Bug #57259.
Bug #67988.
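The ordering in the bullets above can be sketched as a pure function that lists the steps taken for each errors= setting. The step names are hypothetical, and committing the superblock for both the remount-ro and panic cases is an assumption; the changelog only fixes its position before panic().

```c
enum err_behaviour { ERR_CONTINUE, ERR_RO, ERR_PANIC };

enum step { STEP_ABORT_JOURNAL, STEP_REMOUNT_RO, STEP_COMMIT_SUPER, STEP_PANIC };

/* Writes the steps for behaviour b into out[] and returns how many. */
static int error_steps(enum err_behaviour b, enum step out[4])
{
    int n = 0;

    if (b == ERR_CONTINUE)
        return 0;                      /* errors=continue: do nothing */
    out[n++] = STEP_ABORT_JOURNAL;     /* EXT3_MOUNT_ABORT + journal_abort() */
    if (b == ERR_RO)
        out[n++] = STEP_REMOUNT_RO;
    out[n++] = STEP_COMMIT_SUPER;      /* save sb marked EXT3_ERROR_FS */
    if (b == ERR_PANIC)
        out[n++] = STEP_PANIC;         /* only after the superblock is saved */
    return n;
}
```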

diff-ms-net-bridge-20061004

Patch from Andrey, backported from mainstream:
[BRIDGE]: Fix deadlock in br_stp_disable_bridge

Looks like somebody forgot to use the _bh spin_lock variant. We ran into a deadlock where br->hello_timer expired while br_stp_disable_br() walked br->port_list.

Signed-off-by: Adrian Drzewiecki <z@drze.net>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Git: 78872ccb68335b14f0d1ac7338ecfcbf1cba1df4

Bug #69666.