


  • Major x86_64 security fix (CVE-2007-4573)
  • Other security fixes
  • Rebase to RHEL5 8.1.14 kernel
  • Areca and DRBD driver updates
  • Fixes for NFS client in VE, CPT, UBC, VZDQ, IPv6, fairsched, 4GB split.

Config changes







Patch from Alexey Kuznetsov (alexey@):

This is the I2O device, which suffers from the same problem I found with nbd.

The fact that this is triggered even with the anticipatory scheduler suggests that this scheduler is broken as well, only with a narrower race window.

if (ad->changed_batch && ad->nr_dispatched == 1) {
        ad->changed_batch = 0;
        if (ad->batch_data_dir == REQ_SYNC)
                ad->new_batch = 1;

I guess that it freezes when we hit this place with changed_batch==0.


Patch from Andrey Mirkin <>
[PATCH] CPT: correct handling of lock fd error codes

Undump in CPT is performed in two stages: (1) create the environment and the init process, (2) full undump. The stages are separated by a pipe: once vzctl closes the pipe, CPT can proceed with the second stage. The following scenario is therefore possible:

1. CPT waits in pipe_read() for the pipe to be closed.

2. Someone sends a signal to the task waiting in pipe_read().

3. pipe_read() exits with -ERESTARTSYS, but the error is ignored and the undump continues. vzctl, however, has not performed all the intermediate stages, so the undump cannot proceed.

Bug #88618.
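
The failure mode above comes down to treating an interrupted wait as if the pipe had been closed. The following is a minimal userspace sketch, not CPT code: the helper name wait_for_pipe_close() is hypothetical. It only illustrates the intended pattern, where end-of-file is the sole success condition and every error is handed back to the caller.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from CPT: block until the write end of
 * the pipe is closed. Returns 0 on end-of-file and a negative errno value
 * on any failure (including an interrupted read), so the caller can abort
 * instead of continuing with the second stage.
 */
int wait_for_pipe_close(int fd)
{
    char buf[16];

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n == 0)
            return 0;        /* writer closed the pipe: proceed */
        if (n < 0)
            return -errno;   /* e.g. -EINTR: report it, do not ignore it */
        /* n > 0: unexpected data, keep draining until end-of-file */
    }
}
```

A caller would start the second stage only when the helper returns 0 and would abort the undump on any negative value.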


Patch from Denis Lunev <>

Additional debug for busy inodes after umount: print more dentry information


Patch from Jay Vosburgh <>
[PATCH] bonding: Fix 802.3ad no carrier on "no partner found" instance

Modify carrier state determination for 802.3ad mode to comply with section 43.3.9 of IEEE 802.3, which requires that "Links that are not successful candidates for aggregation (e.g., links that are attached to other devices that cannot perform aggregation or links that have been manually configured to be non-aggregatable) are enabled to operate as individual IEEE 802.3 links."

Bug reported by Laurent Chavey <>. This patch is an updated version of his patch that changes the wording of commentary and adds an update to the driver version.

Signed-off-by: Jay Vosburgh <>
Signed-off-by: Laurent Chavey <>
Signed-off-by: Jeff Garzik <>

GIT: 031ae4deb095a1f18a842740459c5ae184ec931c

OpenVZ Bug #666


Patch from Pavel Emelianov <>
[PATCH] proc: return ENOENT instead of EACCESS when task is dead

When reading the symlink /proc/<pid>/exe or /proc/<pid>/fd/<any> of a task that has managed to die after the appropriate directory was opened but before the symlink was read, the kernel returns -EACCES due to strange code in proc_fd_access_allowed().

Unlike the mainstream/RHEL5 kernel, the SuSE kernel returns -ENOENT in this case, and it turned out that some SuSE software (inetd) relies on this and does not tolerate any deviation.

Make the kernel return -ENOENT when the task is dead so that VEs based on SuSE templates work. Keep the return value (-EACCES) in all other cases.

Bug #82009.
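
The return-value policy described in this entry can be modelled in a few lines. This is only a sketch: the function name and its two boolean inputs are hypothetical, and the real check in the kernel's /proc code is not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the policy: a dead task yields -ENOENT, a live
 * task without permission still yields -EACCES, anything else succeeds.
 */
int proc_link_access_error(bool task_alive, bool access_allowed)
{
    if (!task_alive)
        return -ENOENT;   /* task died after the directory was opened */
    if (!access_allowed)
        return -EACCES;   /* unchanged behaviour for live tasks */
    return 0;
}
```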


Patch from Kirill Korotaev <>
[PATCH] Reset current->pdeath_signal on SUID binary execution (CVE-2007-3848)

Severity: minor

This fixes a vulnerability in the "parent process death signal" implementation discovered by Wojciech Purczynski of COSEINC PTE Ltd. and iSEC Security Research.

Signed-off-by: Marcel Holtmann <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>

X-Git-Tag: v2.6.22.4~1
X-Git-Url: c27a3393808acab7243da7455c713fe763ea2627


Patch from Oleg Nesterov <>
[PATCH] sigqueue_free: fix the race with collect_signal()

Spotted by taoyue <> and Jeremy Katz <>.

collect_signal:                         sigqueue_free:

                                                if (!list_empty(&q->list)) {
                                                        // not taken
                                                q->flags &= ~SIGQUEUE_PREALLOC;

        __sigqueue_free(first);                 __sigqueue_free(q);

Now, __sigqueue_free() is called twice on the same "struct sigqueue" with the obviously bad implications.

In particular, this double free breaks the array_cache->avail logic, so the same sigqueue could be "allocated" twice, and the bug can manifest itself via the "impossible" BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free/send_sigqueue.

Hopefully this can explain these mysterious bug-reports, see

Alexey Dobriyan reports that this patch makes a difference for the testcase, but nobody has access to the application that originally exposed the problem.

Also, this patch removes tasklist lock/unlock, ->siglock is enough.

Signed-off-by: Oleg Nesterov <>
Cc: taoyue <>
Cc: Jeremy Katz <>
Cc: Sukadev Bhattiprolu <>
Cc: Alexey Dobriyan <>
Cc: Ingo Molnar <>
Cc: Thomas Gleixner <>
Cc: Roland McGrath <>
Cc: <>
Signed-off-by: Andrew Morton <>


Patch from Kirill Korotaev <>
[PATCH] UBC: add CFQ ioprio exports for CONFIG_IOSCHED_CFQ=m case

OpenVZ Bug #669


Patch from Denis Lunev <>
[PATCH] UBC: missed wakeup on one ub refill path

The following scenario is possible:

  • TCPSNDBUF rejected by ub_sock_get_wreserv
  • sys_poll -> ub_sock_snd_queue_add
  • uncharge -> sk->sk_write_space DOES NOT wake up the waiting poll because the queue is too long

After this, nothing will wake up the process :(

It will block until the poll timeout expires.

The patch makes sure that the generic code sends a wakeup when appropriate.

Bug #89127


Patch from Denis Lunev <>
This patch fixes NFS client socket transport creation.

The RPC client is cached in the NFS client structure, so the patch corrects allocation and lookup of the NFS client.


Patch from Denis Lunev <>

This patch allows a VE to traverse mountpoints if the exported NFS tree contains any.


Patch from Vitaliy Gusev <>
[PATCH] VENET: allow rmmod even if VE0 venet is UP

This patch allows the module to be removed even if the venet interface in VE0 is up. Note that all interfaces in other VEs must still be shut down before the module is removed.

Bug #83537


Patch from DRBD, prepared by Evgeniy Kravtsunov:
Update drbd from 8.0.4 to 8.0.5.


Patch from Vitaliy Gusev <>
[PATCH] BC: fix unaligned access on ia64

struct page contains union of the fields:

        union {
                struct user_beancounter *page_ub;
                struct page_beancounter *page_pb;
        } bc;

and there are three cases for value 'bc':

1) pointer to user_beancounter
2) pointer to page_beancounter
3) IO marked pointer to page_beancounter

This patch corrects access to the pointer in the third case.

Bug #86554.
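
One common way to keep a marker inside a pointer is to use a low-order bit that alignment guarantees to be zero. The sketch below assumes that scheme purely for illustration: the helper names, the struct body and the choice of bit 0 are hypothetical and not taken from the BC code. It shows why the mark has to be stripped before dereferencing, since a marked value is no longer a properly aligned address.

```c
#include <stdint.h>
#include <stdbool.h>

/* Placeholder body; the real structure is not reproduced here. */
struct page_beancounter { int refcount; };

#define PB_IO_MARK ((uintptr_t)1)   /* assumed: bit 0 carries the mark */

/* Return the pointer with the (assumed) IO mark set. */
void *pb_mark_io(struct page_beancounter *pb)
{
    return (void *)((uintptr_t)pb | PB_IO_MARK);
}

/* Test whether the (assumed) IO mark is set. */
bool pb_is_io_marked(const void *p)
{
    return ((uintptr_t)p & PB_IO_MARK) != 0;
}

/* Strip the mark to recover an aligned pointer that is safe to dereference. */
struct page_beancounter *pb_strip_mark(void *p)
{
    return (struct page_beancounter *)((uintptr_t)p & ~PB_IO_MARK);
}
```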


Patch from Pavel Emelianov <>
[PATCH] IPv6: lost exec env reset in number of places

One is in a for() loop: we could go to the next loop iteration and lose the original env.

The other two are in error paths.


Bug #88675.

From Andrey Mirkin, backported from mainstream:

The following patches are ported from mainstream to fix a deadlock in DLM. The problem was in this path:

    dlm_clear_proc_locks   <<<  Here we are taking mutex ls_clear_proc_locks
              dlm_user_add_ast <<< Here we are trying to take the same mutex

The following patches change the lock/unlock mechanism and fix this and other deadlocks present in the RHEL5 kernel.

Tested on 2.6.18-rhel5-041.1 kernel.

diff-tree a1bc86e6bddd34362ca08a3a4d898eb4b5c15215 (from 1d6e8131cf0064ef5ab5f3411a82b800afbfadee)
Author: David Teigland <>
Date:   Mon Jan 15 10:34:52 2007 -0600

    [DLM] fix user unlocking

    When a user process exits, we clear all the locks it holds.  There is a
    problem, though, with locks that the process had begun unlocking before it
    exited.  We couldn't find the lkb's that were in the process of being
    unlocked remotely, to flag that they are DEAD.  To solve this, we move
    lkb's being unlocked onto a new list in the per-process structure that
    tracks what locks the process is holding.  We can then go through this
    list to flag the necessary lkb's when clearing locks for a process when it
    exits.

    Signed-off-by: David Teigland <>
    Signed-off-by: Steven Whitehouse <>


Patch from Andrey (amirkin@) ported from mainstream:


Author: David Teigland <>
Date:   Wed Jan 24 10:21:33 2007 -0600

    [DLM] can miss clearing resend flag

    A long, complicated sequence of events, beginning with the RESEND flag not
    being cleared on an lkb, can result in an unlock never completing.

    - lkb on waiters list for remote lookup
    - the remote node is both the dir node and the master node, so
      it optimizes the lookup into a request and sends a request
      reply back
    - the request reply is saved on the requestqueue to be processed
      after recovery
    - recovery runs dlm_recover_waiters_pre() which sets RESEND flag
      so the lookup will be resent after recovery
    - end of recovery: process_requestqueue takes saved request reply
      which removes the lkb off the waiters list, _without_ clearing
      the RESEND flag
    - end of recovery: dlm_recover_waiters_post() doesn't do anything
      with the now completed lookup lkb (would usually clear RESEND)
    - later, the node unmounts, unlocks this lkb that still has RESEND
      flag set
    - the lkb is on the waiters list again, now for unlock, when recovery
      occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
      set, doesn't do anything since the master still exists
    - end of recovery: dlm_recover_waiters_post() takes this lkb off
      the waiters list because it has the RESEND flag set, then reports
      an error because unlocks are never supposed to be handled in
    - later, the unlock reply is received, doesn't find the lkb on
      the waiters list because recover_waiters_post() has wrongly
      removed it.
    - the unlock operation has been lost, and we're left with a
      stray granted lock
    - unmount spins waiting for the unlock to complete

    The visible evidence of this problem will be a node where gfs umount is
    spinning, the dlm waiters list will be empty, and the dlm locks list will
    show a granted lock.

    The fix is simply to clear the RESEND flag when taking an lkb off the
    waiters list.

    Signed-off-by: David Teigland <>
    Signed-off-by: Steven Whitehouse <>

Bug #88675
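
The fix described in the commit message above ("clear the RESEND flag when taking an lkb off the waiters list") can be pictured with a small model. This is a simplified sketch, not the DLM source: the struct layout, the flag value and the boolean standing in for list membership are all hypothetical.

```c
#include <stdbool.h>

#define LKB_FLAG_RESEND 0x1u   /* hypothetical flag value */

/* Simplified stand-in for a lock block; not the real DLM structure. */
struct lkb {
    unsigned int flags;
    bool on_waiters;           /* models membership in the waiters list */
};

/* Take an lkb off the (modelled) waiters list and drop any stale RESEND. */
void remove_from_waiters(struct lkb *lkb)
{
    lkb->on_waiters = false;
    lkb->flags &= ~LKB_FLAG_RESEND;
}
```

With the flag always cleared at removal time, a later unlock of the same lkb cannot be mistaken for an operation that still needs to be resent.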


Patch from Andrey (amirkin@) ported from mainstream:


Author: David Teigland <>
Date: Wed Mar 28 09:56:46 2007 -0500

    [DLM] overlapping cancel and unlock

    Full cancel and force-unlock support.  In the past, cancel and
    force-unlock wouldn't work if there was another operation in progress on
    the lock.  Now, both cancel and unlock-force can overlap an operation on
    a lock, meaning there may be 2 or 3 operations in progress on a lock in
    parallel.  This support is important not only because cancel and
    force-unlock are explicit operations that an app can use, but both are
    used implicitly when a process exits while holding locks.

    Summary of changes:

    - add-to and remove-from waiters functions were rewritten to handle
      situations with more than one remote operation outstanding on a lock
    - validate_unlock_args detects when an overlapping cancel/unlock-force
      can be sent and when it needs to be delayed until a request/lookup
      reply is received
    - processing request/lookup replies detects when cancel/unlock-force
      occurred during the op, and carries out the delayed cancel/unlock-force
    - manipulation of the "waiters" (remote operation) state of a lock moved
      under the standard rsb mutex that protects all the other lock state
    - the two recovery routines related to locks on the waiters list changed
      according to the way lkb's are now locked before accessing waiters state
    - waiters recovery detects when lkb's being recovered have overlapping
      cancel/unlock-force, and may not recover such locks
    - revert_lock (cancel) returns a value to distinguish cases where it did
      nothing vs cases where it actually did a cancel; the cancel completion
      ast should only be done when cancel did something
    - orphaned locks put on new list so they can be found later for purging
    - cancel must be called on a lock when making it an orphan
    - flag user locks (ENDOFLIFE) at the end of their useful life (to the
      application) so we can return an error for any further
    - we weren't setting COMP/BAST ast flags if one was already set, so we'd
      lose either a completion or blocking ast
    - clear an unread bast on a lock that's become unlocked

    Signed-off-by: David Teigland <>
    Signed-off-by: Steven Whitehouse <>

Bug #88675


Patch prepared by Kostja (khorenko@):
Areca driver v1.20.0X.14.devel provided to Thomas Krenn AG by Areca people.

The update is said to fix the memory leak caused by the Areca command-line utility (about 7 MB per execution of the CLI, according to Thomas Krenn AG).

P.S. The issue with ARCMSR_MAX_XFER_SECTORS[_B] is fixed.

Bug #87569.


Patch from Kirill Korotaev <>
[PATCH] fix possible page leak during LDT allocation

Fix possible page leak during LDT allocation: alloc_ldt() doesn't free new pages on error path and doesn't change context->size value. Thus we can have more pages on destroy_context() than we think we have according to our context->size. Let's scan all 16 pages to make sure everything is freed.
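
The "scan all 16 pages" idea can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel code: the struct, the use of malloc/free in place of page allocation, and the meaning given to the size field are hypothetical. The point is only that cleanup walks every slot instead of trusting a counter that may under-report.

```c
#include <stdlib.h>

#define MAX_LDT_PAGES 16

/* Hypothetical model of an LDT context. */
struct ldt_context {
    void *pages[MAX_LDT_PAGES];
    int size;               /* may under-report after a failed allocation */
};

/* Free every populated slot, not just the ones `size` accounts for.
 * Returns the number of pages released. */
int destroy_ldt_context(struct ldt_context *ctx)
{
    int freed = 0;

    for (int i = 0; i < MAX_LDT_PAGES; i++) {
        if (ctx->pages[i]) {
            free(ctx->pages[i]);
            ctx->pages[i] = NULL;
            freed++;
        }
    }
    ctx->size = 0;
    return freed;
}
```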


Patch from Andrey Mirkin <>
[PATCH] CPT: lock fd close correct error handling

Correct error handling when closing the lock fd: on undump, a local variable 'err' was used to store the error, so the error was ignored and undump continued in spite of it.
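
One way an error stored in a local variable can get lost is variable shadowing. The sketch below assumes that shape for illustration only; the entry does not say exactly how the local 'err' was declared, and all function names here are hypothetical.

```c
#include <errno.h>

/* Stand-in for closing the lock fd; always fails in this sketch. */
int close_lock_fd(void)
{
    return -EIO;
}

/* Buggy shape: the inner `err` shadows the outer one, so the failure is lost. */
int undump_buggy(void)
{
    int err = 0;

    {
        int err = close_lock_fd();   /* shadows the outer variable */
        (void)err;                   /* result never reaches the caller */
    }
    return err;                      /* always 0 */
}

/* Fixed shape: store the result in the variable that is actually returned. */
int undump_fixed(void)
{
    int err = close_lock_fd();

    return err;
}
```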


Patch from Kirill Korotaev <>
[PATCH] CPT: fix LDT pages leak with 4GB split

When a 4GB split kernel is used, CPT can leak some LDT pages: it allocates the pages first but doesn't set context->size, so destroy_context() won't try to free these additional LDT pages.

Relevant for -ent kernel flavors only.


Patch from Andrey Mirkin <>
[PATCH] Fix CPT vsyscall part for x86_64 case


Patch from Denis Lunev <>
[PATCH] debug: additional debug for busy inodes after umount (part 2)

Also print all mount points for the given super block on umount.


Patch from Kirill Korotaev <>
[PATCH] fairsched: fix warning on preempt kernels

rq->curr should be initialized to something to avoid dereferencing it, e.g. in try_to_wake_up() on the first process wakeup.

It doesn't actually matter what it is initialized to. Let's use init_task for the initial rq->curr.


Patch from Denis Lunev <>
[PATCH] cond_resched_lock() doesn't work in 2.6.18

When CONFIG_PREEMPT=n, cond_resched_lock() and cond_resched_softirq() don't work, since they check for preempt_count to be sane, but this counter is not tracked w/o preemption and is always 0.

So the fix is:

  • ignore preempt count when CONFIG_PREEMPT=n
  • if we also want to check preempt_count in the CONFIG_PREEMPT=y case (just to be on the safe side), we need to account for the effect of lock_kernel() on preempt_count correctly.

Bug #91012


Patch from Denis Lunev <>
[PATCH] small cleanup for fib rules

This patch slightly cleans up the FIB rules framework. rules_list as a pointer in struct fib_rules_ops is useless: it is always assigned a static per-subsystem list in IPv4, IPv6 and DECnet.

Signed-off-by: Denis V. Lunev <>
Acked-by: Alexey Kuznetsov <>


Patch from Alexandr Andreev <>
[FS]: disable O_DIRECT by default inside VE

We still have to disable O_DIRECT by default inside VE due to compatibility with old broken software (e.g. rpm)

Bug #91550.


Patch from Denis Lunev <>
[PATCH] improve shrink_dcache_sb()

This patch makes shrink_dcache_sb consistent with dentry pruning policy.

On the first pass we iterate over the dentry unused list and prepare some dentries for removal. However, since the existing code moves evicted dentries to the beginning of the LRU, fresh dentries from other superblocks can be inserted *before* our dentries.

This can significantly slow down shrink_dcache_sb(). Moreover, for virtual filesystems like unionfs, which can call dput() while dentries are being killed, the existing code has O(n^2) complexity.

We observed shrink_dcache_sb() taking 2 minutes with only 35000 dentries.

To avoid these effects we propose to isolate the sb dentries at the end of the LRU list.

Signed-off-by: Denis V. Lunev <>
Signed-off-by: Kirill Korotaev <>
Signed-off-by: Andrey Mirkin <>
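
The idea of isolating one superblock's dentries at the end of the LRU can be sketched with a plain array. This is a userspace model, not the dcache code: the struct, the function name and the scratch buffer are hypothetical, and the real LRU is a linked list.

```c
#include <stddef.h>

/* Hypothetical model of an LRU entry. */
struct dentry_model {
    int sb_id;     /* superblock the entry belongs to */
    int ino;       /* payload, only used to tell entries apart */
};

/*
 * Stable partition: entries of `sb_id` are moved to the tail of `lru`,
 * everything else keeps its relative order at the front. Returns how many
 * entries were isolated. `scratch` must hold at least `n` elements.
 */
size_t isolate_sb_at_tail(struct dentry_model *lru, size_t n, int sb_id,
                          struct dentry_model *scratch)
{
    size_t front = 0, back = 0;

    for (size_t i = 0; i < n; i++) {
        if (lru[i].sb_id == sb_id)
            scratch[back++] = lru[i];
        else
            lru[front++] = lru[i];   /* front <= i, so this never clobbers unread data */
    }
    for (size_t i = 0; i < back; i++)
        lru[front + i] = scratch[i];
    return back;
}
```

Once the entries are grouped at the tail, they can be pruned in a single pass over the last entries, without rescanning dentries that belong to other superblocks.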


Patch from David S. Miller <>
[IPV6]: /proc/net/anycast6 unbalanced inet6_dev refcnt

Reading /proc/net/anycast6 when there is no anycast address on an interface results in an ever-increasing inet6_dev reference count, as well as a reference to the netdevice you can't get rid of.

Signed-off-by: David S. Miller <>

Bug #75822

X-Git-Tag: v2.6.21-rc3~1492~5
X-Git-Url: aa6e4a96e7589948fe770744f7bb4f0f743dddaa


Patch from Roland Dreier <>
[ATM] he: Fix __init/__devinit conflict

he_init_one() is declared __devinit, but calls lots of init functions that are marked __init. However, if CONFIG_HOTPLUG is enabled, __devinit functions go into normal .text, which leads to

    WARNING: drivers/atm/he.o - Section mismatch: reference to .init.text: from .text between 'he_start' (at offset 0x2130) and 'he_service_tbrq'

Fix this by changing the __init functions to __devinit.

Signed-off-by: Roland Dreier <>
Signed-off-by: David S. Miller <>

X-Git-Tag: v2.6.19-rc1~1232^2~10
X-Git-Url: 5b7c714ec27584b18279b741b6043016f8adb9de

Found while building ovzkernel-2.6.18-8.1.8.el5.028stab039.1 on ppc64


Patch from Eugene Teo <>
[PATCH] syscall invalid validation x86_64 (CVE-2007-4573)

This patch fixes a vulnerability discovered by Wojciech Purczynski. It appears that the 64-bit value stored in the %rax register is not properly validated. This may lead to an out-of-bounds system call table access, resulting in the ability to execute arbitrary code in the context of the kernel on the x86_64 platform.


Signed-off-by: Eugene Teo <>
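
An out-of-bounds table access of this kind can arise when the bounds check and the table index look at different widths of the same register. The sketch below illustrates that general pitfall in plain C; it is an assumption made for illustration, not a description of the actual patch, and the table size and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define NR_SYSCALLS_MODEL 300u   /* hypothetical table size */

/* Unsafe shape: only the low 32 bits are compared against the limit. */
bool nr_ok_truncated(uint64_t rax)
{
    return (uint32_t)rax < NR_SYSCALLS_MODEL;
}

/* Safe shape: the whole 64-bit value that will index the table is checked. */
bool nr_ok_full(uint64_t rax)
{
    return rax < NR_SYSCALLS_MODEL;
}
```

A value such as 0x100000005 passes the truncated check but is rejected by the full one, which is the difference between indexing far outside the table and returning an error.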


Patch from Alexey Dobriyan <>
[PATCH] Fix needless SysRq help message

Every time one does

        echo p >/proc/sysrq-trigger

a newline sneaks into the kernel buffer, the sysrq code doesn't find it in the handlers table, and it prints the help banner.
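
One plausible way to avoid the stray banner is to act only on the first byte of the write and ignore the rest. The sketch below models that approach in userspace; it is an assumption about the shape of the fix, and the counters and function names are hypothetical stand-ins for the real handler table.

```c
#include <stddef.h>

/* Hypothetical counters standing in for real side effects. */
int handled_keys;
int help_banners;

/* Stand-in for the handler table lookup: only 'p' is known here. */
void handle_key(char c)
{
    if (c == 'p')
        handled_keys++;
    else
        help_banners++;      /* unknown key: the real code prints help */
}

/* Act on the first byte only, so the newline added by echo is ignored. */
size_t sysrq_trigger_write(const char *buf, size_t count)
{
    if (count > 0)
        handle_key(buf[0]);
    return count;            /* report the whole write as consumed */
}
```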


Patch from Alexandr Andreev <>
This patch lets you change the SysRq key in the Alt+SysRq+XXX combination

This patch lets you replace the SysRq key in the Alt+SysRq+XXX combination with any other key:

You can get the scancodes of your keyboard with programs like showkey or evtest. The default Alt+SysRq combination still works after redefinition.


Patch from Denis Lunev <>
[PATCH] BC: set correct ub context in netlink processing

  • rtnl netlink socket is asynchronous and can be processed during rtnl_unlock in the other context.
  • rtnl netlink socket is used to create kernel objects
  • these objects are planned to be accounted at least to UB_KMEMSIZE

So, let's set correct UB context for packets processing.


Patch from Denis Lunev <>
[PATCH] virtualize fib rules, add kmemsize accounting

  • this patch virtualizes IPv6 routing rules
  • fixes IPv4 routing rules implementation
  • adds UBC accounting to kmemsize for fibrules

Bug #90085.


Patch from Vitaliy Gusev <>
[PATCH] net: allow SIOCSIFFLAGS in dev_ioctl()

This patch allows ioctl SIOCSIFFLAGS from VE on PF_INET6 socket.

In old VEs (with redhat-6.2), ifconfig calls ioctl() on an IPv6 socket while trying to bring venet up inside the VE, and fails because this ioctl is prohibited in dev_ioctl(). Newer versions of ifconfig call this ioctl() on an IPv4 socket and thus end up in inet_ioctl().

Bug #91248


Patch from Pavel Emelianov <>
[PATCH] cleanup: remove unused vars from init_ve_system()


Patch from Kirill Korotaev <>
Fix vzevent module. It is incompatible with kobject uevents in reality.

Fix the vzevent module. In reality it is incompatible with kobject uevents: the current code does *nothing*, since kobj is not fully configured. Instead, let's send messages via a separate netlink channel.


Patch from Alexandr Andreev <>
[PATCH] VZDQ: add force quota off option

Add force quota off option: Just return 0 instead of EBUSY in case of VZQUOTA_OFF_FORCED ioctl.


Patch from Alexandr Andreev <>
[PATCH] VZDQ: report busy dentries on quota off

If vzquota off fails, find the busy dentries and pass information about them to userspace. vzquota must pass a PAGE_SIZE buffer, and the kernel fills it with the file names found.

- use free_page() instead of kfree()
- remove unnecessary \n after the last file name

- use generic __d_path()
- don't call copy_to_user() if both ubuf and buf == NULL

- remove VZ_DQ_OFF_FORCED declaration from header, it relates to another

Bug #86944.


Patch from Alexandr Andreev <>
[PATCH] VZDQ: report busy dentries on vzquota on

If vzquota on fails, find the busy dentries and pass information about them to userspace. vzquota must pass a user buffer, and the kernel fills it with the file names found.


Patch from Andrey Mirkin <>
[PATCH] Xen specific changes to support vsyscall in CPT.

Tested on x86_64 and i386 on Xen and non-Xen kernels with enabled/disabled vsyscall.


Patch from Dmitry Monakhov <>
[PATCH] sysctl: add lsyscalls sysctl

This patch introduces the /proc/sys/fs/lsyscall_enable sysctl. The sysctl is introduced mostly for testing purposes.


Patch from Evgeniy (
[PATCH 2.6.18] drbd update 8.0.5 - 8.0.6


Patch from Denis Lunev <>
This patch fixes allocation size for default IPv6 FIB rules.

This patch fixes allocation size for default IPv6 FIB rules. These hunks were accidentally missed in the previous patch.

Bug #92085.


Patch from Kirill Korotaev <>
[PATCH] fix for cond_resched() fix: remove wrong WARN_ON(1)

WARN_ON(1) is illegal, since when we return from cond_resched()->schedule() we have preempt_count = PREEMPT_ACTIVE, and if we have still current->need_resched flag set we can get to cond_resched() again from schedule()->reacquire_kernel_lock()->cond_resched() and thus this WARN_ON(1) gets triggered.

Bug #92140.


Patch from Kirill Korotaev <>
[PATCH] another cond_resched() fix

cond_resched() should check that it is not nested via the preempt_count() & PREEMPT_ACTIVE flag. Drop the whole super-logic from Den checking preempt count et al.

Bug #92140.
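
The nesting check mentioned in this entry can be modelled as a predicate on the preempt count. This is a sketch under stated assumptions: the function name is hypothetical and the bit value used for the flag is chosen for illustration, since the real value is architecture-specific.

```c
#include <stdbool.h>

#define PREEMPT_ACTIVE_MODEL 0x10000000u   /* assumed flag bit */

/* Decide whether a voluntary reschedule may happen at this point. */
bool may_cond_resched(unsigned int preempt_count, bool need_resched)
{
    if (preempt_count & PREEMPT_ACTIVE_MODEL)
        return false;        /* already inside schedule(): do not nest */
    return need_resched;
}
```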