From OpenVZ Virtuozzo Containers Wiki



  • Driver updates from RHEL4u2/official sites to make OpenVZ conform to official HCL
  • Mainstream security fixes
  • Fixes for EM64T/ia64 compilation
  • Small VPS/UBC fixes


The same as 022stab044.1 plus:


Driver updates

  • aacraid v1.1.5 (site)
  • aoe v14 (site)
  • e1000 v6.0.54 (site)
  • e100 v3.4.8 (site)
  • emulex v8.0.16.17 (site)
  • iscsi-sfnet v4. (rhel4u2)
  • megaraid v2.20.x (site)
  • qla4xx v5.00.02 (site)
  • r8169 v2.2 (site)
  • sk98lin v8.24.1.3 (site)
  • snapapi v0.6.7 (site)
  • tg3 v3.27 (rhel4u)

Other updates

  • scsi midlayer (rhel4u2)
  • ide csb6-raid support (rhel4u2)
  • intel ich7 and esb2 support (rhel4u2)
  • libata v1.11 (rhel4u2)



Patch from Pavel:

Added a switch to VE0 via set_exec_env(get_ve0()), and back again, in smp_apic_timer_interrupt() on EM64T.


Patch from Pavel:

Added a switch to VE0 via set_exec_env(get_ve0()), and back again, in IRQ handling on ia64.


Patch from Pavel:

Added a switch to VE0 via set_exec_env(get_ve0()), and back again, in do_IRQ() on x86_64.
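
The three interrupt-path patches above share one pattern: remember the current execution environment, run the handler in VE0, then switch back. A minimal user-space sketch of that pattern follows. The struct, the global, and the bodies of get_ve0() and set_exec_env() are stand-ins with assumed semantics (set_exec_env() returning the previous environment), and handle_irq_in_ve0() and report_veid() are hypothetical names, not kernel code.

```c
#include <stddef.h>

/* Stand-in for the kernel's per-VE structure; the real one is far larger. */
struct ve_struct { int veid; };

static struct ve_struct ve0 = { 0 };          /* host environment (VE0) */
static struct ve_struct *current_env = &ve0;  /* stand-in for per-task state */

/* Assumed semantics: return the host environment. */
static struct ve_struct *get_ve0(void)
{
    return &ve0;
}

/* Assumed semantics: install a new environment, return the previous one. */
static struct ve_struct *set_exec_env(struct ve_struct *new_env)
{
    struct ve_struct *old = current_env;
    current_env = new_env;
    return old;
}

/* Hypothetical handler body: reports which environment it runs in. */
static int report_veid(void)
{
    return current_env->veid;
}

/* Hypothetical wrapper: run the body in VE0, then switch back. */
static int handle_irq_in_ve0(int (*body)(void))
{
    struct ve_struct *saved = set_exec_env(get_ve0());
    int ret = body();
    (void)set_exec_env(saved);
    return ret;
}
```

Whatever environment the interrupted task was in, the body sees VE0 and the task's environment is restored on return.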


Patch from Pavel:

Added ub0 execub context in ia64 irq handling.


Patch from Pavel:

Added ub0 execub context in irq handling on x86_64.


Patch from mainstream:

x86 sysenter: clear %ebp on exit.

It contains the thread info pointer. That's not something that user mode can really use for anything interesting, but it's also not something that user mode should ever really see.

Pointed out by Brad Spender as being in PaX.


Patch from mainstream:

[PATCH] Fix LDT/TSS limit on x86-64

Paul Menage pointed out that the previous change for the LDT/TSS limit on x86-64 was incorrect. This could cause the user to corrupt memory beyond the LDT. This patch implements the fix suggested by Paul.


Patch from Andrey Mirkin, modified by Kirill:

This patch makes vzwdog print pgdat info.


Patch from mainstream:

[IA64] Sanity check unw_unwind_to_user

The unw_unwind_to_user function in unwind.c on Itanium (ia64) architectures in Linux kernel 2.6 allows local users to cause a denial of service (system crash).

Signed-off-by: Keith Owens <>
Signed-off-by: Tony Luck <>


Patch from mainstream, ported by Pavel:

A flaw affecting the auditing code was discovered. On Itanium architectures a local user could use this flaw to cause a denial of service (crash). This issue is rated as having important security impact (CAN-2005-0136).


Patch from mainstream:

[IA64] speedup ptrace by avoiding kernel-stack walk

This patch changes the syscall entry path to store the current-frame-mask (CFM) in pt_regs->cr_ifs. This just takes one extra instruction (a "dep" to clear the bits other than 0-37) and is free in terms of cycles.

The advantage of doing this is that it lets ptrace() avoid having to walk the stack to determine the end of the user-level backing-store of a process which is in the middle of a system-call. Since this is what strace does all the time, this speeds up strace quite a bit (by ~50%). More importantly, it makes the syscall vs. non-syscall case much more symmetric, which is always something I wanted.

Note that the change to ivt.S looks big but this is just a rippling effect of instruction-scheduling to keep syscall latency the same. All that's really going on there is that instead of storing 0 into cr_ifs member we store the low 38 bits of ar.pfs.

Signed-off-by: David Mosberger <>

Signed-off-by: Tony Luck <>
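
For illustration, "the low 38 bits of ar.pfs" from the ptrace speedup patch above is a plain bit mask when written in C. This is only a sketch of the arithmetic, not the ia64 assembly from ivt.S; the function name low38() is hypothetical.

```c
#include <stdint.h>

/* Keep bits 0-37 of a 64-bit value and clear bits 38-63. */
static uint64_t low38(uint64_t pfs)
{
    return pfs & ((UINT64_C(1) << 38) - 1);
}
```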


Patch from mainstream:

[IA64] add cpu_relax() in the body of spin loops

This patch adds cpu_relax() in the body of spin loops in smp_call_function(), smp_call_function_single(), and ia64_mca_wakeup_ipi_wait().

Signed-off-by: Fenghua Yu <>

Signed-off-by: Tony Luck <>
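
A rough user-space sketch of the spin-loop shape the cpu_relax() patch above describes. The cpu_relax() here is only a stand-in built on a C11 compiler fence (the kernel's version is architecture specific), and spin_until_zero() is a hypothetical name.

```c
#include <stdatomic.h>

/* User-space stand-in for the kernel's cpu_relax(): only a compiler
 * barrier here; real architectures may also issue a pause/hint. */
static inline void cpu_relax(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Spin until *pending drops to zero; return the number of iterations. */
static unsigned long spin_until_zero(atomic_int *pending)
{
    unsigned long spins = 0;

    while (atomic_load(pending) != 0) {
        cpu_relax();    /* relax on every pass through the loop body */
        spins++;
    }
    return spins;
}
```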


Patch from Kostja:

Hack to force 80x25 video mode on boot, when framebuffer is not configured.


Patch from mainstream/RHEL4u2:

Add PCI quirks for Intel E7320_MCH and E7525_MCH to disable IRQ balancing.

As part of the workaround for the "Interrupt message re-ordering across hub interface" errata (page #16 in, BIOS may enable hardware IRQ balancing for E7520/E7320/E7525(revision ID 0x9 and below) based platforms.

Add pci quirks to disable SW irqbalance/affinity on those platforms. Move balanced_irq_init() to late_initcall so that kirqd will be started after pci quirks.


Patch from mainstream:

invalidate_inode_pages() and invalidate_inode_pages2() can mark pages not uptodate while read() is trying to read from them. This is interpreted as an I/O error.

Fix that by teaching the invalidate code to leave the page alone if someone else has a ref on it.
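
A minimal sketch of the "leave the page alone if someone else has a ref on it" rule. The struct and the threshold (the cache itself assumed to hold exactly one reference) are assumptions made for illustration; the kernel's actual reference accounting differs.

```c
#include <stdbool.h>

/* Stand-in for a page-cache page. */
struct page_stub {
    int  refcount;   /* assumed: 1 means only the cache holds it */
    bool uptodate;
};

/* Returns true if the page was invalidated, false if it was skipped. */
static bool try_invalidate(struct page_stub *page)
{
    if (page->refcount > 1)
        return false;        /* another user (e.g. a reader): skip it */
    page->uptodate = false;
    return true;
}
```

A page that a concurrent read() still references keeps its uptodate state, so the reader does not see a spurious I/O error.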


Patch from Pavel, modified by Kirill:

Fixed a race between removal of page_beancounting and the check that a page has a valid pbc.
Bug 52609.


Patch from Andrey:

This patch fixes VE task lookup/traversal in the ia64 performance monitor.


Patch from mainstream:

[PATCH] kjournald: missing JFS_UNMOUNT check

It seems that kjournald() may be missing a check of the JFS_UNMOUNT flag before calling schedule(). This showed up in testing of OCFS2 recovery where our recovery thread would hang in journal_kill_thread() called from journal_destroy() because kjournald never got a chance to read the flag to shut down before the schedule().

Zach pointed out the missing check which led me to hack up this trivial patch. It's been tested many times now and I have yet to reproduce the hang, which was happening very regularly before.

<mild rant>
I'm guessing that we could really use some wait_event() calls with helper functions in, well, most of jbd these days which would make a ton of the wait code there vastly cleaner.
</mild rant>

As for why this doesn't happen in ext3 (or OCFS2 during normal mount/unmount of the local node's journal), I think it may be that the specific timing of events in the ocfs2 recovery thread exposes a race there. Because ocfs2_replay_journal() is only interested in playing back the journal, initialization and shutdown happen very quickly with no other metadata put into that specific journal.

Acked-by: "Stephen C. Tweedie" <>
Signed-off-by: Andrew Morton <>

Signed-off-by: Linus Torvalds <>
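
The kjournald fix above amounts to testing the unmount flag before going to sleep. A simplified single-threaded sketch follows; the flag value, the struct, fake_schedule() and wait_step() are all hypothetical stand-ins, not jbd code.

```c
#include <stdbool.h>

#define JFS_UNMOUNT_STUB 0x1u   /* stand-in; not the kernel's flag value */

struct journal_stub {
    unsigned int flags;
    int          sleeps;        /* how many times we "slept" */
};

/* Hypothetical stand-in for schedule(): it only counts sleeps. */
static void fake_schedule(struct journal_stub *j)
{
    j->sleeps++;
}

/* One wait step: sleep only if no unmount has been requested. */
static bool wait_step(struct journal_stub *j)
{
    if (j->flags & JFS_UNMOUNT_STUB)
        return false;           /* shut down instead of sleeping */
    fake_schedule(j);
    return true;
}
```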


Patch from mainstream:

The patch below fixes an ext2/ext3 memory leak: the _fill_super functions allocate percpu data structures but don't free them in _put_super.
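
The shape of this leak fix, sketched in user space: whatever the fill routine allocates, the put routine must free. The struct and function names are hypothetical, and calloc()/free() stand in for the kernel's percpu allocation calls.

```c
#include <stdlib.h>

/* Stand-in for the per-superblock private data. */
struct sb_stub {
    long *counters;   /* stand-in for the percpu data */
};

/* Allocate on mount; returns 0 on success, -1 on allocation failure. */
static int fill_super_stub(struct sb_stub *sb, size_t ncpus)
{
    sb->counters = calloc(ncpus, sizeof(*sb->counters));
    return sb->counters ? 0 : -1;
}

/* Release on unmount: the step that was missing before the fix. */
static void put_super_stub(struct sb_stub *sb)
{
    free(sb->counters);
    sb->counters = NULL;
}
```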


Patch from mainstream:

When doing shared mmap writes, the resulting dirty NFS pages may find themselves incapable of being flushed out if I/O is started after the file was released.

Make sure we start I/O on all existing dirty pages in nfs_file_release().


Patch from mainstream:

[PATCH] Fix a race condition in pty.c

There is a race condition in pty.c: pty_close() wakes up the waiter on its pair device before setting the TTY_OTHER_CLOSED flag.

On an SMP or preemptible kernel the waiter may wake up too early, not see the TTY_OTHER_CLOSED flag, and go back to sleep: a missed wakeup.

hjl reports that this bug will hang some expect scripts on SMP machines.

Signed-off-by: Zou Nan hai <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
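
The ordering problem in the pty fix above can be modelled deterministically. In this sketch the "waiter" rechecks the flag at the instant it is woken, which is the worst case on SMP; all names are hypothetical and nothing here is tty code.

```c
#include <stdbool.h>

struct tty_stub {
    bool other_closed;        /* stand-in for TTY_OTHER_CLOSED */
    bool woken_seeing_flag;   /* what the waiter saw when it woke up */
};

/* Stand-in for a wakeup: the waiter immediately rechecks the flag. */
static void wake_waiter(struct tty_stub *t)
{
    t->woken_seeing_flag = t->other_closed;
}

/* Fixed ordering: publish the flag first, then wake the waiter. */
static void close_fixed(struct tty_stub *t)
{
    t->other_closed = true;
    wake_waiter(t);
}

/* Old ordering, for contrast: the waiter can miss the flag. */
static void close_racy(struct tty_stub *t)
{
    wake_waiter(t);
    t->other_closed = true;
}
```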


Patch from mainstream:

Fixup of incorrect memset in ia32_rt_sigsuspend().


Patch from mainstream:

When copying data from user-space to kernel-space by __copy_user(), a page_not_present fault sometimes occurs at vmalloced kernel address because of VHPT pre-fetching. Ignore the page_not_present fault in ia64_do_page_fault() before jumping into exception handlers.


Patch from Kirill:

This patch adds a defensive VPS check in proc::may_ptrace_attach(). Suggested by Solar Designer.


Patch from Andrey Mirkin:

This patch fixes vzctl compilation with 2.6 headers on IA64.


Patch from RedHat:

The patch below switches the APIC timer IRQ to the irq-stack, to save ~350 bytes from the 4K process stack - nearly 10% and quite reasonable. I've given it a quick go and it works fine. (Solves bz#151222)


Patch from mainstream:

[PATCH] NFS client O_DIRECT error case fix

The NFS direct-io error return path for request sizes greater than MAX_DIRECTIO_SIZE fails to initialize the returned page struct array pointer to NULL.

Discovered using AKPM's ext3-tools: odwrite -ko 0 16385 foo

Signed-off-by: Bill Rugolsky <>
Signed-off-by: Linus Torvalds <>
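
A sketch of the kind of error path the O_DIRECT fix above concerns: the out-parameter is set to NULL before any early return, so a caller can free it unconditionally. The limit, the error code, and every name here are assumptions for illustration, not the NFS client code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define LIMIT_STUB 4096u   /* hypothetical stand-in for the size limit */

/* Returns the number of entries allocated, or a negative errno value. */
static int get_pages_stub(size_t size, void ***pages)
{
    *pages = NULL;               /* initialize on every path, error or not */
    if (size > LIMIT_STUB)
        return -EFBIG;           /* arbitrary error code for the sketch */

    *pages = calloc(1, sizeof(void *));
    return *pages ? 1 : -ENOMEM;
}
```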


Patch from Dmitry:

Fixed the ability to set conntrack-related parameters through the sysctl interface.
Bug 52951.


Patch from mainstream:

The ipt_recent kernel module (ipt_recent.c) in Linux kernel before 2.6.12, when running on 64-bit processors such as AMD64, allows remote attackers to cause a denial of service (kernel panic) via certain attacks such as SSH brute force, which leads to memset calls using a length based on the u_int32_t type, acting on an array of unsigned long elements, a different vulnerability than CVE-2005-2873.

2005/06/15 20:51:14-07:00
[NETFILTER]: ipt_recent: last_pkts is an array of "unsigned long" not "u_int32_t"

This fixes various crashes on 64-bit when using this module. Based upon a patch by Juergen Kreileder <>.

Signed-off-by: David S. Miller <>
ACKed-by: Patrick McHardy <>

GIT: bcfff0b471a60df350338bcd727fc9b8a6aa54b2
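
To illustrate the element-size mismatch named in the ipt_recent fix above: on 64-bit targets unsigned long is usually wider than u_int32_t, so a byte length computed from one type does not match an array of the other. The helpers below are hypothetical; deriving the length from the pointer's own element type is one common way to keep the two in agreement, and is not necessarily how the kernel patch is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte length of n elements under each assumption about the element type. */
static size_t bytes_by_u32(size_t n)
{
    return n * sizeof(uint32_t);
}

static size_t bytes_by_ulong(size_t n)
{
    return n * sizeof(unsigned long);
}

/* Clear the array using the size of what the pointer really points to. */
static void clear_last_pkts(unsigned long *last_pkts, size_t n)
{
    memset(last_pkts, 0, n * sizeof(*last_pkts));
}
```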


Patch from mainstream:

[PATCH] JBD: reduce stack and number of journal descriptors

Dynamically allocate the holding array for kjournald write batching rather than allocating it on the stack.

Signed-off-by: Alex Tomas <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>


Patch from mainstream, backported by Alexey:

get_user_pages() oopses on vsyscall pages. Mainstream has at least two critical patches:

Author: ak <>
Date:   Mon Nov 15 19:53:40 2004 -0800

  [PATCH] x86-64: Fix get_user_pages access to vsyscall page

  The current kernel oopses on x86-64 when gdb steps into the vsyscall page.
  This patch fixes it.

  I also removed the bogus NULL checks of _offset and replaced them with
  proper _none checks.  I made them BUGs because vsyscall pages should be
  always mapped.

  Signed-off-by: Andi Kleen <>
  Signed-off-by: Andrew Morton <>
  Signed-off-by: Linus Torvalds <>


diff-tree 690dbe1ced143876d8fa56b72310738dbe079d0a (from 74f9c9c258249fba3e2e78f)

Author: Hugh Dickins <>
Date:   Mon Aug 1 21:11:42 2005 -0700

  [PATCH] x86_64: access of some bad address

  x86_64 has a large sparse gate area between VSYSCALL_START and
  VSYSCALL_END, not all of it presently backed by pmds.  Alexander Nyberg has
  found that in some circumstances gdb may try to ptrace here, and hit
  get_user_pages BUG_ON.  It seems odd that gdb should be accessing here, but
  it certainly shouldn't crash in this way: relax BUG_ON to -EFAULT.
  Fixes kernel bugzilla #4801.

  Signed-off-by: Hugh Dickins <>
  Cc: Andi Kleen <>
  Signed-off-by: Andrew Morton <>
  Signed-off-by: Linus Torvalds <>
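
"Relax BUG_ON to -EFAULT" in the second patch above means reporting an unbacked address to the caller instead of crashing. A toy sketch of that choice, with a hypothetical lookup over a small table in which 0 marks an unbacked slot:

```c
#include <errno.h>
#include <stddef.h>

/* Returns 0 and fills *out, or -EFAULT if nothing backs the slot.
 * The earlier behaviour would have been an assertion failure here. */
static int lookup_gate_page(const int *table, size_t len, size_t idx, int *out)
{
    if (idx >= len || table[idx] == 0)
        return -EFAULT;
    *out = table[idx];
    return 0;
}
```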


Patch from mainstream:

[PATCH] x86_64: fix vsyscalls
Author: ak <>
Date: Fri Nov 19 15:20:37 2004 -0800

Fix incorrect alignment in the vsyscall variables that caused vsyscalls to be completely broken.

This change should decrease system time during TPC-* tests considerably.

Clean up the to make it easier readable

Do some cleanups in the vsyscall code.

Align cacheline_aligned correctly on 128 byte cacheline systems.

Signed-off-by: Andi Kleen <>
Signed-off-by: Linus Torvalds <>



Patch from mainstream:

[PKT_SCHED]: CBQ: Destroy filters before destroying classes

CBQ destroys its classes by traversing the hashtable, so classes are not destroyed from root to leaves: class Y, a subclass of class X, may be destroyed before X. This is a problem if a filter is attached to class X (parent) and classifies into class Y (result). If Y gets deleted before X, the filter references an already deleted class while trying to unbind (cbq_unbind_filter). Therefore all filters must be destroyed before destroying classes. An additional BUG_TRAP has been added to document this not so obvious case.

Bug 52585.
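
A small model of the teardown order described in the CBQ patch above. Unbinding a filter touches its result class, so every filter is unbound while all classes still exist; after that the classes can go in any order. All types and names are hypothetical, not the sch_cbq code.

```c
#include <stddef.h>

struct class_stub  { int alive; int filters_bound; };
struct filter_stub { struct class_stub *result; int attached; };

/* Unbinding touches the result class, so that class must still exist. */
static int unbind_filter(struct filter_stub *f)
{
    if (!f->attached)
        return 0;
    if (!f->result->alive)
        return -1;               /* would be a use-after-free in real code */
    f->result->filters_bound--;
    f->attached = 0;
    return 0;
}

/* Fixed order: all filters first, then classes in table order. */
static int destroy_all(struct filter_stub *filters, size_t nf,
                       struct class_stub **classes, size_t nc)
{
    int err = 0;

    for (size_t i = 0; i < nf; i++)
        if (unbind_filter(&filters[i]) != 0)
            err = -1;
    for (size_t i = 0; i < nc; i++)
        classes[i]->alive = 0;
    return err;
}
```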


Patch from mainstream, prepared by Pavel:

[PATCH] ia64 ptrace + sigrestore_context (CAN-2005-1761)

This patch fixes handling of accesses to ar.rsc via ptrace & restore_sigcontext


Patch from Pavel:

IA64 uses register backing store area for tasks, and it grows like stack does, so we must charge it the same way. Possible fix of one leaked privvmpage after each VE stop.


Patch from Alexander:

This patch fixes kernel compilation when CONFIG_VZQUOTA=n

OpenVZ Bug #52.


Patch from Alexander:

This patch fixes compilation of VZFS when CONFIG_VZQUOTA=n

OpenVZ Bug #52.


Patch from Alexander:

Fixes broken compilation when CONFIG_VE_NETDEV=n

OpenVZ Bug #52.


Patch from Pavel:

expand_stack() has two incarnations - for STACK_GROW_UP and STACK_GROWS_DOWN. One of them uses UB_LOW constant, which is absent. Fixed.


Patch from Dmitry:

Added a vpid field to /proc/*/status and /proc/*/stat

Bug 52680.


Patch from Dmitry:

Fixed df output when the quota limit is exceeded

OpenVZ Bug #59.


Patch from Kirill:

This patch prohibits processes that entered a VPS from being ptraced from inside the VPS. This doesn't fix any security issue by itself, since vzctl enter doesn't leak any sensitive information. But it makes isolation more logically correct and can prevent possible security issues in the future.