Download/kernel/2.6.8/022stab045.1/changes

From OpenVZ Virtuozzo Containers Wiki

Changes

  • Driver updates from RHEL4u2/official sites to make OpenVZ conform to official HCL
  • Mainstream security fixes
  • Fixes for EM64T/ia64 compilation
  • Small VPS/UBC fixes

Configs

The same as 022stab044.1 plus:

  • +CONFIG_ATA_OVER_ETH=y
  • +CONFIG_SCSI_LPFC=y
  • +CONFIG_SCSI_ISCSI_SFNET=y
  • +CONFIG_SCSI_QLA4XXX=y
  • +CONFIG_SCSI_QLA4XXX_FAILOVER=n
  • +CONFIG_SCSI_FC_ATTRS=y
  • +CONFIG_SCSI_ISCSI_ATTRS=y

Driver updates

  • aacraid v1.1.5 (site)
  • aoe v14 (site)
  • e1000 v6.0.54 (site)
  • e100 v3.4.8 (site)
  • emulex v8.0.16.17 (site)
  • iscsi-sfnet v4.0.1.11.1 (rhel4u2)
  • megaraid v2.20.x (site)
  • qla4xxx v5.00.02 (site)
  • r8169 v2.2 (site)
  • sk98lin v8.24.1.3 (site)
  • snapapi v0.6.7 (site)
  • tg3 v3.27 (rhel4u)

Other updates

  • scsi midlayer (rhel4u2)
  • ide csb6-raid support (rhel4u2)
  • intel ich7 and esb2 support (rhel4u2)
  • libata v1.11 (rhel4u2)

Patches

diff-ve-emt64-apicirq-execenv-20051028

Patch from Pavel:

Added a switch to the VE0 context via set_exec_env(get_ve0()), and a switch back afterwards, in emt64's smp_apic_timer_interrupt() call.

diff-ve-ia64-irq-execenv-20051028

Patch from Pavel:

Added a switch to the VE0 context via set_exec_env(get_ve0()), and a switch back afterwards, in irq handling on ia64.

diff-ve-emt64-irq-execenv-20051028

Patch from Pavel:

Added a switch to the VE0 context via set_exec_env(get_ve0()), and a switch back afterwards, in do_IRQ for the x86_64 arch.
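
The three execenv patches above share one pattern: on interrupt entry, save the current execution environment, switch to VE0, and restore the saved environment on exit. A minimal user-space sketch of that pattern follows. Only the names set_exec_env() and get_ve0() come from the patch descriptions; struct ve, current_env, handle_irq_in_ve0() and report_env() are hypothetical stand-ins, and the real kernel helpers may well differ in signature.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-VE structure. */
struct ve { int id; };

struct ve ve0 = { 0 };            /* the host environment (VE0) */
struct ve *current_env = &ve0;    /* stand-in for the per-task exec env */

struct ve *get_ve0(void) { return &ve0; }

/* Switches the execution environment and returns the previous one.
 * Assumption: the real helper's return convention may differ. */
struct ve *set_exec_env(struct ve *new_env)
{
    struct ve *old = current_env;
    current_env = new_env;
    return old;
}

/* Runs the handler in VE0 context, then restores the interrupted context.
 * Returns the id of the environment the handler observed. */
int handle_irq_in_ve0(int (*handler)(void))
{
    struct ve *saved = set_exec_env(get_ve0());
    int seen = handler();
    (void)set_exec_env(saved);    /* the "and back" part */
    return seen;
}

/* Example handler: reports which environment it runs in. */
int report_env(void) { return current_env->id; }
```

With this model, an interrupt that arrives while a task of some VE is running is handled in VE0, and the task's environment is in place again once the handler returns.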

diff-ubc-ia64-irq-execub-20051028

Patch from Pavel:

Added ub0 execub context in ia64 irq handling.

diff-ubc-emt64-irq-execub-20051028

Patch from Pavel:

Added ub0 execub context in irq handling on x86_64.

diff-security-x86-sysexit-20041212

Patch from mainstream:

x86 sysenter: clear %ebp on exit.

It contains the thread info pointer. That's not something that user mode can really use for anything interesting, but it's also not something that user mode should ever really see.

Pointed out by Brad Spender as being in PaX.

diff-ms-emt64-tssldt-lim-20051027

Patch from mainstream:

[PATCH] Fix LDT/TSS limit on x86-64

Paul Menage pointed out that the previous change for the LDT/TSS limit on x86-64 was incorrect. This could cause the user to corrupt memory beyond the LDT. This patch implements the fix suggested by Paul.

http://linux.bkbits.net:8080/linux-2.6/cset@1.1938.63.107
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=146244

diff-ve-vzwdog-pginfo-20051027

Patch from Andrey Mirkin, modified by Kirill:

This patch makes vzwdog print pgdat info.

diff-CAN-2005-0135-ia64-unwind

Patch from mainstream:

[IA64] Sanity check unw_unwind_to_user

The unw_unwind_to_user function in unwind.c on Itanium (ia64) architectures in Linux kernel 2.6 allows local users to cause a denial of service (system crash).

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

http://linux.bkbits.net:8080/linux-2.6/cset@1.1966.2.27

diff-CAN-2005-0136-ia64

Patch from mainstream, ported by Pavel:

A flaw affecting the auditing code was discovered. On Itanium architectures a local user could use this flaw to cause a denial of service (crash). This issue is rated as having important security impact (CAN-2005-0136).

http://linux.bkbits.net:8080/linux-2.6/gnupatch@41f2d1eePludGYyb1yOmGaW6Iois8Q

diff-ms-ia64-ptrace-spd-20051025

Patch from mainstream:

[IA64] speedup ptrace by avoiding kernel-stack walk

This patch changes the syscall entry path to store the current-frame-mask (CFM) in pt_regs->cr_ifs. This just takes one extra instruction (a "dep" to clear the bits other than 0-37) and is free in terms of cycles.

The advantage of doing this is that it lets ptrace() avoid having to walk the stack to determine the end of the user-level backing-store of a process which is in the middle of a system-call. Since this is what strace does all the time, this speeds up strace quite a bit (by ~50%). More importantly, it makes the syscall vs. non-syscall case much more symmetric, which is always something I wanted.

Note that the change to ivt.S looks big but this is just a rippling effect of instruction-scheduling to keep syscall latency the same. All that's really going on there is that instead of storing 0 into cr_ifs member we store the low 38 bits of ar.pfs.
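
The value stored into cr_ifs, as described above, is just the low 38 bits of ar.pfs. A small sketch of that masking in plain C, where cfm_from_pfs() and CFM_BITS are illustrative names that do not appear in the patch:

```c
#include <stdint.h>

/* The current-frame-mask occupies bits 0-37, i.e. 38 bits. */
#define CFM_BITS 38

/* Hypothetical helper: keeps bits 0-37 of ar.pfs and clears the rest,
 * which is what the single "dep" instruction achieves in the patch. */
uint64_t cfm_from_pfs(uint64_t ar_pfs)
{
    return ar_pfs & ((UINT64_C(1) << CFM_BITS) - 1);
}
```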

Signed-off-by: David Mosberger <davidm@hpl.hp.com>

Signed-off-by: Tony Luck <tony.luck@intel.com>

diff-ms-ia64-cpurelax-20051026

Patch from mainstream:

[IA64] add cpu_relax() in the body of spin loops

This patch adds cpu_relax() in the body of spin loops in smp_call_function(), smp_call_function_single(), and ia64_mca_wakeup_ipi_wait().
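
For illustration, a spin loop with cpu_relax() in its body might look like the sketch below. This is user-space C, not the kernel code: cpu_relax() is modeled here as a compiler barrier only (an assumption; the real ia64 implementation emits an architecture-specific hint), and spin_wait() with its max_spins bound is a hypothetical helper added so the example terminates.

```c
#include <stdatomic.h>

/* Assumption: a compiler barrier stands in for the architecture's
 * pause hint. */
static inline void cpu_relax(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Spins until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting, or -1 on timeout. */
long spin_wait(atomic_int *flag, long max_spins)
{
    long spins = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (spins++ >= max_spins)
            return -1;
        cpu_relax();    /* be polite to the other hardware thread */
    }
    return spins;
}
```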

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

Signed-off-by: Tony Luck <tony.luck@intel.com>

http://linux.bkbits.net:8080/linux-2.6/cset@1.1938.335.15

diff-video-vga80x25-20051027

Patch from Kostja:

Hack to force 80x25 video mode on boot, when framebuffer is not configured.

diff-ms-e7xx-irqaffinity-disable

Patch from mainstream/RHEL4u2:

Add pci quirks for intel E7320_MCH and E7525_MCH to disable irq balancing.

As part of the workaround for the "Interrupt message re-ordering across hub interface" errata (page #16 in http://developer.intel.com/design/chipsets/specupdt/30288402.pdf), BIOS may enable hardware IRQ balancing for E7520/E7320/E7525 (revision ID 0x9 and below) based platforms.

Add pci quirks to disable SW irqbalance/affinity on those platforms. Move balanced_irq_init() to late_initcall so that kirqd will be started after pci quirks.

diff-ms-invalidate-page-race-fix

Patch from mainstream:

invalidate_inode_pages() and invalidate_inode_pages2() can mark pages not uptodate while read() is trying to read from them. This is interpreted as an I/O error.

Fix that by teaching the invalidate code to leave the page alone if someone else has a ref on it.
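
The idea can be sketched with a heavily simplified model. struct page_model and invalidate_page() are hypothetical; the real code works on struct page and its reference counting is more involved than a single "greater than one" test.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified page model. */
struct page_model {
    int  refcount;   /* 1 = held only by the page cache */
    bool uptodate;
};

/* Returns true if the page was invalidated, false if it was skipped. */
bool invalidate_page(struct page_model *page)
{
    if (page->refcount > 1)
        return false;        /* a reader holds a ref: leave it alone */
    page->uptodate = false;
    return true;
}
```

A concurrent read() that has taken a reference therefore never sees its page flip to not-uptodate underneath it.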

http://linux.bkbits.net:8080/linux-2.6/gnupatch@4174aca2ocZwQ_22QLBHXsj0hDWUWw

diff-ubc-pbc-racefix-20051027

Patch from Pavel, modified by Kirill:

Fixes a race between page_beancounting removal and the check that a page has a valid pbc.
Bug 52609.

diff-ve-ia64-taskvisibility-20051027

Patch from Andrey:

This patch fixes VE task lookup/traversal in the ia64 performance monitor.

diff-ms-jbd-umount-race

Patch from mainstream:

[PATCH] kjournald: missing JFS_UNMOUNT check

It seems that kjournald() may be missing a check of the JFS_UNMOUNT flag before calling schedule(). This showed up in testing of OCFS2 recovery where our recovery thread would hang in journal_kill_thread() called from journal_destroy() because kjournald never got a chance to read the flag to shut down before the schedule().

Zach pointed out the missing check which led me to hack up this trivial patch. It's been tested many times now and I have yet to reproduce the hang, which was happening very regularly before.

<mild rant>
I'm guessing that we could really use some wait_event() calls with helper functions in, well, most of jbd these days which would make a ton of the wait code there vastly cleaner.
</mild rant>

As for why this doesn't happen in ext3 (or OCFS2 during normal mount/unmount of the local node's journal), I think it may be that the specific timing of events in the ocfs2 recovery thread exposes a race there. Because ocfs2_replay_journal() is only interested in playing back the journal, initialization and shutdown happen very quickly with no other metadata put into that specific journal.

Acked-by: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>

Signed-off-by: Linus Torvalds <torvalds@osdl.org>

http://linux.bkbits.net:8080/linux-2.6/gnupatch@431f7f05jxd-iagNaeYGxq4IVmcwYg

diff-ms-ext2-umount-leak

Patch from mainstream:

The patch below fixes an ext2/ext3 memory leak: the _fill_super functions allocate percpu data structures but don't free them in _put_super.
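
The shape of the fix, with allocation in fill_super paired with release in put_super, can be sketched as follows. Everything here is a user-space model: struct sb_info, the live_allocations counter and the use of calloc()/free() are assumptions for illustration, not the kernel's percpu API.

```c
#include <stdlib.h>

/* Hypothetical per-superblock state. */
struct sb_info {
    long *percpu_counters;
};

int live_allocations;    /* demo-only leak tracker */

/* Allocates the per-CPU data; returns 0 on success, -1 on failure. */
int fill_super(struct sb_info *sbi, int ncpus)
{
    sbi->percpu_counters = calloc((size_t)ncpus, sizeof(long));
    if (!sbi->percpu_counters)
        return -1;
    live_allocations++;
    return 0;
}

/* Frees what fill_super allocated; this is the step that was missing. */
void put_super(struct sb_info *sbi)
{
    if (sbi->percpu_counters) {
        free(sbi->percpu_counters);
        sbi->percpu_counters = NULL;
        live_allocations--;
    }
}
```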

http://linux.bkbits.net:8080/linux-2.6/gnupatch@41bdc37fLNoIB6Kx0Q-o47geCYYAYg

diff-ms-nfs-mmap-corruption

Patch from mainstream:

When doing shared mmap writes, the resulting dirty NFS pages may find themselves incapable of being flushed out if I/O is started after the file was released.

Make sure we start I/O on all existing dirty pages in nfs_file_release().

http://linux.bkbits.net:8080/linux-2.6/gnupatch@4237ab9clq5WkE9BXlZbzpb6sb0_7Q

diff-ms-pty-close-race-20041218

Patch from mainstream:

[PATCH] Fix a race condition in pty.c

There is a race condition in pty.c when pty_close wakes up the waiter on its pair device before setting the TTY_OTHER_CLOSED flag.

On an SMP or preempt kernel, the waiter may wake up too early, fail to see the TTY_OTHER_CLOSED flag, and fall asleep again: a missed wakeup.
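
The ordering problem can be shown with a deterministic, single-threaded model in which the woken waiter runs immediately, as it may on SMP or with preemption. struct pty_model, wake_waiter(), close_buggy() and close_fixed() are hypothetical names; only TTY_OTHER_CLOSED comes from the commit message.

```c
#include <stdbool.h>

struct pty_model {
    bool other_closed;       /* stands in for TTY_OTHER_CLOSED */
    bool waiter_saw_close;   /* what the woken waiter observed */
};

/* Simulates the waiter running at once and sampling the flag. */
void wake_waiter(struct pty_model *p)
{
    p->waiter_saw_close = p->other_closed;
}

/* Old order: wake first, set the flag afterwards. */
void close_buggy(struct pty_model *p)
{
    wake_waiter(p);
    p->other_closed = true;
}

/* Fixed order: set the flag, then wake. */
void close_fixed(struct pty_model *p)
{
    p->other_closed = true;
    wake_waiter(p);
}
```

In the buggy order the waiter finds the flag still clear and would go back to sleep with nobody left to wake it.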

hjl reports that this bug will hang some expect scripts on SMP machines.

Signed-off-by: Zou Nan hai <Nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff-ms-ia64-ia32-sigsusp-20051026

Patch from mainstream:

Fixup of incorrect memset in ia32_rt_sigsuspend().

diff-ms-ia64-vmallocfaults

Patch from mainstream:

When copying data from user-space to kernel-space by __copy_user(), a page_not_present fault sometimes occurs at a vmalloced kernel address because of VHPT pre-fetching. Ignore the page_not_present fault in ia64_do_page_fault() before jumping into exception handlers.

http://linux.bkbits.net:8080/linux-2.6/gnupatch@431e211200BFHGYtKlZEEKV7PWQ1SA

diff-ve-procptrace-20051027

Patch from Kirill:

This patch adds a defensive VPS check in proc::may_ptrace_attach(). Suggested by Solar Designer.

diff-ia64-headers-20051025

Patch from Andrey Mirkin:

This patch fixes vzctl compilation with 2.6 headers on IA64.

diff-rh-irq-stack-apic-context

Patch from RedHat:

The patch below switches the APIC timer IRQ to the irq-stack, to save ~350 bytes from the 4K process stack - nearly 10% and quite reasonable. I've given it a quick go and it works fine. (Solves bz#151222)

diff-CAN-2005-0207-nfsd

Patch from mainstream:

[PATCH] NFS client O_DIRECT error case fix

The NFS direct-io error return path for request sizes greater than MAX_DIRECTIO_SIZE fails to initialize the returned page struct array pointer to NULL.
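
A sketch of the general rule behind this fix: an out-parameter should be set on every return path, including the error path, so the caller's cleanup never touches an uninitialized pointer. get_pages() and DEMO_MAX_SIZE below are hypothetical and do not mirror the NFS function or the real MAX_DIRECTIO_SIZE value.

```c
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAX_SIZE 16384   /* hypothetical limit, for the demo only */

/* Returns the number of pages obtained, or a negative errno.
 * *pages is always left in a defined state for the caller. */
int get_pages(size_t size, void ***pages)
{
    if (size > DEMO_MAX_SIZE) {
        *pages = NULL;        /* the initialization the error path lacked */
        return -ENOMEM;
    }
    *pages = calloc(1, sizeof(void *));
    return *pages ? 1 : -ENOMEM;
}
```

Because free(NULL) is a no-op, a caller can release *pages unconditionally after either outcome.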

Discovered using AKPM's ext3-tools: odwrite -ko 0 16385 foo

Signed-off-by: Bill Rugolsky <brugolsky@telemetry-investments.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff-ve-ip-conntrack-sysctls-20051026

Patch from Dmitry:

Fixed the ability to set conntrack-related parameters through the sysctl interface.
Bug 52951.

diff-CAN-2005-2872-ipt-recent

Patch from mainstream:

The ipt_recent kernel module (ipt_recent.c) in Linux kernel before 2.6.12, when running on 64-bit processors such as AMD64, allows remote attackers to cause a denial of service (kernel panic) via certain attacks such as SSH brute force, which leads to memset calls using a length based on the u_int32_t type, acting on an array of unsigned long elements, a different vulnerability than CVE-2005-2873.

2005/06/15 20:51:14-07:00 davem@davemloft.net
[NETFILTER]: ipt_recent: last_pkts is an array of "unsigned long" not "u_int32_t"

This fixes various crashes on 64-bit when using this module. Based upon a patch by Juergen Kreileder <jk@blackdown.de>.
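
The class of bug described here, computing a memset length from the wrong element type, can be sketched as below. clear_wrong() and clear_right() are illustrative helpers, not the module's code. On LP64 platforms unsigned long is 8 bytes while u_int32_t is 4, so the wrong variant clears only half of the array.

```c
#include <stdint.h>
#include <string.h>

/* Length derived from the wrong type: too short wherever
 * unsigned long is wider than 32 bits. Returns bytes cleared. */
size_t clear_wrong(unsigned long *last_pkts, size_t n)
{
    size_t len = n * sizeof(uint32_t);
    memset(last_pkts, 0, len);
    return len;
}

/* Length derived from the array's own element type. */
size_t clear_right(unsigned long *last_pkts, size_t n)
{
    size_t len = n * sizeof(last_pkts[0]);
    memset(last_pkts, 0, len);
    return len;
}
```

Using sizeof on the array element rather than naming a type keeps the length correct even if the element type changes later.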

Signed-off-by: David S. Miller <davem@davemloft.net>
ACKed-by: Patrick McHardy <kaber@trash.net>

GIT: bcfff0b471a60df350338bcd727fc9b8a6aa54b2

diff-ms-jbdstack-20051025

Patch from mainstream:

[PATCH] JBD: reduce stack and number of journal descriptors

Dynamically allocate the holding array for kjournald write patching rather than allocating it on the stack.

Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff-emt64-gatevma-20051017

Patch from mainstream, backported by Alexey:

get_user_pages() oopses on vsyscall pages. Mainstream has at least two critical patches:

Author: ak <ak@suse.de>
Date:   Mon Nov 15 19:53:40 2004 -0800

  [PATCH] x86-64: Fix get_user_pages access to vsyscall page

  The current kernel oopses on x86-64 when gdb steps into the vsyscall page.
  This patch fixes it.

  I also removed the bogus NULL checks of _offset and replaced them with
  proper _none checks.  I made them BUGs because vsyscall pages should be
  always mapped.

  Signed-off-by: Andi Kleen <ak@suse.de>
  Signed-off-by: Andrew Morton <akpm@osdl.org>
  Signed-off-by: Linus Torvalds <torvalds@osdl.org>

  ChangeSet@1.1938.364.10

diff-tree 690dbe1ced143876d8fa56b72310738dbe079d0a (from 74f9c9c258249fba3e2e78f)

Author: Hugh Dickins <hugh@veritas.com>
Date:   Mon Aug 1 21:11:42 2005 -0700

  [PATCH] x86_64: access of some bad address

  x86_64 has a large sparse gate area between VSYSCALL_START and
  VSYSCALL_END, not all of it presently backed by pmds.  Alexander Nyberg has
  found that in some circumstances gdb may try to ptrace here, and hit
  get_user_pages BUG_ON.  It seems odd that gdb should be accessing here, but
  it certainly shouldn't crash in this way: relax BUG_ON to -EFAULT.
  
  Fixes kernel bugzilla #4801.

  Signed-off-by: Hugh Dickins <hugh@veritas.com>
  Cc: Andi Kleen <ak@suse.de>
  Signed-off-by: Andrew Morton <akpm@osdl.org>
  Signed-off-by: Linus Torvalds <torvalds@osdl.org>
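
The second commit above turns a fatal assertion into an error return. A minimal model of that change, assuming a hypothetical gate_addr_mapped() check and a get_gate_page() wrapper that do not exist under these names in the kernel:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical check: is addr inside the backed part of the gate area? */
bool gate_addr_mapped(unsigned long addr, unsigned long start,
                      unsigned long mapped_len)
{
    return addr >= start && addr - start < mapped_len;
}

/* Returns 0 if the page can be accessed, -EFAULT otherwise. */
int get_gate_page(unsigned long addr, unsigned long start,
                  unsigned long mapped_len)
{
    if (!gate_addr_mapped(addr, start, mapped_len))
        return -EFAULT;    /* previously a BUG_ON, i.e. a crash */
    return 0;
}
```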

diff-emt64-vsyscall-20041119

Patch from mainstream:

[PATCH] x86_64: fix vsyscalls
Author: ak <ak@suse.de>
Date: Fri Nov 19 15:20:37 2004 -0800

Fix incorrect alignment in the vsyscall variables that caused vsyscalls to be completely broken.

This change should decrease system time during TPC-* tests considerably.

Clean up the vmlinux.lds to make it easier to read

Do some cleanups in the vsyscall code.

Align cacheline_aligned correctly on 128 byte cacheline systems.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ChangeSet@1.1938.386.62

diff-ms-cbq-destroy-20051025

Patch from mainstream:

[PKT_SCHED]: CBQ; Destroy filters before destroying classes. CBQ destroys its classes by traversing the hashtable and thus classes are not destroyed from root to leafs which means that class Y being a subclass of class X may be destroyed before X. This is a problem if a filter is attached to class X (parent) classifying into class Y (result). In case Y gets deleted before X the filter references an already deleted class while trying to unbind (cbq_unbind_filter). Therefore all filters must be destroyed before destroying classes. An additional BUG_TRAP has been added to document this not so obvious case.
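
The required teardown order can be sketched with a small model. struct cls, struct filter and the destroy_* functions are hypothetical simplifications; in particular, destroy_filter() reports a would-be use-after-destroy through its return value instead of dereferencing freed memory.

```c
#include <stdbool.h>
#include <stddef.h>

struct cls    { bool alive; int filters_bound; };
struct filter { struct cls *result; bool alive; };

/* Unbinding touches the result class, so that class must still be alive.
 * Returns false if it would have touched a destroyed class. */
bool destroy_filter(struct filter *f)
{
    if (!f->alive)
        return true;
    if (!f->result->alive)
        return false;
    f->result->filters_bound--;
    f->alive = false;
    return true;
}

void destroy_class(struct cls *c) { c->alive = false; }

/* Teardown in the fixed order: all filters first, then all classes. */
bool destroy_qdisc(struct filter *filters, size_t nf,
                   struct cls *classes, size_t nc)
{
    bool ok = true;

    for (size_t i = 0; i < nf; i++)
        ok = destroy_filter(&filters[i]) && ok;
    for (size_t i = 0; i < nc; i++)
        destroy_class(&classes[i]);
    return ok;
}
```

Because classes are reached through a hashtable rather than from root to leaves, only the two-pass order guarantees that no filter is unbound from an already destroyed class.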

http://linux.bkbits.net:8080/linux-2.6/gnupatch@4175e7a1Be1t1bq0UgwJIOmb2Jjo_Q

Bug 52585.

diff-security-ia64-pl3-20051018

Patch from mainstream, prepared by Pavel:

[PATCH] ia64 ptrace + sigrestore_context (CAN-2005-1761)

This patch fixes handling of accesses to ar.rsc via ptrace & restore_sigcontext

diff-ubc-ia64-charge-20051020

Patch from Pavel:

IA64 uses a register backing store area for tasks, and it grows like the stack does, so we must charge it the same way. This possibly fixes the leak of one privvmpage after each VE stop.

diff-vzdq-comp-quotaoff-20051020

Patch from Alexander:

This patch fixes kernel compilation when CONFIG_VZQUOTA=n

OpenVZ Bug #52.

diff-vefs-comp-quotaoff-20051020

Patch from Alexander:

This patch fixes compilation of VZFS when CONFIG_VZQUOTA=n

OpenVZ Bug #52.

diff-ve-venet-comp-20051021

Patch from Alexander:

This patch fixes broken compilation when CONFIG_VE_NETDEV=n

OpenVZ Bug #52.

diff-ubc-expandstack-fix-20051020

Patch from Pavel:

expand_stack() has two incarnations - for STACK_GROW_UP and STACK_GROWS_DOWN. One of them uses UB_LOW constant, which is absent. Fixed.

diff-ve-proc-vpid-20051024

Patch from Dmitry:

Added a vpid field to /proc/*/status and /proc/*/stat

Bug 52680.

diff-simfs-statfs-fix-20051024

Patch from Dmitry:

Fixes df output when the quota limit is exceeded

OpenVZ Bug #59.

diff-ve-vpsdumpable-20051024

Patch from Kirill:

This patch prevents processes that have entered a VPS from being ptraceable from within the VPS. This doesn't fix any security issue by itself, since vzctl enter doesn't leak any sensitive information. But it makes isolation more logically correct and can prevent possible security issues in the future.