From OpenVZ Virtuozzo Containers Wiki



  • Added multiple quota points inside VPS support
  • Added /proc/vz/devperms with VPS device permissions
  • Mainstream fixes (proc serialization, emt64)
  • Finished merging x86-64 patches
  • Removed fairscheduler debugging code and started optimizing the fairscheduler
  • Fairscheduler CPU limit and AMD lockup fixes
  • Multiple driver updates



Same as 022stab038.1 plus:

  • +CONFIG_GART_IOMMU=y (in x86-64)
  • CONFIG_SCSI_MEGARAID is removed



Patch from Andrey:

This patch reimplements /proc/vz/vzaquota - a directory containing entries for each vzquota-enabled superblock with aquota.user and files.

The goal is to support standard quota tools and allow VPSs to have multiple quota partitions.

Entries in /proc/vz/vzaquota are device numbers of the superblocks (a single 32-bit hexadecimal value as returned by sys_stat64, not a major-minor pair).

No VE start/stop hooks are used in this implementation. Compilation with unusual config options was fixed where I noticed.
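As a rough illustration of the naming scheme described above, a user-space sketch could format a device number as one hexadecimal token. The helper name `vzaquota_entry_name` and the zero-padded 8-digit width are assumptions; the patch description only says that each entry is a single 32-bit hexadecimal value rather than a major-minor pair.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: render a 32-bit device number as a single hex token.
 * The "%08x" width is an assumption, not taken from the patch. */
static int vzaquota_entry_name(uint32_t dev, char *buf, size_t len)
{
    return snprintf(buf, len, "%08x", (unsigned int)dev);
}
```

For example, a device number of 0x801 would be rendered as the single token `00000801` under these assumptions.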


Patch from Pavel:

Serializes access to proc tree with rwsem.


Patch from Denis:

This patch fixes an incorrect error path in proc_get_inode(), taken when the module reference cannot be obtained because the module is being unloaded. When try_module_get() fails, the function puts de(!) and still returns an inode whose de reference was never taken.


Patch from Alexey:

Fix NULL dereference in vma_merge.

It is funny how gcc compiled it. gcc figured out that the pointer can be NULL sometimes and compiled a separate(!) block for this case, which was optimized to understand that the pointer is NULL.


Patch from mainstream, noted by Alexey (alexey):

[PATCH] x86-64: avoid deadlock in page fault handler

Avoid deadlock when kernel fault happens inside mmap sem.


Patch from Alexey:

EMT64: add missing () around arguments of pte_index macro


Patch from Pavel:

This patch fixes UBC accounting for ia32 emulation on x86-64 when the arg pages are set up. The previous patch was broken.


Patch from Kirill:

This patch is part of the fairsched performance improvement series:

  • Removes vsched->lock by merging it into fairsched_lock. This greatly reduces the number of locks taken on the hot schedule path
  • Prepares the code for balancing activation
  • Removes a wrong BUG_ON in vcpu_put: when schedule_vcpu() restarts, the VCPU may already be held
  • show_vsched() requires oops_in_progress when debug patches are dropped


Patch from Pavel:

Type "int" cannot simply be cast (by gcc) to type "void *". It needs to be cast via "unsigned long" first.
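A minimal sketch of the double cast follows. The helper names are hypothetical, and the code assumes, as on Linux targets, that `unsigned long` has the same width as a pointer.

```c
/* Widen the int to unsigned long first, then convert to a pointer.
 * A direct (void *)value cast makes gcc warn on 64-bit targets because
 * the integer and the pointer differ in size. */
static void *demo_int_to_ptr(int value)
{
    return (void *)(unsigned long)value;
}

/* Reverse direction: narrow through unsigned long as well. */
static int demo_ptr_to_int(void *ptr)
{
    return (int)(unsigned long)ptr;
}
```

This pattern is typically used to smuggle a small integer through a `void *` callback argument; the resulting pointer must never be dereferenced.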


Patch from Pavel:

This adds the missing include <linux/ve_owner.h> to ipc to make it compile on x86-64.


Patch from Pavel:

Just add #include <linux/namespace.h> to kernel/compat.c to make it compile.


Patch from Pavel:

Move the include of <ub/ub_task.h> from asm/thread_info.h to linux/sched.h, where it is really needed. This patch makes the kernel compile on non-i386 architectures.


Patch from mainstream:

[PATCH] x86_64: Add 32bit quota support

[untested, but other 64bit ports seem to get away with it]

sys_quotactl seems to be 32/64bit clean, enable it for 32bit.

Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>


Patch from Andrey:

This patch exports get_device_perm_ve to be used in vzdq_file (check of allowed devices for /proc/vz/vzaquota).


Patch from Pavel:

Ported part of patch from mainstream that initializes apic nmi watchdog for P4 CPU. This fixes strange oopses when NMI is ON on EMT64.

Bug 51143.
Bug 51206.


Patch from Andrey:

This patch adds fairsched syscalls on ia64.


Patch from Andrey Mirkin:

This patch adds UBC syscalls on ia64.


Patch from Dmitry:

Added permissions for /dev/full to the default VPS set.

Bug 51512.


Patch from mainstream:

[PATCH] Buffer overrun in arch/x86_64/sys_ia32.c:sys32_ni_syscall()

struct task_struct.comm is defined to be 16 chars, but arch/x86_64/sys_ia32.c:sys32_ni_syscall() and sys32_vm86_warning() copy it into a static 8 byte buffer, which will surely cause problems. This patch makes lastcomm[] the right size, and makes sure it can't be overrun. Since the code also goes to the effort of getting a local copy of current in "me", we may as well use it for printing the message.

Signed-off-by: Chris Wright <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
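The kind of fix described in this patch can be sketched in user space as a bounded copy into a buffer sized to match the source field. The names `DEMO_COMM_LEN` and `demo_remember_comm` are hypothetical; only the 16-character size of `comm` comes from the description above.

```c
#include <string.h>

#define DEMO_COMM_LEN 16  /* matches the 16-char comm field described above */

/* Buffer sized like the source field instead of a too-small 8 bytes. */
static char demo_lastcomm[DEMO_COMM_LEN];

static void demo_remember_comm(const char *comm)
{
    /* Copy at most sizeof - 1 bytes and always terminate the string. */
    strncpy(demo_lastcomm, comm, sizeof(demo_lastcomm) - 1);
    demo_lastcomm[sizeof(demo_lastcomm) - 1] = '\0';
}
```

Using `sizeof(demo_lastcomm)` rather than a literal keeps the bound correct if the buffer size is changed later.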


Patch from mainstream:

[PATCH] x86[64]: display phys_proc_id only when it is initialized

phys_proc_id gets initialized only when (smp_num_siblings > 1). But gets printed even when (smp_num_siblings == 1). As a result we print incorrect physical processor id in /proc/cpuinfo, when HT is disabled.

Signed-off-by: "Venkatesh Pallipadi" <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>


Patch from mainstream:

[PATCH] x86_64: Fix lost edge triggered irqs on UP kernel

There are problems with IDE disks while running UP kernel on x86-64 - it complained a lot about lost irq from hda/hdc. At enable_irq() code calls hw_resend_irq(), but on x86-64 hw_resend_irq() does something useful only when CONFIG_SMP is defined, on UP systems it does nothing. Due to this IRQ is lost - and when IDE retries command, it can again happen that IRQ is delivered before IDE code does enable_irq(), and again and again, unless due to drive being lazy finally once kernel does enable_irq() before drive prepares its answer, and things move forward ... to next lost IRQ.

Signed-off-by: Andi Kleen <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>


Patch from mainstream:

[PATCH] x86_64: no TIOCSBRK/TIOCCBRK in ia32 emulation

In ia32 emulation, the amd64 kernel refuses the ioctls TIOCSBRK and TIOCCBRK with EINVAL. I've attached a patch that adds them to the compatibility list.

Since all architectures have these ioctls ("m68knommu" inherits them from "m68k", "um" from its host) and use the same code, I think adding them to compat_ioctl.h is the correct choice (as opposed to adding them to arch/x86_64/ia32/ia32_ioctl.c).

Signed-off-by: Werner Almesberger <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>


Patch from Kir:

Due to missing #ifdef kernel won't compile if CONFIG_QUOTA_COMPAT is not set.


Patch from Dmitry:

fixed built-in compilation of netfilters

OpenVZ Bug #40.


Patch from Andrey Mirkin:

This patch removes obsolete macros from ipc/shm.c. It is necessary to remove these macros because on the emt64 arch there is no free space after the shmid_kernel struct for an additional pointer.


Patch from Andrey Mirkin, modified by Kirill:

This patch fixes ia64 tasks accessing code: do_each_thread/for_each_process/find_task_by_pid


Patch from Andrey:

This patch adds UBC EXECPRIO flag to ia64 arch.


Patch from Andrey Mirkin:

This patch fixes compilation of warn_bad_zap when UBC=n.


Patch from Dmitry:

fixed compilation with CONFIG_NETFILTER_DEBUG enabled


Patch from Pavel:

ub_dentry_charge() should drop dentry.d_lock and rcu_read_lock. A kernel compiled without UBC gets stuck on the first lookup.


Patch from Pavel:

When CONFIG_DISCONTIGMEM is ON, the mem_map symbol is not present, so the pb_hash function refused to compile. It is fine to use the page_to_pfn() macro in pb_hash() to calculate the hash in both cases, with and without DISCONTIGMEM.


Patch from Pavel:

This patch adds /proc/vz/devperms proc file to vzmon module. It shows device permissions per VPS. File line format is <veid> [bc] <perm> <maj>:(<min>|*)
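A user-space parser for the line format given above might look like the following sketch. The struct and function names are hypothetical, and the encoding of the `<perm>` field is not specified in the description, so it is kept here as an opaque token.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one line: <veid> [bc] <perm> <maj>:(<min>|*) */
struct demo_devperm {
    unsigned int veid;
    char type;           /* 'b' (block) or 'c' (character) */
    char perm[16];       /* opaque token; its encoding is an assumption */
    unsigned int major;
    int any_minor;       /* 1 when the minor field is '*' */
    unsigned int minor;  /* valid only when any_minor == 0 */
};

/* Returns 0 on success, -1 if the line does not match the format. */
static int demo_parse_devperm(const char *line, struct demo_devperm *out)
{
    char minor_tok[16];

    memset(out, 0, sizeof(*out));
    if (sscanf(line, "%u %c %15s %u:%15s", &out->veid, &out->type,
               out->perm, &out->major, minor_tok) != 5)
        return -1;
    if (out->type != 'b' && out->type != 'c')
        return -1;
    if (strcmp(minor_tok, "*") == 0) {
        out->any_minor = 1;
        return 0;
    }
    if (sscanf(minor_tok, "%u", &out->minor) != 1)
        return -1;
    return 0;
}
```

A made-up line such as `101 c 6 1:*` would parse as VE 101, a character device, permission token `6`, major 1 and any minor.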


Patch from Pavel:

Fixup of /proc/vz/devperms output


Patch from Alexey:

Virtualize utsname on EMT64, port from i386


Patch from Kirill:

This patch replaces temporary diff-fairsched-amd-20051010, which fixed a problem with AMD processors described below.

The correct solution is to keep both fairscheduler and vsched in sync, i.e. having node->pcpus correspond to the number of running VCPUs. So fairsched will select a node for scheduling _only_ if it has an active selectable VCPU. The whole restart path in one place is gone.


Patch from Pavel:

  • Remove the kernel-specific structure from the userspace view;
  • Add the missing struct (ubstatfull_t). It was missed because it was not used in kernel code at all.

Bug 52195.


Patch from Pavel:

Small optimization for per-cpu scheduling latency accounting:

  • Move lock in kstat_lat_pcpu_struct into structure with statistic fields to make them fit one cacheline;
  • Make the structure cacheline aligned.
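The layout idea in the two points above can be sketched with C11 alignment support. The field names, the 64-byte line size and the `int` standing in for a spinlock are all assumptions; the real kstat_lat_pcpu_struct layout is not shown in this changelog.

```c
#include <stdalign.h>

#define DEMO_CACHELINE 64  /* assumed cache line size */

/* Lock and statistics share one structure so that taking the lock and
 * updating the counters touch a single cache line. */
struct demo_lat_pcpu {
    alignas(DEMO_CACHELINE) int lock;  /* stand-in for a spinlock */
    unsigned long count;
    unsigned long long totlatency;
    unsigned long long maxlatency;
};
```

Because the first member is aligned to the cache line size, the compiler aligns the whole structure to it and pads its size to a multiple of 64 bytes, so adjacent per-CPU instances in an array never share a line.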


Patch from Pavel:

This patch fixes per-cpu sched latency accounting: seq_counts were not protected against concurrent writers. This caused readers to hang in the read loop forever.
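The failure mode can be illustrated with a single-threaded model of a sequence counter. This is only a sketch: it omits the memory barriers and the writer-side lock that real code needs, and all names are hypothetical.

```c
/* Sequence counter protocol: odd means "update in progress". */
struct demo_seq {
    unsigned int sequence;
    unsigned long value;
};

/* Writers must be serialized by a lock (not modeled here). If two writers
 * race on the non-atomic increments, the counter can be left odd. */
static void demo_seq_write(struct demo_seq *s, unsigned long v)
{
    s->sequence++;  /* now odd: readers must not trust value */
    s->value = v;
    s->sequence++;  /* even again: value is stable */
}

/* One read attempt: returns 1 on success, 0 if the caller has to retry. */
static int demo_seq_try_read(const struct demo_seq *s, unsigned long *out)
{
    unsigned int start = s->sequence;

    if (start & 1)
        return 0;   /* a writer is (apparently) active */
    *out = s->value;
    return s->sequence == start;
}
```

A reader normally loops until a read attempt succeeds. If unserialized writers leave `sequence` stuck at an odd value, every attempt fails and that loop never terminates, which matches the hang described above.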


Patch from Kirill:

small cleanups in sched.c


Patch from mainstream:

This patch fixes an RLIMIT_MEMLOCK issue, which is not actually a security issue in VZ because of UBC.

Bug 42254.




Patch from Andrey Mirkin:

Set of patches to add vz options in arch/ia64/Kconfig


Patch from mainstream:

[PATCH] compat: sigtimedwait

  • Merge the sys32_rt_sigtimedwait functions in the X86_64, IA64, PPC64, MIPS, SPARC64 and S390 32 bit layers into one compat_rt_sigtimedwait function. This also fixes a bug that copied wrong information to the 32 bit userspace siginfo structure on X86_64, IA64 and SPARC64 when calling sigtimedwait on the 32 bit layer.
  • Change all names of the siginfo_t32 structure in X86_64, IA64, MIPS, SPARC64 and S390 to the name compat_siginfo_t as used in PPC64.
  • Patch introduced a macro __COMPAT_ENDIAN_SWAP__ in include/asm-mips/compat.h when MIPS kernel is compiled in little-endian mode. This macro is used to do byte swapping in function sigset_from_compat.
  • This patch is only tested on X86_64 and IA_64.

Signed-off-by: Zou Nan hai <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>


Patch from mainstream:

[PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes

Appended patch will set up compatibility mode TASK_SIZE properly. This will fix at least three known bugs that can be encountered while running compatibility mode apps.

  • A malicious 32bit app can have an elf section at 0xffffe000. During exec of this app, we will have a memory leak as insert_vm_struct() is not checking for return value in syscall32_setup_pages() and thus not freeing the vma allocated for the vsyscall page. And instead of exec failing (as it has addresses > TASK_SIZE), we were allowing it to succeed previously.
  • With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area may return addresses beyond 32bits, ultimately causing corruption because of wrap-around and resulting in SEGFAULT, instead of returning ENOMEM.
  • A 32bit app doing the mmap below will now fail:
    mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE,
            MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0);

    Signed-off-by: Zou Nan hai <>
    Signed-off-by: Suresh Siddha <>
    Cc: Andi Kleen <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>

    GIT: 84929801e14d968caeb84795bfbb88f04283fbd9

  • diff-ms-sendfile-20051007

    Patch from mainstream, ported by Pavel:

    If we use a 64bit kernel on the ia64/x86_64/s390 architectures and run a 32bit binary in 32bit compatibility mode, the sendfile system call does not seem to set the offset argument.

    This is because sendfile's return value on success is not zero, but the code judges the result by whether the return value is zero.

    This problem will affect ia64/x86_64/s390 and not affect other architectures (mips/parisc/ppc64/sparc64).


    Patch from Pavel:

    Some informational printk() calls can be triggered by a userspace process. There is no need to spoil the main logbuf.


    Patch from mainstream:

    Disable interrupts during SMP bogomips checking. This happened on our machines: while bogomips were being counted, an IRQ happened, ran timers and oopsed.

    Bug 51987.


    Patch from Kirill:

    Small cleanup of VZDQ after recent changes by Andrey


    Patch from mainstream:

    [PATCH] forcedeth: Initialize link settings in every nv_open()

    Rüdiger found a bug in nv_open that explains some of the reports with duplex mismatches:
    nv_open calls nv_update_link_speed for initializing the hardware link speed registers. If current link setting matches the values in np->linkspeed and np->duplex, then the function does nothing.

    Usually, doing nothing is the right thing, but not in nv_open: During nv_open, the registers must be initialized because the nic was reset.

    The attached patch fixes that by setting np->linkspeed to an invalid value before calling nv_update_link_speed from nv_open.

    Signed-Off-By: Manfred Spraul <>
    Signed-off-by: Jeff Garzik <>
    Signed-off-by: Chris Wright <>

    GIT: 2498037d5a6668b733acc712a3106ffd4e1ef735


    Patch from Denis:

    This patch fixes the skb->truesize assignment, synchronizing it with mainstream. The problem was observed by Alexey and concerns the TCP window size, which was wrongly obtained as 48k instead of 64k by default. UBC accounting is unchanged.


    Patch from Denis:

    This patch fixes UBC accounting in tun.c in accordance with diff-ubc-flowcontrol-20051005


    Patch from Alexander:

    This patch adds sysctl to enable/disable pid virtualization on VPS start.


    Patch from mainstream:

    [PATCH] Fix fs/exec.c:788 (de_thread()) BUG_ON

    It turns out that the BUG_ON() in fs/exec.c: de_thread() is unreliable and can trigger due to the test itself being racy. And actually there is no need for all threads to have exited at this point, so we simply kill the BUG_ON.

    Signed-off-by: Alexander Nyberg <>
    Cc: Roland McGrath <>
    Cc: Andrew Morton <>
    Cc: Ingo Molnar <>
    Acked-by: Andi Kleen <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Chris Wright <>


    Patch from Pavel:

    Adds necessary corrections into functions that show scheduling statistics to work with new per-cpu stats.


    Patch from Pavel:

    This patch makes schedule() use kstat_lat_pcpu_struct to store info about scheduling latencies. This makes it possible to avoid taking kstat_glob_lock in schedule().


    Patch from Pavel:

    Adds kstat_lat_pcpu_struct to account latencies per-cpu. It will be used in schedule() to avoid using kstat_glob_lock.


    Patch from Dmitry:

    This patch adds cycles CMP macros. It is just a small cleanup that also avoids a possible cycles wraparound (though that is unlikely ever to happen).
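A wraparound-safe comparison of cycle counters can be sketched as below, in the style of the kernel's `time_after()` helper. The `DEMO_` macro names are hypothetical and are not the macros added by this patch.

```c
#include <stdint.h>

/* True when cycle stamp a is later than b, even if the counter wrapped
 * between the two readings: compare the signed difference, not the raw
 * values. Valid while the stamps are less than 2^63 cycles apart. */
#define DEMO_CYCLES_AFTER(a, b) \
    ((int64_t)((uint64_t)(b) - (uint64_t)(a)) < 0)

#define DEMO_CYCLES_BEFORE(a, b) DEMO_CYCLES_AFTER(b, a)
```

A plain `a > b` test gives the wrong answer when `b` was sampled just before the counter wrapped and `a` just after; the signed difference stays small and correctly signed in that case.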


    Patch from Pavel:

    Some calls of printk() can be triggered by a userspace process. There is no need to spoil the main logbuf.


    Patch from Andrey:

    This is part of diff-ve-devnum-20051005 changes.


    Patch from Andrey:

    This patch transfers bits 8..11 of unnamed device minor into major, using additional major numbers, which is currently enough for 1000 VEs. It is needed as a bandaid for coreutils (e.g., mknod) that still cannot use minor or major numbers >= 256; mknod on unnamed devices is used for support of second level quota inside VE.
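The bit manipulation described above can be sketched as follows. The struct and function names are hypothetical, and the sketch deliberately returns only an index (0..15) into the set of additional majors, because the actual major numbers used are not given here.

```c
/* Result of splitting a wide unnamed-device minor. */
struct demo_devnum {
    unsigned int major_slot;  /* bits 8..11 of the original minor */
    unsigned int minor;       /* bits 0..7, always below 256 */
};

static struct demo_devnum demo_split_minor(unsigned int wide_minor)
{
    struct demo_devnum d;

    d.major_slot = (wide_minor >> 8) & 0xfu;
    d.minor = wide_minor & 0xffu;
    return d;
}

/* Inverse mapping: rebuild the wide minor from slot and low minor. */
static unsigned int demo_join_minor(struct demo_devnum d)
{
    return (d.major_slot << 8) | d.minor;
}
```

With this split, the minor seen by user-space tools stays below 256, while the four moved bits select one of up to 16 majors.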


    Patch from Pavel:

    When CONFIG_FAIRSCHED is not set, fairsched_lock is not present in the kernel, but the vcpu scheduler uses it to synchronize its own data. Added a spinlock with the same name under the appropriate #ifdef, having nothing better in mind.


    Patch from Pavel, modified by Kirill:

    This patch fixes show_vsched() to be compilable w/o fairsched support.


    Patch from Pavel:

    On IA64 reading ubc pointer from slab sometimes causes "unaligned access" exception.

    UBC-in-slab pointers must be sizeof(void *)-aligned.
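The alignment requirement can be sketched as rounding an offset up to a multiple of `sizeof(void *)`. The helper name is hypothetical; how the kernel actually computes the position of the pointer within a slab is not shown in this changelog.

```c
#include <stddef.h>

/* Round offset up to the next multiple of sizeof(void *). The mask trick
 * relies on pointer sizes being powers of two. */
static size_t demo_align_to_ptr(size_t offset)
{
    size_t align = sizeof(void *);

    return (offset + align - 1) & ~(align - 1);
}
```

Storing a pointer at an offset that is not aligned this way is what triggers the "unaligned access" exception on architectures such as IA64.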


    Patch from Kirill:

    This patch replaces the standard migrate_all_tasks() with its own version for the VCPU scheduler. It does not migrate any tasks now and only does sanity checks. It fixes compilation of the IA64 kernel, since it used the cpu_to_node() macro before...


    Patch from Dmitry:

    This patch fixes CPU limiting issues in fairsched due to:

    • misprint in fairsched_delayed_insert()
    • TSC deviation on different CPUs on test machines

    Bug 51563.
    Bug 50457.


    Patch from Pavel:

    Same as in x86_64: need to charge arg pages set up for ia32 elf binary.


    Patch from Pavel:

    This patch adds necessary permissions to default devperms for VE0 to make std quota tools work inside VE0.


    Patch from Andrey:

    This patch fixes kernel device representation (decoded device) passed to get_device_perms_ve() in sys_ustat().


    Patch from Pavel:

    When CONFIG_FAIRSCHED is off vz_scale_khz is unresolved. Fixed.


    Patch from Pavel:

    When CONFIG_FAIRSCHED is not set syscall sys_fairsched_rate() is not found.


    Patch from Andrey Mirkin and Vasily:

    This patch fixes mpt fusion scsi driver stalling while booting. This patch should be applied in RPMs.


    Patch from Konstantin:

    Patch solves following problems:

    • A forgotten counter increment in sis900_rx() when it cannot get memory for an skb, which leads to failure of the whole interface. The problem is accompanied by the messages:
       eth0: Memory squeeze,deferring packet.
       eth0: NULL pointer encountered in Rx ring, skipping
    • If the counter cur_rx overflows and there are temporary memory problems, the buffer cannot be recreated later, when memory IS available.
    • Limit the work done in the handler to prevent endless packet processing if new packets are generated faster than they are handled.

    In -mm tree: sis900-come-alive-after-temporary-memory-shortage.patch

    Signed-off-by: Konstantin Khorenko <>
    Signed-off-by: Vasily Averin <>
    Signed-off-by: Daniele Venzano <>
    Cc: Jeff Garzik <>
    Signed-off-by: Andrew Morton <>


    Patch from mainstream, ported by Pavel:

    Use a real VMA to map the 32bit vsyscall page.

    This fixes leaking of syscall32 page table entries.

    This is a merge of two patches:


    Patch from Pavel:

    Since ub_pages_charged and ub_vmalloc_charged are per-cpu, they can sometimes be negative. A suitable type (long instead of unsigned int) is needed, along with matching changes in the ubd_show() function, which prints this info into the proc file.
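Why an individual per-cpu counter can go negative is easiest to see in a small model: a charge may be recorded on one CPU and the matching uncharge on another. The names and the fixed CPU count below are hypothetical.

```c
#define DEMO_NR_CPUS 4  /* illustrative CPU count */

/* Per-cpu deltas are signed: only their sum is meaningful. */
static long demo_percpu_total(const long counters[DEMO_NR_CPUS])
{
    long total = 0;
    int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        total += counters[cpu];
    return total;
}
```

If 5 pages are charged on CPU 0 and 3 of them are later uncharged on CPU 1, the counters read 5 and -3 and the total is 2. Printing the -3 through an unsigned type would show a huge bogus value instead.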


    Patch from mainstream, ported by Denis:

    [TCP]: Fix excessive stack usage resulting in OOPS with 4KSTACKS.

    Various routines were putting a full struct tcp_sock on the local stack. What they really wanted was a subset of this information when doing TCP options processing when we only have a mini-socket (for example in SYN-RECVD and TIME_WAIT states).

    Therefore pull out the needed information into a sub-struct and use that in the TCP options processing routines.

    Signed-off-by: Arnaldo Carvalho de Melo <>
    Signed-off-by: David S. Miller <>


    Patch from Dmitry, bug found by Benedikt Boehm:

    added necessary #ifdef for compilation with disabled CONFIG_LEGACY_PTYS

    OpenVZ Bug #52.


    Patch from Dmitry, idea of Solar Designer:

    CONFIG_SECURITY and CONFIG_VE are mutually exclusive options, since LSMs may break the VZ security model. So they were made mutually exclusive in Kconfig.


    Patch from Pavel:

    This patch adds necessary charging of memory in loading elf binaries for both ia64 and ia32 emulation.


    Patch from Pavel:

    On ia64, the space right after struct thread_info is used to store registers. Quota overwrote these fields to store its own magic and inode pointer. Now these values are stored directly in task_struct in the normal way.


    Patch from mainstream, prepared by Kirill:

    Adds ioperm annotations required for new drivers.


    Patch from Pavel:

    required kernel subsystems update for following libata and megaraid updates

    Bug 52529.
    Bug 52530.


    Patch from Pavel:

    megaraid driver is updated to 2.20 version

    Bug 52530.


    Patch from Pavel:

    libata updated to 1.11 version

    Bug 52529.


    Patch from mainstream:

    2004/09/12 10:30:42-07:00

    Stricter PCI IO space type checking uncovered a bug in sx8 driver. Forgot to add in the mmio base..