From OpenVZ Virtuozzo Containers Wiki
< Download‎ | kernel‎ | rhel4‎ | 023stab044.4
Revision as of 09:56, 20 March 2008 by Kir (talk | contribs) (created)


  • Rebased on 2.6.9-55.el4 (RHEL4u5)
  • Backported some patches from 2.6.18 (fairsched optimizations, fixes in vzquota, smbfs etc).
  • Lots of driver updates.

Config changes

Same as 023stab043.2, plus:

  • +CONFIG_ATA=y (sata drivers update up to 2.0 version)
  • +CONFIG_R8169_NAPI=y

Updated drivers

  • The Intelligent Input/Output (I2O) layer (memory leaks, infinite loop fix, controller's message frame leak)
  • Areca RAID Controller driver (arcmsr driver 1.20.0X.13-61107 version)
  • 3ware 9000 Storage Controller driver (3w-9xxx driver version)
  • Qlogic 21xx/22xx/2300/2312/2322/6312/6322 Host Adapter driver (qla2xxx drivers 8.01.04-d8-rh1 version)
  • LSI Logic MegaRAID SAS RAID driver (megaraid_sas driver
  • LSI Logic Management Module (megaraid_mm driver version)
  • LSI Logic Fusion MPT driver (mptbase driver 3.02.73rh version)
  • Compaq Smart Array 5xxx Controller driver (cciss driver 2.6.14 version)
  • SuperTrak EX8350/8300/16350/16300 Storage Controller driver (stex driver version)
  • Intel PIIX/ICH SATA Controller driver (ata_piix driver 2.00ac7 version)
  • AHCI SATA driver (ahci driver 2.0 version)
  • ServerWorks Frodo / Apple K2 SATA Controller driver (sata_svw driver 2.0 version)
  • Marvell SATA Controller driver (sata_mv driver 0.7 version)
  • NVIDIA SATA Controller driver (sata_nv driver 3.2 version)
  • Pacific Digital ADMA Controller driver (pdc_adma driver 0.04 version)
  • Pacific Digital SATA QStor Controller driver (sata_qstor driver 0.06 version)
  • Promise SATA TX2/TX4 Controller driver (sata_promise driver 1.05 version)
  • Promise SATA SX4 Controller driver (sata_sx4 driver 0.9 version)
  • Silicon Image SATA Controller driver (sata_sil driver 2.0 version)
  • SiS 964/180 SATA Controller driver (sata_sis driver 0.6 version)
  • ULi Electronics SATA Controller driver (sata_uli driver 1.0 version)
  • VIA SATA Controller driver (sata_via driver 2.0 version)
  • VITESSE VSC-7174 / INTEL 31244 SATA Controller driver (sata_vsc driver 2.0 version)
  • Libata drivers (2.00 version)
  • QLogic QLA3XXX Network Driver (qla3xxx driver 2.02.00-k37RH4U5 version)
  • Intel(R) PRO/1000 Network driver (e1000 driver 7.2.7-k2-NAPI version)
  • Marvell Yukon 2 Gigabit Ethernet driver (sky2 driver 1.6 version)
  • Broadcom Tigon3 Ethernet driver (tg3 driver 3.64-rh version)
  • Broadcom NX2 Ethernet driver (bnx2 driver 1.4.43-rh version)
  • RealTek RTL8169s/8110s Gigabit Ethernet driver (r8169 driver 2.2LK-NAPI version)
  • Intel(R) PRO/10GbE Ethernet driver (ixgb driver 1.0.109-k2-NAPI version)



Patch from Alexey:
[CPT] bug in restore net routes

When the netroute section in a dump is padded, restore tries to interpret the padding as the next rtnetlink message and deadlocks, because it reads the padding as a message of zero length.
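The failure mode can be illustrated with a minimal user-space sketch of walking length-prefixed records. It is not the CPT restore code: struct msg_hdr and count_messages() are hypothetical names, and rtnetlink alignment rules are deliberately left out. The point is only that a length smaller than the header must stop the walk, otherwise the offset never advances.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: only the total length matters here. */
struct msg_hdr {
    uint32_t len;   /* header plus payload, in bytes */
    uint32_t type;
};

/*
 * Count the messages stored back to back in buf[0..size).
 * Trailing padding is ignored instead of being parsed as a message.
 */
size_t count_messages(const unsigned char *buf, size_t size)
{
    size_t off = 0, count = 0;

    while (size - off >= sizeof(struct msg_hdr)) {
        struct msg_hdr h;

        memcpy(&h, buf + off, sizeof(h));   /* avoid unaligned reads */

        /* A zero or too-short length would never advance 'off'. */
        if (h.len < sizeof(struct msg_hdr))
            break;
        /* A message must not run past the end of the section. */
        if (h.len > size - off)
            break;

        count++;
        off += h.len;
    }
    return count;
}
```

With two valid records followed by zero-filled padding, the walk stops at the padding and reports two messages.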


Patch from Andrey:

We recently started adjusting 3 limits on restore so that restore does not fail by hitting them. One more limit has to be adjusted the same way: dcachesize.

Bug #77889.
Bug #77890.
Bug #77896.


Patch from Alexandr Andreev:
[SCHED] Reduce starvation of some VCPUs in case of cpu limits

Change the logic of choosing best_vcpu to schedule to. There are two potential problems:

a) If a vcpu is hot and the last physical CPU it used equals smp_processor_id(), it is always chosen. This is not a good decision, because there is no guarantee that _all_ physical CPUs take vcpus from a vsched. For example, if the cpulimit for a vsched is small, this vsched can end up running on a single physical CPU forever.

b) Newer 'cold' vcpus are currently chosen first, because we scan active_list in the direct order, i.e. from older vcpus to newer ones, and the newest one is chosen in the end. In this case old vcpus can starve for a long time.

Bug #79015.


Patch from Alexandr Andreev:
[SCHED] Select some vcpu instead of idle even if all vcpus are hot

We have to use the oldest vcpu if all vcpus are hot. The current kernel uses idle_vcpu in this case, so a CPU can sit idle instead of doing useful work.

Bug #79676.


Some compilation fixes after rebasing to the -55 EL kernel.


Patch from Alexandr:
[SCHED] Hot VCPU's - optimization for schedule_vcpu(). (from 2.6.18).

In the original VZ kernel, schedule_vcpu() chooses the next VCPU from the vsched->active list without taking vcpu->last_pcpu into account, so VCPUs can jump from PCPU to PCPU too often.

With this patch, schedule_vcpu() tries to skip 'hot' VCPUs, i.e. VCPUs that were running on some other PCPU recently. The time slice threshold is tunable and can be set via /proc/sys/kernel/vcpu_hot_timeslice (like vcpu_timeslice).
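A simplified model of the selection rule, assuming a plain array ordered from oldest to newest in place of the kernel's active list. The struct fields, vcpu_is_hot() and pick_vcpu() are illustrative names, not the actual kernel code. It skips hot VCPUs and, following the "all vcpus are hot" patch above, falls back to the oldest one rather than going idle.

```c
#include <stddef.h>

struct vcpu {
    int last_pcpu;            /* physical CPU this VCPU last ran on */
    unsigned long last_ran;   /* time of its last run, in ticks */
};

/* Hot: ran on a different physical CPU less than 'threshold' ticks ago. */
static int vcpu_is_hot(const struct vcpu *v, int this_pcpu,
                       unsigned long now, unsigned long threshold)
{
    return v->last_pcpu != this_pcpu && now - v->last_ran < threshold;
}

/*
 * active[] is ordered from oldest to newest.
 * Returns the index of the first VCPU that is not hot, the oldest
 * VCPU (index 0) if all of them are hot, or -1 if the list is empty.
 */
int pick_vcpu(const struct vcpu *active, size_t n, int this_pcpu,
              unsigned long now, unsigned long threshold)
{
    size_t i;

    if (n == 0)
        return -1;

    for (i = 0; i < n; i++)
        if (!vcpu_is_hot(&active[i], this_pcpu, now, threshold))
            return (int)i;

    return 0;   /* everything is hot: take the oldest, do not idle */
}
```

In this model the threshold argument plays the role of the vcpu_hot_timeslice tunable.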


Patch from Alexandr:
[SCHED] Optimization for load_balance() (backported from 2.6.18).

Description: load_balance() in 2.6.9 and 2.6.18 is broken when it is called from an idle thread in rebalance_tick(). In that case load_balance() tries to find the busiest group in idle_vsched (!), which has no really running tasks at all.

With this patch, load_balance() first tries to find the busiest vsched and, if that succeeds, then finds the busiest group inside this vsched, and so on.
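The two-level search can be sketched as follows. This is a toy model with hypothetical types and function names (pick_busiest_vsched, pick_busiest_group); load is reduced to a single counter per group, which the real scheduler does not do.

```c
#include <stddef.h>

struct group {
    unsigned int load;            /* runnable tasks in this group */
};

struct vsched {
    const struct group *groups;
    size_t ngroups;
    int is_idle;                  /* nonzero for the idle vsched */
};

static unsigned int vsched_load(const struct vsched *vs)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < vs->ngroups; i++)
        sum += vs->groups[i].load;
    return sum;
}

/* Step 1: busiest non-idle vsched, or NULL if nothing is runnable. */
const struct vsched *pick_busiest_vsched(const struct vsched *all, size_t n)
{
    const struct vsched *best = NULL;
    unsigned int best_load = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned int load;

        if (all[i].is_idle)
            continue;             /* never search the idle vsched */
        load = vsched_load(&all[i]);
        if (load > best_load) {
            best = &all[i];
            best_load = load;
        }
    }
    return best;
}

/* Step 2: busiest group inside the vsched chosen in step 1. */
const struct group *pick_busiest_group(const struct vsched *vs)
{
    const struct group *best = NULL;
    size_t i;

    if (vs == NULL)
        return NULL;
    for (i = 0; i < vs->ngroups; i++)
        if (best == NULL || vs->groups[i].load > best->load)
            best = &vs->groups[i];
    return best;
}
```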


Patch from Alexandr Andreev:
[SCHED] find_busiest_queue() should select VCPUs from given vsched only

In the new scheme we choose the vsched in find_busiest_vsched(), i.e. before find_busiest_queue(), so when we look for the busiest queue we must consider only the VCPUs of this vsched.

Bug #78385.
and maybe this:
Bug #78383.


Patch from Alexey Kuznetsov:
[PATCH] Invalid return value of execve() resulting in oopses (mainstream)

When the ELF loader fails to map an executable (due to memory shortage or because the binary is malformed), it can return 0. Normally this is invisible, because the process is killed with SIGKILL and never returns to user space.

But if exec() is called from a kernel thread (hotplug, whatever), the consequences are more interesting and vary depending on the architecture.

  • i386. Nothing especially interesting, execve() just returns with "success".
  • x86_64. A fake zero frame is used on the way back to the caller, RSP/RIP are loaded with zeros, ergo... double fault.
  • ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it oopses due to a return to zero PC, sometimes it sees NaT in rXX and oopses due to NaT consumption.

This fix solves bugs #68582 (i386), #73753 (x86_64) and #79847 (ia64).


Patch from Vasily:
Removes the extra debug messages from the stack segment exception handler.

Bug #78401.


Patch from Vasily:
Adds debugging to help track down ext3 orphan list corruption.

Bug #77466.


Patch from Evgeny:

The patch fixes a compilation error. The symbols disable_timer_pin_1 and check_ioapic are defined only when CONFIG_X86_IO_APIC is set, so their uses in the parse_cmdline_early and setup_arch functions (arch/i386/kernel/setup.c) should be placed under ifdef CONFIG_X86_IO_APIC.

OpenVZ Bug #479.


Patch from mainstream:
[PATCH] smbfs: Fix slab corruption in samba error path

GIT: 48564e628bd7662d7a0b3ac81c41cd0e4cc36dae

Bug #78157.


Patch from Vasily:
Fixes rq_trans2buffer double free issue in smbfs.

smbfs allocates rq_trans2buffer to handle the server's multi-part transaction2 response messages. Because an smb_request may be reused, rq_trans2buffer is freed before each new request. However, if the server's last response is a single trans2 message rather than a multi-part one, a new rq_trans2buffer is not allocated, and smb_rput tries to free the old buffer a second time.

To prevent this, the rq_trans2buffer pointer should be set to NULL after kfree.
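The free-then-clear idiom looks like this in a user-space sketch. struct request and both helpers are hypothetical stand-ins for the smbfs structures, and malloc/free stand in for the kernel allocator.

```c
#include <stdlib.h>

struct request {
    unsigned char *trans2buffer;   /* allocated only for multi-part replies */
};

/* Release the reply buffer; safe to call more than once. */
void request_drop_buffer(struct request *rq)
{
    free(rq->trans2buffer);    /* free(NULL) is a no-op */
    rq->trans2buffer = NULL;   /* a later call sees NULL, not a stale pointer */
}

/*
 * Prepare rq for reuse. A buffer is allocated only when a multi-part
 * reply of multi_len bytes is expected (multi_len == 0 means single).
 * Returns 0 on success, -1 if the allocation fails.
 */
int request_prepare(struct request *rq, size_t multi_len)
{
    request_drop_buffer(rq);
    if (multi_len == 0)
        return 0;
    rq->trans2buffer = malloc(multi_len);
    return rq->trans2buffer ? 0 : -1;
}
```

Without the assignment to NULL, preparing a request for a single reply and then releasing it would free the previous buffer twice.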

Bug #74499.

PS. The issue is still present in mainstream; however, note that smbfs support was dropped in the latest distributions (FC6 and RHEL5) and replaced by CIFS.


Patch from mainstream:
[SMBFS] set the default files and dir mask for smbfs

This is a hunk missed from the following patch: GIT: 99ca50fb7eb44cefb138fc8b885ce6be411bcd51.

Bug #78157.


Patch from Kostja:
Fixes a possible live-lock in stop_machine():

  • stopmachine_state == STOPMACHINE_WAIT;
  • STOPPER (stop_machine()) is in state SM_STOPPER_WAITING, calling yield() in a loop;
  • SLAVES (stopmachine()) also call yield() in a loop.

As a result, fairsched_lock is heavily contended on all CPUs. If lock acquisition is unfair (for example, on a NUMA node), some CPUs can wait for the lock forever or for a very long time, hanging the node. This patch replaces yield() with msleep(10).

Bug #78975.


Patch from Vasily:
[PATCH] change stack randomize range in rhel4-based kernels

The exec-shield-randomize feature in rhel4 confuses UBC a bit: for example, a typical VE on a 2.6.8 kernel owns 17Mb of privvmpages, but the same VE on a 2.6.9 kernel owns 40Mb of privvmpages. Stack randomization is to blame: it adds up to 2Mb to each process. In rhel5-based kernels the randomization range is much smaller: only two pages. So just use the same approach in rhel4.

OpenVZ Bug #484.


Revert linux-2.6.9-proc-readdir.patch from RHEL4u5. This optimization requires pidmap virtualization, which is fixed in 2.6.18-OVZ, but it's painful to backport it.


Patch from Evgeny:
Patch fixes compilation error in emulate_raw() function when CONFIG_MAGIC_SYSRQ is not set.

sysrq_key_scancode moved outside #ifdef CONFIG_MAGIC_SYSRQ.

OpenVZ Bug #479.


Patch from Kostja:
Adds sysctl to choose base ubc for memory usage inside a VE.

Sets PRIVVMPAGES beancounter to be used by default instead of OOMGUARPAGES.

Bug #78088.


Patch from Kostja:
Fixes the race between proc_lookup() and sys_delete_module()

Bug #77841.


Patch from Kostja:
Fixes the race between proc_lookup() and remove_proc_entry().

As far as I can understand, there is a race:

	lde = LPDE(dir);
	if (!lde)
		goto out;

/* here lde can be destroyed??? */

	lde = __proc_lookup(lde, dentry);


Patch from Vasily:
prohibit tun persistent mode inside VE

Bug #79612.


Patch from Denis:
This patch frees leaked loopback per-CPU stats.


Patch from Konstantin Khorenko:
[VZDQ] prohibit chown of a file if owner doesn't have ugid struct

Prohibit chown of a file if its owner does not have a ugid record. This can happen if the number of UIDs/GIDs somehow exceeded the limit (e.g. ugidlimit was set lower than the number of users).

Bug #79553.


Patch from Vasiliy:
This patch fixes i2o message leak.

In case of error we need to free both msg itself and the i2o message in hw.


Patch from Vasiliy:
This patch fixes access to memory that has not been allocated:

i2o_msg_get_wait() can return errors other than I2O_QUEUE_EMPTY, but the result is checked only against this code. If it is some other error, we later dereference the error code as a pointer.
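The class of bug can be shown with a small sketch. It assumes a made-up convention in which a non-negative result is a usable slot and any negative value is an error; the enum values and function names are hypothetical and do not reflect the real I2O return encoding.

```c
/* Hypothetical error codes; any negative result means failure. */
enum {
    ERR_QUEUE_EMPTY = -1,
    ERR_TIMEOUT     = -2
};

/* Buggy check: only one specific error code is recognised. */
int buggy_is_usable(int res)
{
    return res != ERR_QUEUE_EMPTY;
}

/* Fixed check: every error code is rejected before the result is used. */
int fixed_is_usable(int res)
{
    return res >= 0;
}
```

In this model the buggy check lets ERR_TIMEOUT through as if it were a valid result, which is the step that leads to using an error code as a pointer.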


Patch from Vasiliy:
This patch fixes a number of issues in i2o_cfg_passthru{,32}:

  • memory leaks (including i2o_message leak fixed by khorenko@)
  • infinite loop to sg_list_cleanup in passthru32
  • bad error paths


Patch from Vasiliy:

Reading from some i2o-related proc files can hang the controller for unknown reasons. As a workaround, this patch makes these files accessible to root only.


Patch from Kirill:

Fix compilation of ./drivers/md/dm-biosets.c when CONFIG_DM=y: bio_init is re-defined in this file. This works when DM is compiled as a module, since the local version of bio_init is used. However, when DM is compiled in, it conflicts with the function defined in fs/bio.c.


Patch ported by Kostja (khorenko@):
Areca driver v1.20.0X.13-61107 added.

Sources from Areca site.

Bug #59933.


Remade by Kostja (khorenko@) for 2.6.9-55.EL kernels, includes diff-qla2xxx-makefile-20070319:
Patch from Evgeny:
The patch fixes a compilation error by updating the makefile drivers/scsi/qla2xxx/Makefile. When CONFIG_SCSI_QLA2XXX_FAILOVER is not set, failover compilation fails, since the Makefile defines it but autoconf.h undefines it.

OpenVZ Bug #479.


Patch from Vasily:

Fixes an oops on reading from some i2o proc files, caused by their handlers using the "exec" field in struct i2o_controller. This is really a minor issue, because the i2o_proc module is not loaded by default.


Patch from Andrey:
Fixes a security issue, an oops that can be triggered by any user inside a VE:
[PATCH] skip data conversion in compat_sys_mount when data_page is NULL

The OpenVZ Linux kernel team has found a problem with mounting in compat mode.

A simple "mount -t smbfs ..." command on the Fedora Core 5 distro in 32-bit mode leads to an oops:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Process mount (pid: 14656, veid=300, threadinfo ffff810034d30000, task
Call Trace: ia32_sysret+0x0/0xa

The problem is that the data_page pointer can be NULL, so we should skip data conversion in this case.
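A minimal sketch of the guard, assuming a made-up pair of option layouts. opts32, opts64 and convert_mount_data() are illustrative only and are not the structures used by compat_sys_mount.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit and native layouts of the same mount options. */
struct opts32 { uint32_t uid; uint32_t gid; };
struct opts64 { uint64_t uid; uint64_t gid; };

/*
 * Widen 32-bit mount options into the native layout.
 * Returns 1 if data was converted, 0 if there was nothing to convert.
 */
int convert_mount_data(const struct opts32 *in, struct opts64 *out)
{
    if (in == NULL)
        return 0;   /* no data page was passed: skip the conversion */

    out->uid = in->uid;
    out->gid = in->gid;
    return 1;
}
```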

Bug #81559.


Patch from Andrey (amirkin@):

In the new RHEL4 kernel (2.6.9-42.0.8.EL), i_alloc_sem is now taken in notify_change() if we are changing the size:

if (ia_valid & ATTR_SIZE)

So we no longer need to take this semaphore before calling notify_change() when we are going to change the size.

Bug #81426.