Download/kernel/rhel4/023stab044.4/changes

16,451 bytes added, 09:56, 20 March 2008
== Changes ==
* Rebased on 2.6.9-55.el4 (RHEL4u5)
* Backported some patches from 2.6.18 (fairsched optimizations, fixes in vzquota, smbfs etc).
* Lots of driver updates.

=== Config changes ===
Same as {{Kernel link|rhel4|023stab043.2}}, plus:

* +<code>CONFIG_ATA=y</code> (SATA drivers updated to version 2.0)
* +<code>CONFIG_R8169_NAPI=y</code>
* +<code>CONFIG_QLA3XXX=y</code>
* +<code>CONFIG_SENSORS_SMSC47B397=m</code>
* +<code>CONFIG_EDAC_K8=m</code>
<includeonly>[[{{PAGENAME}}/changes#Updated drivers|{{Long changelog message}}]]</includeonly><noinclude>
=== Updated drivers ===
* The Intelligent Input/Output (I2O) layer (memory leaks, infinite loop fix, controller's message frame leak)
* Areca RAID Controller driver (arcmsr driver 1.20.0X.13-61107 version)
* 3ware 9000 Storage Controller driver (3w-9xxx driver 2.26.05.007 version)
* Qlogic 21xx/22xx/2300/2312/2322/6312/6322 Host Adapter driver (qla2xxx drivers 8.01.04-d8-rh1 version)
* LSI Logic MegaRAID SAS RAID driver (megaraid_sas driver 00.00.03.05)
* LSI Logic Management Module (megaraid_mm driver 2.20.2.6rh version)
* LSI Logic Fusion MPT driver (mptbase driver 3.02.73rh version)
* Compaq Smart Array 5xxx Controller driver (cciss driver 2.6.14 version)
* SuperTrak EX8350/8300/16350/16300 Storage Controller driver (stex driver 3.0.0.1 version)
* Intel PIIX/ICH SATA Controller driver (ata_piix driver 2.00ac7 version)
* AHCI SATA driver (ahci driver 2.0 version)
* ServerWorks Frodo / Apple K2 SATA Controller driver (sata_svw driver 2.0 version)
* Marvell SATA Controller driver (sata_mv driver 0.7 version)
* NVIDIA SATA Controller driver (sata_nv driver 3.2 version)
* Pacific Digital ADMA Controller driver (pdc_adma driver 0.04 version)
* Pacific Digital SATA QStor Controller driver (sata_qstor driver 0.06 version)
* Promise SATA TX2/TX4 Controller driver (sata_promise driver 1.05 version)
* Promise SATA SX4 Controller driver (sata_sx4 driver 0.9 version)
* Silicon Image SATA Controller driver (sata_sil driver 2.0 version)
* SiS 964/180 SATA Controller driver (sata_sis driver 0.6 version)
* ULi Electronics SATA Controller driver (sata_uli driver 1.0 version)
* VIA SATA Controller driver (sata_via driver 2.0 version)
* VITESSE VSC-7174 / INTEL 31244 SATA Controller driver (sata_vsc driver 2.0 version)
* Libata drivers (2.00 version)
* QLogic QLA3XXX Network Driver (qla3xxx driver 2.02.00-k37RH4U5 version)
* Intel(R) PRO/1000 Network driver (e1000 driver 7.2.7-k2-NAPI version)
* Marvell Yukon 2 Gigabit Ethernet driver (sky2 driver 1.6 version)
* Broadcom Tigon3 Ethernet driver (tg3 driver 3.64-rh version)
* Broadcom NX2 Ethernet driver (bnx2 driver 1.4.43-rh version)
* RealTek RTL8169s/8110s Gigabit Ethernet driver (r8169 driver 2.2LK-NAPI version)
* Intel(R) PRO/10GbE Ethernet driver (ixgb driver 1.0.109-k2-NAPI version)

==== Patches ====
<dl>
==== diff-cpt-restore-route-bug-20070321 ====
<div class="change">
Patch from Alexey:<br/>
[CPT] bug in restore net routes

When the netroute section in the dump is padded, restore tries
to interpret the padding as the next rtnetlink message and deadlocks,
treating it as a message of zero length.
</div>
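The failure mode described above can be illustrated with a minimal user-space sketch. It is not the actual CPT restore code: <code>struct msg_hdr</code>, <code>MSG_ALIGN</code> and <code>count_messages()</code> are hypothetical names chosen for the illustration. The point is only that a parser walking length-prefixed messages has to stop when it meets a length smaller than the header, otherwise zero padding never advances the offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a netlink-style header; only len matters here. */
struct msg_hdr {
    uint32_t len;    /* total message length, header included */
    uint16_t type;
    uint16_t flags;
};

#define MSG_ALIGN(n) (((n) + 3u) & ~3u)

/*
 * Count the well-formed messages in buf. Parsing stops at padding or at a
 * truncated message instead of spinning on a zero-length "message".
 */
static int count_messages(const unsigned char *buf, size_t size)
{
    size_t off = 0;
    int count = 0;

    while (size - off >= sizeof(struct msg_hdr)) {
        struct msg_hdr hdr;
        size_t step;

        memcpy(&hdr, buf + off, sizeof(hdr));

        /* A length below the header size means padding or garbage. */
        if (hdr.len < sizeof(hdr) || hdr.len > size - off)
            break;

        count++;
        step = MSG_ALIGN(hdr.len);
        if (step > size - off)
            break;          /* the aligned length would overrun the buffer */
        off += step;
    }
    return count;
}
```

With one 8-byte message followed by 8 bytes of zero padding, the loop counts a single message and then exits on the padding.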

==== diff-cpt-ubc-adjust-on-restore-b-20070323 ====
<div class="change">
Patch from Andrey:

We recently added adjustment of three limits on restore so that restore does
not fail by hitting them. Now one more limit has to be adjusted as well: dcachesize.

Bug #77889.<br/>
Bug #77890.<br/>
Bug #77896.
</div>

==== diff-fairsched-best-vcpu-20070413 ====
<div class="change">
Patch from Alexandr Andreev:<br/>
[SCHED] Reduce starvation of some VCPUs in case of cpu limits

Change the logic of choosing the best_vcpu to schedule.
There are two potential problems:

a) If a vcpu is hot and the last physical CPU it ran on is equal to
smp_processor_id(), it is always chosen. This is not a good
decision, because there is no guarantee that _all_ physical CPUs
take vcpus from a vsched. For example, if the cpulimit for a vsched is
small, this vsched can end up running on only one physical CPU forever.

b) Newer 'cold' vcpus are currently chosen first,
because we scan active_list in the direct order,
i.e. from older vcpus to newer ones, and the newest one is the final choice.
In this case old vcpus can starve for a long time.

Bug #79015.
</div>

==== diff-fairsched-best-vcpu-b-20070423 ====
<div class="change">
Patch from Alexandr Andreev:<br/>
[SCHED] Select some vcpu instead of idle even if all vcpus are hot

We have to use the oldest vcpu if all vcpus are hot.
In the current kernel idle_vcpu is used, so the CPU can idle instead of
doing useful work.

Bug #79676.
</div>

==== diff-fairsched-comp-fixes-20070503 ====
<div class="change">
Some compilation fixes after rebasing to the -55 EL kernel.
</div>

==== diff-fairsched-hot-vcpu-20070409 ====
<div class="change">
Patch from Alexandr:<br/>
[SCHED] Hot VCPU's - optimization for schedule_vcpu(). (from 2.6.18).

In the original VZ kernel, schedule_vcpu() chooses the next VCPU from the
vsched-&gt;active list and does not take vcpu-&gt;last_pcpu into account,
so VCPUs can jump from PCPU to PCPU too often.

With this patch, schedule_vcpu() tries to skip 'hot' VCPUs, i.e. VCPUs
that were recently running on some other PCPU. The time slice threshold is
tunable and can be set via /proc/sys/kernel/vcpu_hot_timeslice (like
vcpu_timeslice).

</div>
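A minimal sketch of the 'hot' test may help. It is not the kernel implementation: <code>struct vcpu</code>, its fields, <code>vcpu_is_hot()</code> and <code>pick_vcpu()</code> are hypothetical, and time is a plain counter. The fallback to the oldest entry when every VCPU is hot follows the related fix diff-fairsched-best-vcpu-b-20070423.

```c
#include <stdbool.h>

/* Hypothetical, simplified VCPU descriptor. */
struct vcpu {
    int last_pcpu;           /* physical CPU this VCPU last ran on */
    unsigned long last_run;  /* time stamp of its last run */
};

/* A VCPU is "hot" if it ran recently on a different physical CPU. */
static bool vcpu_is_hot(const struct vcpu *v, int this_pcpu,
                        unsigned long now, unsigned long hot_timeslice)
{
    if (v->last_pcpu == this_pcpu)
        return false;
    return (now - v->last_run) < hot_timeslice;
}

/*
 * Scan a list ordered from oldest to newest and return the index of the
 * first VCPU that is not hot. If all are hot, fall back to the oldest one
 * rather than idling. Returns -1 for an empty list.
 */
static int pick_vcpu(const struct vcpu *list, int n, int this_pcpu,
                     unsigned long now, unsigned long hot_timeslice)
{
    int i;

    for (i = 0; i < n; i++)
        if (!vcpu_is_hot(&list[i], this_pcpu, now, hot_timeslice))
            return i;
    return n > 0 ? 0 : -1;
}
```

A larger threshold keeps VCPUs on their previous PCPU for longer, at the cost of skipping more candidates during the scan.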

==== diff-fairsched-idlebalance-20070409 ====
<div class="change">
Patch from Alexandr:<br/>
[SCHED] Optimization for load_balance() (backported from 2.6.18).

Description: load_balance() in 2.6.9 and 2.6.18 is broken when
it is called from an idle thread in rebalance_tick(). This is because
load_balance() tries to find the busiest group in idle_vsched (!), where
there are no running tasks at all.

With this patch, load_balance() first tries to find the busiest vsched
and, if it succeeds, then finds the busiest group inside this vsched, and
so on.

</div>
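The two-level search can be sketched as follows. This is an illustration only, under the assumption that load is simply the number of running tasks; <code>struct group</code>, <code>struct vsched</code> and <code>find_busiest()</code> are hypothetical and do not mirror the kernel data structures.

```c
#include <stddef.h>

/* Hypothetical, simplified scheduler structures. */
struct group  { int nr_running; };
struct vsched { const struct group *groups; size_t ngroups; };

/* Total number of running tasks in one vsched. */
static int vsched_load(const struct vsched *vs)
{
    int load = 0;
    size_t i;

    for (i = 0; i < vs->ngroups; i++)
        load += vs->groups[i].nr_running;
    return load;
}

/*
 * Step 1: pick the busiest vsched that has running tasks.
 * Step 2: pick the busiest group inside that vsched only.
 * Returns the vsched index and stores the group index in *group,
 * or returns -1 if nothing is running anywhere.
 */
static int find_busiest(const struct vsched *vs, size_t n, size_t *group)
{
    int best = -1, best_load = 0;
    size_t i, g = 0;

    for (i = 0; i < n; i++) {
        int load = vsched_load(&vs[i]);
        if (load > best_load) {
            best_load = load;
            best = (int)i;
        }
    }
    if (best < 0)
        return -1;

    for (i = 1; i < vs[best].ngroups; i++)
        if (vs[best].groups[i].nr_running > vs[best].groups[g].nr_running)
            g = i;
    *group = g;
    return best;
}
```

Because an idle vsched has zero load, it can never be selected in the first step, which is the situation the patch description calls broken.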

==== diff-fairsched-idlebalance-d-20070409 ====
<div class="change">
Patch from Alexandr Andreev:<br/>
[SCHED] find_busiest_queue() should select VCPUs from given vsched only

In the new scheme we choose the vsched in find_busiest_vsched(),
i.e. before find_busiest_queue(), so when we look
for the busiest queue we must consider only this vsched's VCPUs.

Bug #78385.<br/>
and maybe this:<br/>
Bug #78383.

</div>

==== diff-ms-elf-retval-20070420 ====
<div class="change">
Patch from Alexey Kuznetsov:<br/>
[PATCH] Invalid return value of execve() resulting in oopses (mainstream)

When the ELF loader fails to map an executable (due to memory shortage
or because the binary is malformed), it can return 0. Normally this is invisible,
because the process is killed with SIGKILL and never returns to user space.

But if exec() is called from a kernel thread (hotplug, whatever), the consequences
are more interesting and vary depending on the architecture.

<ul>
<li>i386. Nothing especially interesting: execve() just returns with
"success".</li>

<li>x86_64. A fake zero frame is used on the way back to the caller, RSP/RIP are loaded
with zeros, ergo... double fault.</li>

<li>ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it oopses
due to a return to a zero PC, sometimes it sees NaT in rXX and oopses
due to NaT consumption.</li>
</ul>

This fix solves bugs #68582 (i386), #73753 (x86_64) and #79847 (ia64).
</div>

==== diff-ms-emt64-dblfault-debug-b-20070402 ====
<div class="change">
Patch from Vasily:<br/>
Removes the extra debug messages from the stack segment exception handler.

Bug #78401.

</div>

==== diff-ms-ext3-orphan-dbg-20070319 ====
<div class="change">
Patch from Vasily:<br/>
Adds debugging to help track down ext3 orphan list corruption.

Bug #77466.
</div>

==== diff-ms-i386-ioapic-compilation-20070319 ====
<div class="change">
Patch from Evgeny:

The patch fixes a compilation error. The symbols disable_timer_pin_1 and check_ioapic,
which are defined only when CONFIG_X86_IO_APIC is set, should be placed under
ifdef CONFIG_X86_IO_APIC in the parse_cmdline_early and setup_arch functions
(arch/i386/kernel/setup.c).

[http://bugzilla.openvz.org/show_bug.cgi?id=479 OpenVZ Bug #479].
</div>
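The pattern behind this fix can be shown with a small stand-alone sketch. It is not the code from arch/i386/kernel/setup.c: <code>parse_early_option()</code> and the option string are hypothetical. A symbol that exists only under a config option must also be used only under that option, otherwise the build breaks when the option is off.

```c
#include <string.h>

/* Uncomment to build with the feature enabled (hypothetical toggle). */
/* #define CONFIG_X86_IO_APIC 1 */

#ifdef CONFIG_X86_IO_APIC
static int disable_timer_pin_1;   /* exists only when the option is set */
#endif

/* Returns 1 if the option was recognised, 0 otherwise. */
static int parse_early_option(const char *opt)
{
#ifdef CONFIG_X86_IO_APIC
    /* Every use of the symbol sits under the same guard as its definition. */
    if (strcmp(opt, "disable_timer_pin_1") == 0) {
        disable_timer_pin_1 = 1;
        return 1;
    }
#endif
    (void)opt;
    return 0;
}
```

With the define commented out, the function compiles and simply reports the option as unrecognised.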

==== diff-ms-smbfs-corruption-20070328 ====

<div class="change">
Patch from mainstream:<br/>
[PATCH] smbfs: Fix slab corruption in samba error path

[http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=48564e628bd7662d7a0b3ac81c41cd0e4cc36dae GIT: 48564e628bd7662d7a0b3ac81c41cd0e4cc36dae]

Bug #78157.
</div>

==== diff-ms-smbfs-doublefree-20070313 ====

<div class="change">
Patch from Vasily:<br/>
Fixes rq_trans2buffer double free issue in smbfs.

smbfs allocates rq_trans2buffer to handle the server's multi-part transaction2 response
messages. Because an smb_request may be reused, rq_trans2buffer is freed before
each new request. However, if the last server response is not a multi-part but a single
trans2 message, a new rq_trans2buffer is not allocated and smb_rput tries
to free this buffer a second time.

To prevent this, the rq_trans2buffer pointer should be set to NULL after kfree.

Bug #74499.

PS. The issue is still present in mainstream; note, however, that smbfs
support was dropped in the latest distributions (FC6 and RHEL5) and replaced by CIFS.
</div>
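The remedy is the usual "free, then clear the pointer" idiom. The sketch below is a user-space analogue, with <code>free()</code> standing in for kfree; <code>struct request</code> and <code>request_reset()</code> are hypothetical names, not the smbfs ones.

```c
#include <stdlib.h>

/* Hypothetical, simplified request object that may be reused. */
struct request {
    unsigned char *trans2buffer;   /* allocated only for multi-part replies */
};

/*
 * Release the reply buffer before the request is reused or dropped.
 * Clearing the pointer makes a second call harmless, because
 * free(NULL) is defined to do nothing.
 */
static void request_reset(struct request *rq)
{
    free(rq->trans2buffer);
    rq->trans2buffer = NULL;
}
```

Calling <code>request_reset()</code> twice in a row is safe, which is exactly the case where no new buffer was allocated between two frees.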

==== diff-ms-smbfs-fmode-20070328 ====
<div class="change">
Patch from mainstream:<br/>
[SMBFS] set the default files and dir mask for smbfs

This is a hunk missing from the following patch: [http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=99ca50fb7eb44cefb138fc8b885ce6be411bcd51 GIT: 99ca50fb7eb44cefb138fc8b885ce6be411bcd51].

Bug #78157.
</div>

==== diff-ms-stopmachine-msleep-20070411 ====
<div class="change">
Patch from Kostja:<br/>
a possible live-lock in stop_machine():
<ul>
<li>stopmachine_state == STOPMACHINE_WAIT;</li>

<li>STOPPER (stop_machine()) is in state SM_STOPPER_WAITING,
calling yield() in a loop;</li>
<li>SLAVES (stopmachine()) also call yield() in a loop.</li>
</ul>

This leads to heavy contention on fairsched_lock on all CPUs, and when lock
acquisition is unfair (for example on a NUMA node), some CPUs can wait for the lock
forever or for a very long time, causing the node to hang.
This patch replaces yield() with msleep(10).

Bug #78975.
</div>

==== diff-rh-execshield-randomize-va-20070326 ====

<div class="change">
Patch from Vasily:<br/>
[PATCH] change stack randomize range in rhel4-based kernels

The exec-shield-randomize feature in rhel4 confuses UBC a bit: for example,
a typical VE on 2.6.8 owns 17 MB of privvmpages, but the same VE on a 2.6.9
kernel owns 40 MB of privvmpages. Stack randomization is to blame: it
adds up to 2 MB to each process. In rhel5-based kernels the
randomization range is much smaller: only two pages. So just use the same
approach in rhel4.

[http://bugzilla.openvz.org/show_bug.cgi?id=484 OpenVZ Bug #484].
</div>

==== diff-rh-revert-proc-readdir ====

<div class="change">
Revert linux-2.6.9-proc-readdir.patch from RHEL4u5.
This optimization requires pidmap virtualization,
which is fixed in 2.6.18-OVZ, but it's painful to backport it.
</div>

==== diff-sysrqkey-scancode-b-20070319 ====
<div class="change">
Patch from Evgeny:<br/>
The patch fixes a compilation error in the emulate_raw() function when
CONFIG_MAGIC_SYSRQ is not set.

sysrq_key_scancode is moved outside #ifdef CONFIG_MAGIC_SYSRQ.

[http://bugzilla.openvz.org/show_bug.cgi?id=479 OpenVZ Bug #479].
</div>

==== diff-ve-meminfo-b-20070327 ====
<div class="change">
Patch from Kostja:<br/>
Adds sysctl to choose base ubc for memory usage inside a VE.

Sets PRIVVMPAGES beancounter to be used by default instead of OOMGUARPAGES.

Bug #78088.
</div>

==== diff-ve-proc-moduleget-20070323 ====
<div class="change">
Patch from Kostja:<br/>
Fixes the race between proc_lookup() and sys_delete_module()

Bug #77841.
</div>

==== diff-ve-proc-moduleget-b-20070323 ====
<div class="change">
Patch from Kostja:<br/>
Fixes the race between proc_lookup() and remove_proc_entry().

As far as I can understand, there is a race:
<pre>
proc_lookup()
{
	lock_kernel();
	lde = LPDE(dir);
	if (!lde)
		goto out;

	/* here lde can be destroyed??? */

	spin_lock(&amp;proc_subdir_lock);
	lde = __proc_lookup(lde, dentry);
</pre>
</div>

==== diff-ve-tun-persist-20070424 ====
<div class="change">
Patch from Vasily:<br/>
Prohibits tun persistent mode inside a VE.

Bug #79612.
</div>

==== diff-ve-venet-loopstat-20070314 ====

<div class="change">
Patch from Denis:<br/>
This patch frees leaked loopback per-CPU stats.
</div>

==== diff-vzdq-ugbad-20070428 ====
<div class="change">
Patch from Konstantin Khorenko &lt;khorenko@openvz.org&gt;:<br/>
[VZDQ] prohibit chown of a file if owner doesn't have ugid struct

Prohibit chown of a file if its owner does not have
a ugid record. This might happen if the UID/GID limit was somehow
exceeded (e.g. ugidlimit set lower than the number of users).

Bug #79553.
</div>

==== diff-i2o-msgleak-10070423 ====
<div class="change">
Patch from Vasiliy:<br/>
This patch fixes an i2o message leak.

In case of an error we need to free both msg itself and the i2o message
in hardware.
</div>

==== diff-i2o-msgget-errh-10070423 ====
<div class="change">
Patch from Vasiliy:<br/>
This patch fixes access to memory that has not been allocated:

i2o_msg_get_wait() can return errors other than I2O_QUEUE_EMPTY,
but the result is checked only against this code.
If it is not I2O_QUEUE_EMPTY, the error code is later dereferenced as a pointer.
</div>
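The class of bug can be sketched in isolation. The encoding below is an assumption made for the illustration and is not taken from the i2o sources: <code>QUEUE_EMPTY</code>, <code>ERR_TIMEOUT</code>, <code>MAX_ERR</code> and both helper functions are hypothetical. It assumes a call that returns either a usable value or an error packed into the top of the 32-bit range, with the "queue empty" sentinel being just one of those errors.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_EMPTY  0xffffffffu  /* hypothetical "no free frame" sentinel */
#define ERR_TIMEOUT  110          /* hypothetical error number */
#define MAX_ERR      4095u        /* hypothetical size of the error range */

/* Incomplete check: recognises the sentinel and nothing else. */
static bool is_error_sentinel_only(uint32_t m)
{
    return m == QUEUE_EMPTY;
}

/* Complete check: every value in the top MAX_ERR range is an error. */
static bool is_error(uint32_t m)
{
    return m >= (uint32_t)(0u - MAX_ERR);
}
```

A result such as <code>(uint32_t)-ERR_TIMEOUT</code> slips past the first check and would then be used as if it were valid, while the second check rejects it.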

==== diff-i2o-cfg-passthru-20070423 ====
<div class="change">
Patch from Vasiliy:<br/>
This patch fixes a number of issues in i2o_cfg_passthru{,32}:
<ul>
<li>memory leaks (including i2o_message leak fixed by khorenko@)</li>
<li>infinite loop to sg_list_cleanup in passthru32</li>
<li>bad error paths</li>
</ul>

</div>

==== diff-i2o-proc-perms-20060304 ====
<div class="change">
Patch from Vasiliy:

Reading from some i2o-related proc files can hang the controller for
unknown reasons. As a workaround, this patch makes these
files accessible to root only.
</div>

==== diff-md-bio-init-20070504 ====
<div class="change">

Patch from Kirill:

Fix compilation of ./drivers/md/dm-biosets.c when CONFIG_DM=y.
bio_init is redefined in this file. This works when DM is compiled
as a module, since the local bio_init version is used. However, when DM is
compiled in, it conflicts with the function defined in fs/bio.c.
</div>

==== linux-2.6.9-arcmsr-1.20.0X.13-61107.patch ====
<div class="change">
patch ported by Kostja (khorenko@),<br/>
Areca driver v1.20.0X.13-61107 added.

Sources from [ftp://ftp.areca.com.tw/RaidCards/AP_Drivers/Linux/DRIVER/SourceCode/ Areca site].

Bug #59933.
</div>

==== linux-2.6.9-qla2xxx-8.01.05.patch ====
<div class="change">
Remade by Kostja (khorenko@) for 2.6.9-55.EL kernels; includes
diff-qla2xxx-makefile-20070319:<br/>
Patch from Evgeny:<br/>
The patch fixes a compilation error by updating the makefile
drivers/scsi/qla2xxx/Makefile.
When CONFIG_SCSI_QLA2XXX_FAILOVER is not set, failover compilation fails,
since the Makefile defines it but autoconf.h undefines it.

{{bug|479}}.
</div>

==== diff-i2o-procread-20070510 ====
<div class="change">
Patch from Vasily:

Fixes an oops on reading from some i2o proc files, caused by their handlers using the
"exec" field in struct i2o_controller.
This is a minor issue, because the i2o_proc module is not loaded by default.
</div>

==== diff-smbfs-mount-compat-20070511 ====
<div class="change">
Patch from Andrey:<br/>
fixes security issue, oops triggered by any user inside VE:<br/>
[PATCH] skip data conversion in compat_sys_mount when data_page is NULL

The OpenVZ Linux kernel team has found a problem with mounting in compat mode.

A simple "mount -t smbfs ..." command on the Fedora Core 5 distribution in 32-bit mode
leads to an oops:
<pre>
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
compat_sys_mount+0xd6/0x290
Process mount (pid: 14656, veid=300, threadinfo ffff810034d30000, task
ffff810034c86bc0)
Call Trace: ia32_sysret+0x0/0xa
</pre>

The problem is that the data_page pointer can be NULL, so data
conversion should be skipped in this case.

Bug #81559.
</div>
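The shape of the fix can be sketched outside the kernel. The structures and <code>convert_mount_data()</code> below are hypothetical and far simpler than what compat_sys_mount handles; they only show the guard that skips conversion when no data page was supplied.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit and native layouts of filesystem mount data. */
struct mount_data32 { uint32_t version; uint32_t buf_ptr; };
struct mount_data64 { uint32_t version; uint64_t buf_ptr; };

/*
 * Convert 32-bit mount data into the native layout.
 * Returns 1 if a conversion happened, 0 if there was nothing to convert.
 */
static int convert_mount_data(const struct mount_data32 *in,
                              struct mount_data64 *out)
{
    if (in == NULL)
        return 0;   /* no data page was passed: skip the conversion */

    out->version = in->version;
    out->buf_ptr = in->buf_ptr;   /* widen the 32-bit pointer value */
    return 1;
}
```

Without the NULL test, the first field access dereferences a null pointer, which matches the kind of oops shown in the trace above.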

==== diff-fs-inode-alloc-sem-fix ====
<div class="change">
Patch from Andrey (amirkin@):

In the new RHEL4 kernel (2.6.9-42.0.8.EL) there is the following change: i_alloc_sem
is now taken in notify_change() if the size is being changed:

<pre>
if (ia_valid &amp; ATTR_SIZE)
	down_write(&amp;dentry-&gt;d_inode-&gt;i_alloc_sem);
</pre>

So we do not need to take this semaphore before calling notify_change() when we are
going to change the size.

Bug #81426.
</div>
</noinclude>