OpenVZ Virtuozzo Containers Wiki β

Changes

Download/kernel/rhel5/028stab027.1/changes

16,223 bytes added, 02:07, 21 March 2008
created
== Changes ==
* Fixes/improvements in checkpointing, NFS in VE, IOPRIO, CPU scheduler
* NMI watchdog is now disabled by default for i686 kernels.
* Attansic L1 Gigabit Ethernet driver added.

=== Config changes ===
Removed:
* -<code>CONFIG_NMI_WATCHDOG=y</code> (i686 only)

Added:
* +<code>CONFIG_ATL1=m</code>
* +<code>CONFIG_FUSE_FS=m</code>
<includeonly>[[{{PAGENAME}}/changes#Patches|{{Long changelog message}}]]</includeonly><noinclude>
=== Patches ===

==== diff-cpt-restore-route-bug-20070321 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] bug in restore net routes

When the netroute section in a dump is padded, restore tries
to interpret the padding as the next rtnetlink message and deadlocks,
treating it as a message of zero length.
</div>

==== diff-ms-oom-score-badness-20070322 ====
<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:<br/>
[PATCH] Fix unlocked access to task list from /proc/pid/oom_score

The failing code was a prefetch hidden in

<source lang="c">
list_for_each_entry(child, &p->children, sibling) {
</source>
in badness(). badness() is reachable from two points:
one is proc_oom_score, the other is
out_of_memory() =&gt; oom_select_bad_process() =&gt; badness().

The second path grabs tasklist_lock, while the first doesn't.
</div>

==== diff-ubc-ioprio-queuelock-20070322 ====
<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:<br/>
[IOPRIO] dereference after free

Save the queue pointer so that the freed cfq_bc structure is not dereferenced.
</div>

==== diff-ve-specialpids-20070322 ====
<div class="change">
Patch from Denis Lunev &lt;den@openvz.org&gt;:<br/>
Removes a warning about special pids (from NFS kernel thread spawning).

{{Bug|470}}.<br/>
Bug #77832.
</div>

==== diff-ms-ptrace-bug-20070319 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Replaced with version from Roland McGrath
</div>

==== diff-cpt-dump-ipv6-addr-fix-20070323 ====
<div class="change">
Patch from Andrey Mirkin &lt;major@openvz.org&gt;:<br/>
[CPT] Fix IPv6 addresses restore

All IPv6 addresses based on the MAC address are created with a valid lifetime of 0.
We checkpoint them and try to restore them, but fail, as inet6_addr_add()
returns -EINVAL if valid_lft is zero.

We can use the ifaddr flags to find correct values for the preferred and
valid lifetimes.

TODO:<br/>
The kernel automatically creates a local IPv6 address based on the MAC address
when an interface is brought up. We can manually remove this address.
So, if we want to be sure that a VE will have exactly the same set of addresses
after restore, we should remove all IPs and after that add all IPs from the dump.
</div>

==== diff-cpt-ubc-adjust-on-restore-b-20070323 ====
<div class="change">
Patch from Andrey Mirkin &lt;major@openvz.org&gt;:<br/>

[CPT] unlimit dcachesize on restore

Recently we added adjusting of 3 limits on restore,
so as not to fail because of hitting the limits.
Now we have to add another one: dcachesize.

Bug #77889.<br/>
Bug #77890.<br/>
Bug #77896.
</div>

==== diff-fairsched-hot-vcpu-20070330 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] Improve vcpu scheduling taking into account cache hotness

In the original VZ kernel schedule_vcpu() takes the next VCPU from the
vsched-&gt;active list and does not take vcpu-&gt;last_pcpu into account,
so VCPUs can jump from PCPU to PCPU too often.

Try to skip 'hot' VCPUs, i.e. VCPUs that were running on some
other PCPU recently.
The time slice threshold is tunable via /proc/sys/kernel/vcpu_hot_timeslice.
</div>

==== diff-fairsched-idlebalance-20070328 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] Improve idle load balance

Idle balance is called from an idle thread in rebalance_tick().
load_balance() tries to find the busiest group in idle_vsched,
where no tasks are really running.

With this patch, load_balance() will first try to find the busiest vsched,
and in case of success then find the busiest group inside this vsched, and
so on...
</div>

==== diff-fairsched-idlebalance-b-20070328 ====
<div class="change">
Patch from Kirill Korotaev &lt;dev@openvz.org&gt;:<br/>
[PATCH] Compilation fix for idlebalance

Compilation fix for diff-fairsched-idlebalance-20070328
</div>

==== diff-ms-correct-accept-errh-20060326 ====
<div class="change">
Patch from Alexey Dobriyan &lt;adobriyan@openvz.org&gt;:<br/>
[PATCH] mainstream: fix sys_accept() error path

* d_alloc() in sock_attach_fd() fails leaving -&gt;f_dentry NULL
* bail out to out_fd label, which does fput()/__fput() on new file
* but __fput() assumes valid -&gt;f_dentry

Bug #77930.
</div>

==== diff-ms-ext3-xattr-refcount-b-20070326 ====
<div class="change">
Patch from Dmitry Monakhov &lt;dmonakhov@openvz.org&gt; from mainstream:<br/>
[EXT3] "ext[34]: EA block reference count racing fix" performance fix

From: Andrew Morton &lt;akpm@linux-foundation.org&gt;

A little mistake in 8a2bfdcbfa441d8b0e5cb9c9a7f45f77f80da465 is making all
transactions synchronous, which reduces ext3 performance to comical levels.

Cc: Mingming Cao &lt;cmm@us.ibm.com&gt;<br/>
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</div>

==== diff-ms-nmi-wdog-timeout-20070328 ====
<div class="change">
Patch from Kirill Korotaev &lt;dev@openvz.org&gt;:<br/>
[NMI] set default NMI watchdog timeout to 30 secs

Increase default NMI watchdog timeout to 30 seconds
as it was in 2.6.9
</div>

==== diff-ubc-ioprio-sleeping-context-20070328 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[IOPRIO] Call bc_findcreate_cfq_bc() out of q-&gt;queue_lock

Otherwise we may cause GFP_KERNEL allocation to happen
with a spinlock held.

Bug #78000.
</div>

==== diff-ubc-ioprio-sleeping-context-b-20070329 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[IOPRIO] Call bc_findcreate_cfq_bc() out of q-&gt;queue_lock (fix 2)

A fix for the previous fix for calling bc_findcreate_cfq_bc() outside q-&gt;queue_lock:
iopriv should be initialized in both cases.
</div>

==== diff-ve-lockdep-fix-b-20060328 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[LOCKDEP] Another fix for virtualized filesystems lockdep

As described before, filesystems in our kernels are
no longer static objects, and thus lockdep refuses to
work. This was (wrongly) fixed by setting one static
class for all super blocks' semaphores and locks.

It turned out that different filesystems use different
lock ordering for sb locks and some other ones, e.g.
UDF may take inode-&gt;i_mutex under sb-&gt;s_lock, while
ext3 takes sb-&gt;s_lock under inode-&gt;i_mutex. This is
normal and doesn't create any deadlocks, since the super
blocks are different. But lockdep detects a circular
dependency in this case, as all super blocks look the
same to it.

This is solved by setting a class from the filesystem type
on the super block, as it was before, but for virtualized
filesystems (e.g. procfs, devpts) the fs template is
used.

Bug #78110.
</div>

==== diff-ve-nfs-bindlock-20070327 ====
<div class="change">
Patch from Denis Lunev &lt;den@openvz.org&gt;:<br/>
[NFS] fix lockd context when bind mounted from VE0 to VE

This patch fixes NFS locking support over partitions
bind mounted to VE from VE0.
</div>

==== diff-ve-proc-moduleget-20070323 ====
<div class="change">
Patch from Konstantin Khorenko &lt;khorenko@openvz.org&gt;:<br/>
[PROC] mainstream: race between proc_lookup() and sys_delete_module()

Fix for the race between proc_lookup() and sys_delete_module():
proc_lookup() can find the PDE under proc_subdir_lock,
then on a 2nd CPU sys_delete_module() removes the pde and the module,
and then the first CPU tries to get the de and the module in proc_get_inode()...
Boom...

Bug #77841.
</div>

==== diff-ve-stats-mm-opt-20070320 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[VESTATS] use jiffies instead of cycles for mm stats

Use jiffies instead of cycles for mm stats about page allocation latency.

This implementation is very simple, but strictly speaking it is not that accurate:
we can add 10,000,000 (or more) cycles (~1 jiffy)
even if the actual allocation consumes &lt; 10,000 cycles,
but the jiffies counter happened to change at that moment.
</div>

==== diff-ubc-ioprio-cfqq-index-20070330 ====
<div class="change">
Patch from Evgeniy Kravtsunov &lt;emkravts@openvz.org&gt;:<br/>
[IOPRIO] Fix cfqq index calculation in async case

The ioprio field of task_struct consists of two numbers:<br/>

1) the value of the class (bits 14-16),<br/>
2) the value of the data (bits 0-13).<br/>
The value of the data is allowed to belong to the range [0, 7].

In the current implementation of cfq_set_request, tsk-&gt;ioprio is
used as an index into the *async_cfqq[8] array.

This is wrong, because tsk-&gt;ioprio can be &gt;&gt; 8.

This can cause either a corruption or reading a wrong value:
<source lang="c">
cfq_set_request:
....
	if (!cfq_bc-&gt;async_cfqq[tsk-&gt;ioprio]) {
		cfqq = cfq_get_queue(cfqd, key, tsk, gfp_mask);
		if (!cfqq)
			goto queue_fail;

		cfq_bc-&gt;async_cfqq[tsk-&gt;ioprio] = cfqq;    &lt;&lt;&lt; corruption
	} else
		cfqq = cfq_bc-&gt;async_cfqq[tsk-&gt;ioprio];    &lt;&lt;&lt; wrong value
....
</source>

The correct index should be calculated from tsk-&gt;ioprio by using the
corresponding functions and macros. The patch contains the necessary updates.

Bug #78213.<br/>
probably fixes {{Bug|496}}.
</div>

==== linux-2.6.18-atl1-1.0.41.0.patch ====
<div class="change">
Patch prepared by Roman (rchechnev@):<br/>
atl1 driver ver. 1.0.41.0 ported to the VZ kernel

This driver supports Attansic L1 gigabit ethernet cards.
Sources were taken from: http://atl1.sourceforge.net/
</div>

==== diff-fairsched-cleanup-20070403 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>

[SCHED] small cleanup of code

Remove unnecessary argument this_pcpu (=== smp_processor_id())
from find_idle_target() and find_busiest_vsched()
</div>

==== diff-fairsched-idlebalance-c-20070402 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>

[SCHED] remove debug hunk from previous balance patch

My previous patch for load_balance() contains a wrong conditional
statement that I forgot to remove after debugging.

In 028stab025.1 load_balance() will not pull tasks from the busiest VCPUs
if there are &lt; 2 tasks running on the current VCPU. The attached patch removes
this incorrect check and fixes the problem.
</div>

==== diff-fairsched-idlebalance-d-20070402 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] find_busiest_queue() should select VCPUs from given vsched only

In the new scheme, we choose the vsched in find_busiest_vsched(),
i.e. before find_busiest_queue(), so when we look
for the busiest queue we must consider only this vsched's VCPUs.

Bug #78385.<br/>
and maybe this:<br/>
Bug #78383.
</div>

==== diff-fairsched-use-vcpulastpcpu-20070403 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] Cleanup: use vcpu_last_pcpu macro instead of vcpu-&gt;last_pcpu

Replace vcpu-&gt;last_pcpu by vcpu_last_pcpu(vcpu),
to fix compilation without CONFIG_VSCHED_VCPU
</div>

==== diff-rh-ia64-ptrace-pokedata-20070403 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[IA64] strace -f does not work with utrace

The patch is submitted to roland@redhat.com with the following note:

ptrace implements the -f flag by catching the clone() syscall and adjusting
the clone flags to set CLONE_PTRACE. The utrace patch breaks this.

Older ptrace used to simulate peek/poke to the top of the user RBS,
so that from the user's viewpoint registers stored in the kernel RBS looked
like registers stored in the user RBS.

The utrace patch tried to improve this (to be honest, it does not look
like an improvement, but apparently the author of those changes knows
better). It forces a _real_ writeback of the kernel RBS to user space (why?).
The bug is that it never reads those registers back, so that
all the changes to this area of the user RBS are lost.

One variant of a fix is enclosed. It is not quite self-consistent, because
the result of PTRACE_POKEDATA is never dumped back to real userspace.
But at least it works.
</div>

==== diff-ubc-ioprio-elv-switch-fix-20070403 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[IOPRIO] elevator switch oops fix

When an elevator switch happens and UBs persist, putting of the async cfqq can
happen a second time due to a non-NULL value in the array.

{{Bug|526}}.
</div>

==== diff-ubc-ioprio-new-putqueue-20070402 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[IOPRIO] new cfq queue putting mechanism

It's better to use the original cfqq put function from CFQ than to rewrite it.
Use the elevator_ops structure for exporting it.

Bug #78358.
</div>

==== diff-fairsched-cpuof-20070405 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] Fix for cpu_of()

In the new scheme, i.e. when the physical cpu mask is used wherever possible
(in find_busiest_vsched(), find_busiest_queue() and so on),
cpu_of() must also return the physical cpu id for a given vcpu.

We have to use virtual ids (vcpu-&gt;id) only for vsched maps and for
the process cpus_allowed mask. In all other cases we need to use physical
masks to account for the physical CPU topology.

Bug #78679.<br/>
Bug #78676.
</div>

==== diff-fairsched-del-vcpu-20070404 ====
<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] VCPU should be initialized completely before deletion

There is a race in vsched_del_vcpu(): we can kill the
migration thread even if it has not started yet, i.e.
the migration_thread() function is not called at all. So
migrate_live_tasks() and migrate_dead_tasks() will not be called on this
vcpu while the migration thread is killed. But there can be some tasks
that have already migrated onto this vcpu, because this vcpu is already
marked as online.

This bug can be easily reproduced. On a busy host with many running
tasks a user can run:

<pre>
# vzctl set NODE --cpus 1
# vzctl set NODE --cpus 4
# vzctl set NODE --cpus 1
</pre>

In this case, after the second vzctl, the migration thread on VCPU 2 will be
created and just woken up, but it may not really be started (scheduled)
yet if there are a lot of other higher-priority tasks running on the host.
If it is not scheduled before the third vzctl call, there will be a
kernel bug in vsched_del_vcpu():

<source lang="c">
...
/*
* also, since this moment VCPU is offline, so migration_thread
* won't accept any new tasks...
*/
vmigration_call(&amp;migration_notifier, CPU_DEAD, vcpu);
BUG_ON(rq-&gt;nr_running != 0);
...
</source>

Bug #78487.
</div>

==== diff-fairsched-findbusiesgroup-20070405 ====

<div class="change">
Patch from Alexandr Andreev &lt;aandreev@openvz.org&gt;:<br/>
[SCHED] find_busiest_group() should use pcpu mask

VCPUs should be skipped according to the pcpu mask.
</div>

==== diff-rh-utrace-sighand-20070405 ====
<div class="change">
Patch from Denis Lunev &lt;den@openvz.org&gt;:<br/>
This patch fixes an unguarded use of parent-&gt;sighand.

It should be:
* guarded with tasklist_lock
* checked for NULL inside the lock

Bug #78657.
</div>

==== diff-ubc-ioprio-compilation-fix-20070304 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
[IOPRIO] compilation fix in case UBC_IO_ACCT is off

Compilation fix in case UBC_IO_ACCT is off.

{{Bug|527}}.
</div>

==== diff-ubc-nonbc-caches-20070404 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
[BC] Don't make pre-created INDEX_AC and INDEX_L3 caches UBC

This made the size-32 and size-64 caches on i386 the same
capacity as the size-X(UBC) ones.
</div>

==== diff-ubc-refcount-leak-20070404 ====
<div class="change">
Patch from Andrey Mirkin &lt;major@openvz.org&gt;:<br/>
[BC] Fix potential beancounter refcount leak

On some error paths we forget to put the beancounter.
This patch fixes two such places:
* sys_setluid()
* bc_entry_open()

Bug #77231.
</div>

</noinclude>