OpenVZ Virtuozzo Containers Wiki β

Download/kernel/2.6.16/026test014.4/changes

19,717 bytes added, 09:57, 21 March 2008
created
== Changes ==
* IPv6 virtualization
* Virtual ethernet device (veth)
* Checkpointing of skfilter, fixes in threads migration
* /proc/meminfo tuning from userspace
* Vzquota lockup fix
* UBC optimization, leak fixes
* Mount operations restriction
* Compilation fixes
* Mainstream security fixes (up to 2.6.16.18)
* Some fixes ported from the stable OpenVZ kernel

=== Config changes ===
Same as {{Kernel link|2.6.16|026test012.1}} plus:

* +<code>CONFIG_IPV6=y</code>
* +<code>CONFIG_NMI_WATCHDOG=y</code>
* +<code>CONFIG_QUOTA_COMPAT=y</code>
* +<code>CONFIG_VE_ETHDEV=m</code>
* +<code>CONFIG_BRIDGE=m</code>

{{Kernel git log|2.6.16|026test014.4}}

<includeonly>[[{{PAGENAME}}/changes#Patches|{{Long changelog message}}]]</includeonly><noinclude>
=== Patches ===

==== diff-cbq-fairness-20020927 ====
<div class="change">
Patch from OpenVZ team &lt;devel@openvz.org&gt;:<br/>
CBQ fairness fixes
* repair cbq fairness in its first hunk
* restrict cl-&gt;quantum in the second one
</div>

==== diff-cpt-emt64-pgoff-20060512 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT][X86_64] cpt_mm lost high bits of page offset

"int" was used to store page offset. It is not enough.
</div>
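
The truncation described above can be sketched in plain C. This is only an illustration, not the cpt_mm code: both helper names are hypothetical, and a fixed 32-bit type stands in for the "int" so that the behaviour is well defined on every platform.

```c
#include <stdint.h>

/* Hypothetical helper: pass the page offset through a 32-bit variable,
 * which is what storing it in an "int" amounts to on x86_64. */
static uint64_t pgoff_via_32bit(uint64_t pgoff)
{
    uint32_t narrow = (uint32_t)pgoff;   /* high 32 bits are dropped here */
    return (uint64_t)narrow;
}

/* Hypothetical helper: keep the page offset in a 64-bit variable. */
static uint64_t pgoff_via_64bit(uint64_t pgoff)
{
    uint64_t wide = pgoff;               /* nothing is lost */
    return wide;
}
```

Any offset at or above 2^32 pages comes back different from `pgoff_via_32bit()`, while `pgoff_via_64bit()` returns it unchanged.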

==== diff-cpt-export-pmd-huge-20060511 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
Exports pmd_huge for cpt_mm.c
Needed when compiling with CONFIG_HUGETLB_PAGE=y (#61839).
</div>

==== diff-cpt-getdir-fail-20060524 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] fail immediately, when get_dir failed

It used to fail only after the whole batch was complete, flooding the logs.

This is part of a larger problem noticed in bug #62876. The full fix is not
ready yet; for now the behaviour is just less ugly.
</div>

==== diff-cpt-ipv6-20060511 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] Support for ipv6 migration
</div>

==== diff-cpt-ipv6-comp-20060524 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>

Part of diff-ve-net-ipv6-comp-20060524 related to cpt.
Split out so that both patches sit in the correct place in the list.
</div>

==== diff-cpt-mcfilter-20060602 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] checkpoint socket multicast filters

It did not make much sense with the venet device, but it is required with veth, especially with IPv6.
</div>

==== diff-cpt-mm-restore-20060524 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] restore of mm failed without reasons sometimes

We must not fail when we cannot restore anon vma clusters.
In the old days we had to fail; the problem was solved, but the old
safety check was forgotten.
</div>

==== diff-cpt-sigaltstack-20060602 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] Checkpoint tasks doing sigaltstack()/SA_ONSTACK correctly

Funnily enough, the code was present in early versions of checkpointing.
Apparently I removed it in a moment of mental aberration.
</div>

==== diff-cpt-skfilter-20060527 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] support checkpointing of sk filter
</div>

==== diff-cpt-thread-bug-20060527 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] bug in dumping nptl threads

All the threads were collected back-to-back, so we expected them to stay in that order in our internal task list. But if one of the threads forked children, the order was broken. Quite a silly bug once you know this. This fixes bug #63025.
</div>

==== diff-cpt-tty-restore-20060524 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] pty migration 2.6.8-&gt;2.6.16 was broken
</div>

==== diff-cpt-ub-leak-20060524 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] Fix ub refcnt leak
</div>

==== diff-cpt-up-read-20060522 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[CPT] do not forget to release mm semaphore in error path
</div>
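
A minimal sketch of the pattern behind this kind of fix, assuming a single-exit error path. The `fake_sem` type and every function name here are hypothetical user-space stand-ins, not the CPT code or the kernel's rwsem API.

```c
#include <errno.h>

/* Hypothetical stand-in for a read-held mm semaphore: counts holders. */
struct fake_sem { int readers; };

static void fake_down_read(struct fake_sem *s) { s->readers++; }
static void fake_up_read(struct fake_sem *s)   { s->readers--; }

/* Do some work under the semaphore; `fail` simulates an error midway.
 * Success and error paths both leave through the same label, so the
 * semaphore is released in either case. */
static int dump_under_sem(struct fake_sem *s, int fail)
{
    int err = 0;

    fake_down_read(s);
    if (fail) {
        err = -EINVAL;
        goto out;   /* a bare "return err" here would leak the semaphore */
    }
    /* ... normal work would go here ... */
out:
    fake_up_read(s);
    return err;
}
```

After either call the holder count is back to zero, which is the property the error path originally lacked.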

==== diff-ext3-journalcap-20030317 ====
<div class="change">
Patch from OpenVZ team &lt;devel@openvz.org&gt;:<br/>
Change CAP_SYS_RESOURCE to CAP_SYS_ADMIN in ext3_ioctl

Journal manipulations are forbidden for VE admins with default capabilities (#19625)
</div>

==== diff-ext3-pgfault-20060602 ====
<div class="change">
Patch from Andrey Savochkin &lt;saw@openvz.org&gt;:<br/>
Reorganization of ext3_prepare_write/ext3_commit_write

This eliminates the possibility of a page fault in between,
inside a transaction. Such a fault could cause a GFP_FS allocation, re-entry into ext3 code (possibly with a different superblock and journal), a ranking violation between journalling serialization, mmap_sem and the page lock, and all other kinds of funny consequences. (#22347)

The solution suggested by Chris Mason is to move all the logic
including hole instantiation into commit_write.
</div>

==== diff-fairsched-curr-task-20060511 ====
<div class="change">
Patch from Andrey Mirkin &lt;amirkin@openvz.org&gt;:<br/>
Fix curr_task()/set_curr_task() for fairsched
</div>

==== diff-fairsched-debug-remove-20060511 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>

Remove debug printk() from vmigration call

</div>

==== diff-fairsched-dep-20060517 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
Fix of CONFIG_FAIRSCHED/CONFIG_SCHED_VCPU declarations in Kconfig

Consists of two parts:
# Move these options from arch-dependent Kconfigs into kernel/Kconfig.fairsched;
# Change dependency - FAIRSCHED depends on SCHED_VCPU not vice-versa.
</div>

==== diff-fairsched-lockless-ctx-20060511 ====
<div class="change">
Patch from Kirill Korotaev &lt;dev@openvz.org&gt;:<br/>

Remove #error with warning introduced by me in fairsched patch.

The IA64 lockless context switch should work fine with the current oncpu concept.

</div>

==== diff-fairsched-ve-20060530 ====
<div class="change">
Patch from OpenVZ team &lt;devel@openvz.org&gt;:<br/>
Virtualization fixes in fairsched.

This includes capability tuning, some per-ve statistics
and /proc/fairsched file with old-format data that may
be needed by some utils (vzcpucheck at least).

{{Bug|176}}.
</div>

==== diff-flock-return-error-20060605 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
Return error in case flock failed.

If flock_lock_file() failed to allocate a lock with locks_alloc_lock(),
"error = 0" was returned. A non-zero error code must be returned instead.
</div>
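
The shape of the fix can be sketched as follows. This is a user-space illustration under stated assumptions: `fake_lock`, `fake_alloc_lock()` and `fake_flock()` are hypothetical names, and `-ENOMEM` is an assumed choice of error code, since the description above only requires a non-zero value.

```c
#include <errno.h>
#include <stdlib.h>

struct fake_lock { int type; };

/* Hypothetical allocator wrapper; `simulate_oom` forces a failure. */
static struct fake_lock *fake_alloc_lock(int simulate_oom)
{
    return simulate_oom ? NULL : calloc(1, sizeof(struct fake_lock));
}

static int fake_flock(int simulate_oom)
{
    int error = 0;
    struct fake_lock *new_fl = fake_alloc_lock(simulate_oom);

    if (new_fl == NULL) {
        error = -ENOMEM;   /* the buggy pattern left error == 0 here */
        goto out;
    }
    /* ... lock bookkeeping would go here ... */
    free(new_fl);
out:
    return error;
}
```

With the assignment in place, a failed allocation is reported to the caller instead of looking like a successfully taken lock.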

==== diff-fs-quotcompat-20050921 ====
<div class="change">
Patch from Andrey Savochkin &lt;saw@openvz.org&gt;:<br/>
This patch implements compatibility quotactls for old quota tools.
</div>

==== diff-fs-revalidate-20060605 ====
<div class="change">
Patch from Andrey Savochkin &lt;saw@openvz.org&gt;:<br/>
Fixed revalidation for NFS dentries.

The problem was introduced in the mainstream by
http://linux.bkbits.net:8080/linux-2.4/cset@1.181?nav=index.html|src/|src/fs|related/fs/namei.c
(see also the description at http://www.uwsg.iu.edu/hypermail/linux/kernel/0201.2/0316.html)
This patch fixes non-uniform use of the d_revalidate method in VFS and makes VFS return ESTALE only for the weird NFS cases (#18356).
</div>

==== diff-ia64-tif-freeze-20060511 ====
<div class="change">
Patch from Andrey Mirkin &lt;amirkin@openvz.org&gt;:<br/>
[IA64] Add TIF_FREEZE flag to ia64.
</div>

==== diff-ipt-nf-dbg-arp-20030317 ====
<div class="change">
Patch from OpenVZ team &lt;devel@openvz.org&gt;:<br/>
Clean skb-&gt;nf_debug before packet re-process (#19592).
</div>

==== diff-jbd-kthread-20041028 ====
<div class="change">
Patch from Kirill Korotaev &lt;dev@openvz.org&gt;:<br/>
Add the check of the kernel_thread() result for jbd.

This prevents a process hang when mounting ext3 inside a VE (#35206).
</div>

==== diff-merge-2.6.16.18-20060530 ====
<div class="change">
Patch from OpenVZ team &lt;devel@openvz.org&gt;:<br/>
Merged 2.6.16.18 from /linux/kernel/git/stable/linux-2.6.16.y
</div>

==== diff-ms-allocwarn-20060605 ====
<div class="change">
Patch from Dmitry Mishin &lt;dim@openvz.org&gt;:<br/>

Suppress messages about page allocation failures in the kernel
(#43925).
</div>

==== diff-ms-dcache-race-during-umount ====
<div class="change">
Patch from Neil Brown &lt;neilb@suse.de&gt;:<br/>
Replaced OpenVZ version of dcache-race-fix with -mm tree's one.

Original comment from Neil Brown:

The race is that the shrink_dcache_memory shrinker could get called while a
filesystem is being unmounted, and could try to prune a dentry belonging to
that filesystem.

If it does, then it will call in to iput on the inode while the dentry is
no longer able to be found by the umounting process. If iput takes a
while, generic_shutdown_super could get all the way through
shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to touch
it (which some filesystems certainly do), it will lose. The promised
'Self-destruct in 5 seconds' doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry on
someone else's dentry. As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a filesystem
that is being unmounted, and unmount will not be able to start until any
such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it is
doing the prune on behalf of so that it can be careful of other
filesystems. shrink_dcache_memory isn't called on behalf of any
filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon list
out of the superblock, so it can get the s_anon list itself, and can pass
the superblock down to prune_dcache.

If prune_dcache finds a dentry that it cannot free, it leaves it where it
is (at the tail of the list) and exits, on the assumption that some other
thread will be removing that dentry soon. To try to make sure that some
work gets done, a limited number of dentries which are untouchable are
skipped over while choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck &lt;jblunck@suse.de&gt;<br/>
Cc: Kirill Korotaev &lt;dev@openvz.org&gt;<br/>
Cc: Olaf Hering &lt;olh@suse.de&gt;<br/>
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;<br/>
<br/>
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;<br/>
Signed-off-by: Balbir Singh &lt;balbir@in.ibm.com&gt;<br/>
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;<br/>
</div>

==== diff-ms-dst-lock-20060522 ====
<div class="change">
Patch from Dmitry Mishin &lt;dim@openvz.org&gt;:<br/>

Replace add_timer() by mod_timer() in dst_run_gc in order to avoid BUG message.
<source lang="c">
CPU1                            CPU2
dst_run_gc() entered            dst_run_gc() entered
spin_lock(&dst_lock)            .....
del_timer(&dst_gc_timer)        fail to get lock
....                            mod_timer() <--- puts timer back
....                                             in list
add_timer(&dst_gc_timer) <--- BUG because timer is in list already.
</source>

Found during OpenVZ internal testing (#62581).
</div>
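
The difference between the two calls in the interleaving above can be modelled with a toy timer. The `fake_` names are hypothetical; the model only mirrors two documented properties of the kernel timer API: add_timer() must not be called on a pending timer, while mod_timer() may be, and reports whether the timer was pending.

```c
#include <stdbool.h>

/* Toy model of a kernel timer: only the "pending" state and expiry. */
struct fake_timer { bool pending; unsigned long expires; };

/* Models add_timer(): legal only on a timer that is not pending.
 * Returns -1 where the real kernel would hit its BUG check. */
static int fake_add_timer(struct fake_timer *t, unsigned long expires)
{
    if (t->pending)
        return -1;
    t->pending = true;
    t->expires = expires;
    return 0;
}

/* Models mod_timer(): re-arms the timer whether or not it is pending.
 * Returns 1 if it was pending, 0 otherwise. */
static int fake_mod_timer(struct fake_timer *t, unsigned long expires)
{
    int was_pending = t->pending ? 1 : 0;

    t->pending = true;
    t->expires = expires;
    return was_pending;
}
```

Replaying the race: once CPU1 has deleted the timer and CPU2 has re-armed it, `fake_add_timer()` on CPU1 returns -1, whereas `fake_mod_timer()` simply updates the expiry.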

==== diff-ms-nlrace-20040630 ====
<div class="change">
Patch from Denis Lunev &lt;den@openvz.org&gt;:<br/>

Fixed a netlink race, identified as the cause of simultaneous
numothersock, dgramsockbuf and kmempages leaks (#34365).
</div>

==== diff-nmiwd-default-20060605 ====
<div class="change">
Patch from OpenVZ team &lt;devel@openvz.org&gt;:<br/>
NMI watchdog turned on by default (#11989).
</div>

==== diff-nmiwd-silence-20060605 ====
<div class="change">
Patch from Vasily Averin &lt;vvs@openvz.org&gt;:<br/>

Allow setting the console log level to the silent level when the NMI watchdog detects a LOCKUP (#12002).
</div>

==== diff-security-ipt-counters-20060516 ====
<div class="change">
Patch from Kirill Korotaev &lt;dev@openvz.org&gt;:<br/>
This patch fixes buffer size check in do_add_counters().

For IPv4 it was fixed in 2.6.16, this one is for IPv6 and arp_tables.
</div>
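
As a rough idea of what such a size check involves, the sketch below validates a user-supplied length against a header plus a counter array. The structure layouts and the function name are hypothetical and are not the netfilter definitions; the overflow guard is an extra precaution added for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, loosely modelled on "header + N counters". */
struct fake_counter       { uint64_t pcnt, bcnt; };
struct fake_counters_info { char name[32]; unsigned int num_counters; };

/* Accept the buffer only if `len` is exactly header + num_counters entries. */
static int check_counters_len(size_t len, unsigned int num_counters)
{
    size_t hdr = sizeof(struct fake_counters_info);

    if (len < hdr)
        return -EINVAL;                    /* header itself is truncated */
    if (num_counters > (SIZE_MAX - hdr) / sizeof(struct fake_counter))
        return -EINVAL;                    /* size computation would overflow */
    if (len != hdr + (size_t)num_counters * sizeof(struct fake_counter))
        return -EINVAL;                    /* claimed count does not match */
    return 0;
}
```

A buffer that claims more counters than it actually carries is rejected before anything is read past its end.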

==== diff-softirqd-20041008 ====
<div class="change">
Patch from Denis Lunev &lt;den@openvz.org&gt;:<br/>
New sysctl to enable or disable (the default) ksoftirqd

Fairsched with the vcpu scheduler prohibits binding tasks to physical
CPUs, so the softirq threads must be disabled (#3696, #9243).
</div>

==== diff-swapleak-20060602 ====
<div class="change">
Patch from Denis Lunev &lt;den@openvz.org&gt;:<br/>
Adds statistics about the place where swap entries can leak.
</div>

==== diff-tcp-sg-20060605 ====
<div class="change">
Patch from OpenVZ team &lt;devel@openvz.org&gt;:<br/>
Added sysctl net/ipv4/tcp_use_sg to disable scatter/gather IO in tcp

Default value (1) allows scatter/gather IO (#8526)
</div>

==== diff-ubc-net-ipv6-20060511 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
UBC related changes for ipv6
</div>

==== diff-ubc-pbc-hash-opt-20060517 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
Optimized pb_hash() function.

The former one shifted the pfn right. As a result, many pages from
one UB ended up in one pb_hash chain, which hurt performance,
especially on fork.

This patch spreads pages over the hash more uniformly and thus
saves up to 25% of the fork performance loss compared to vanilla.
</div>
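
The effect can be illustrated with a toy comparison. Neither function below is the real pb_hash(): the shift amount, the table size and the multiplicative hash are all assumptions picked for the example, which only shows why hashing consecutive pfns through a right shift produces long chains.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 8
#define TABLE_SIZE (1u << TABLE_BITS)

/* Assumed "old style" hash: shift right, then mask. Sixteen neighbouring
 * pfns share every bucket because their low 4 bits are thrown away. */
static unsigned hash_shift(uint64_t pfn)
{
    return (unsigned)((pfn >> 4) & (TABLE_SIZE - 1));
}

/* Assumed replacement: 32-bit multiplicative hash, keeping the top bits. */
static unsigned hash_mult(uint64_t pfn)
{
    return (unsigned)(((uint32_t)pfn * 2654435761u) >> (32 - TABLE_BITS));
}

/* Length of the longest chain after hashing `count` consecutive pfns. */
static unsigned longest_chain(unsigned (*hash)(uint64_t),
                              uint64_t first_pfn, unsigned count)
{
    unsigned buckets[TABLE_SIZE];
    unsigned i, max = 0;

    memset(buckets, 0, sizeof(buckets));
    for (i = 0; i < count; i++) {
        unsigned b = hash(first_pfn + i);

        if (++buckets[b] > max)
            max = buckets[b];
    }
    return max;
}
```

For 64 consecutive pfns starting at 0, `longest_chain(hash_shift, 0, 64)` is 16, while the multiplicative variant yields a much shorter longest chain.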

==== diff-ubc-sk-clone-20060529 ====
<div class="change">
Patch from Dmitry Mishin &lt;dim@openvz.org&gt;:<br/>
Fixed oops in inet_sock_destruct due to wrong sk_clone error path.
</div>

==== diff-ubc-uballoc-unify-20060517 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>

Remove ub_kmalloc/ub_vmalloc/ub_vmalloc_node from ub headers and move them
into place where kmalloc/vmalloc/vmalloc_node are declared.
In CONFIG_USER_RESOURCE case it is ok to pass __GFP_UBC flag into functions.

{{Bug|165}}.
</div>

==== diff-ve-area-access-ipv6-20060511 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Area access check changes for ipv6
</div>

==== diff-ve-core-ipv6-20060511 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Changes in vecalls module to support ipv6
</div>

==== diff-ve-headers-ipv6-20060511 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Changes in ve headers needed for ipv6 virtualization
</div>

==== diff-ve-meminfo-20060515 ====
<div class="change">
Patch from Vasily Tarasov &lt;vtaras@openvz.org&gt;:<br/>
Adds the ability to set the totalram parameter (/proc/meminfo)
</div>

==== diff-ve-mount-owner-20060530 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
This patch adds owner to mounts.

{{Bug|160}}.
</div>

==== diff-ve-net-arp-ndisc-20060515 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Virtualization of ARP/NDISC

Neighbour tables were already encapsulated and managed as a separate
structure; the only thing that remained was to allocate them per VE.
Quite cute. No useful effect (except that the user can now play with arp/ip neigh), but necessary for future MAC-level switching.
</div>

==== diff-ve-netfilter-ipt-redir-20060516 ====
<div class="change">
Patch from Dmitry Mishin &lt;dim@openvz.org&gt;:<br/>
Fixed ipt_REDIRECT work inside VEs.

{{Bug|171}}.
</div>

==== diff-ve-net-ipv6-20060511 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Core part of ipv6 virtualization
</div>

==== diff-ve-net-ipv6-comp-20060524 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[IPv6] Add missing declarations
</div>

==== diff-ve-net-ipv6-fix-20060511 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
Small compilation fix for ipv6 virtualization
</div>

==== diff-ve-net-ipv6-modular-20060517 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Allow CONFIG_IPV6=m

In this case vzmon becomes dependent on the IPv6 module, but that
is not a big deal.
</div>

==== diff-ve-net-ipv6-off-comp-20060530 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
Compilation fix for CONFIG_IPV6=n case
</div>

==== diff-ve-net-ipv6-proto-headers-20060511 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Changes in ipv6 headers needed for ipv6 virtualization.
</div>

==== diff-ve-net-snmp-proc-virt-20060531 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
Virtualize /proc/net/dev_snmp6 entry

This is needed with ipv6 virtualized and turned on (#63318).
</div>

==== diff-ve-net-venet-ipv6-20060511 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Changes in venet module to support ipv6.

</div>

==== diff-ve-net-veth-device-20060531 ====
<div class="change">
Patch from Andrey Mirkin &lt;amirkin@openvz.org&gt;:<br/>
This patch introduces a virtual ethernet device.

When such a device is created, two network devices appear: one
inside the VPS and one in VE0. Names and HW addresses can be
specified for both devices.
</div>

==== diff-ve-net-virt-igmp-20060602 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
Virtual /proc/net/igmp.

Oops, it was done for IPv6, but IPv4 was forgotten.
</div>

==== diff-ve-proc-tgid-20060518 ====
<div class="change">
Patch from Dmitry Mishin &lt;dim@openvz.org&gt;:<br/>
Fixed oops in get_tgid_list.

If an external (VE0) process looks up the proc tree of a VE that is on
ve_cleanup_list, an oops in get_tgid_list is possible. Fixed.
</div>

==== diff-ve-vpid-leak-20060602 ====
<div class="change">
Patch from Alexey Kuznetsov &lt;alexey@openvz.org&gt;:<br/>
[PATCH] leakage of vpid_mapping

The problem was that, when switching to sparse VPID mappings, processes
with non-virtual pids could have entered the VE, for example some stuck
process from the VE setup scripts. In this case we created
a useless mapping struct, which was never freed because it referred
to a non-virtual pid.

I left a printk() in the code, because we definitely need confirmation that this event really happens. It does not in my tests: so far I have run 400000 checkpoint/restores and 20000 migrations on a VE and, unfortunately, found no problems. (#62834)
</div>

==== diff-vzdq-allocnofs-20060511 ====
<div class="change">
Patch from Kirill Korotaev &lt;dev@openvz.org&gt;:<br/>
Fixes for other self-deadlocks in vzquota.

This patch is an add-on to diff-vzdq-getstat-20060510 and fixes all
other places where an allocation with GFP_FS under qmblk-&gt;dq_sem is possible.
</div>

==== diff-vzdq-getstat-20060510 ====
<div class="change">
Patch from Kirill Korotaev &lt;dev@openvz.org&gt;:<br/>

This patch fixes a self-deadlock in vzquota.
quota_ugid_getstat() calls copy_to_user(), which can trigger a page
fault and get stuck on qmblk-&gt;dq_sem. (#62179)
</div>
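
One common way to avoid this class of deadlock is to snapshot the data under the lock and copy it out only after the lock is dropped. The sketch below models that idea in user space; whether the actual patch took this route is not stated above, and every name in it (`fake_sem`, `fake_copy_out()`, `getstat_safe()`, `getstat_unsafe()`) is hypothetical.

```c
#include <string.h>

struct fake_sem   { int held; };
struct fake_stat  { unsigned long usage, limit; };
struct fake_quota { struct fake_sem sem; struct fake_stat stat; };

/* Stand-in for copy_to_user(): it may "page fault", and the fault handler
 * needs the same semaphore. Returns -1 where the real code would deadlock. */
static int fake_copy_out(const struct fake_sem *s, struct fake_stat *dst,
                         const struct fake_stat *src)
{
    if (s->held)
        return -1;
    memcpy(dst, src, sizeof(*dst));
    return 0;
}

/* Problematic ordering: copy out while the semaphore is still held. */
static int getstat_unsafe(struct fake_quota *q, struct fake_stat *user_dst)
{
    int ret;

    q->sem.held = 1;
    ret = fake_copy_out(&q->sem, user_dst, &q->stat);
    q->sem.held = 0;
    return ret;
}

/* Safer ordering: snapshot under the semaphore, copy out after release. */
static int getstat_safe(struct fake_quota *q, struct fake_stat *user_dst)
{
    struct fake_stat tmp;

    q->sem.held = 1;
    tmp = q->stat;
    q->sem.held = 0;
    return fake_copy_out(&q->sem, user_dst, &tmp);
}
```

In this model `getstat_unsafe()` reports the would-be deadlock, while `getstat_safe()` delivers the same data without touching user memory under the semaphore.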

==== diff-ve-net-veth-context-20060607 ====
<div class="change">
Patch from Andrey Mirkin &lt;amirkin@openvz.org&gt;:<br/>
Veth device fix.

There was a bug in veth_stop(): unregister_netdev() must be
performed in the right context. Plus cosmetic cleanups.
</div>

==== diff-tcp-sg-export-20060605 ====
<div class="change">
Patch from Pavel Emelianov &lt;xemul@openvz.org&gt;:<br/>
Export sysctl_tcp_use_sg variable.

Without it ipv6 module can't load.
</div>

</noinclude>