Download/kernel/2.6.16/026test014.4/changes

Changes

  • IPv6 virtualization
  • Virtual ethernet device (veth)
  • Checkpointing of skfilter, fixes in thread migration
  • /proc/meminfo tuning from userspace
  • Vzquota lockup fix
  • UBC optimization, leak fixes
  • Mount operations restriction
  • Compilation fixes
  • Mainstream security fixes (up to 2.6.16.18)
  • Some fixes ported from the stable OpenVZ kernel

Config changes

Same as 026test012.1 plus:

  • +CONFIG_IPV6=y
  • +CONFIG_NMI_WATCHDOG=y
  • +CONFIG_QUOTA_COMPAT=y
  • +CONFIG_VE_ETHDEV=m
  • +CONFIG_BRIDGE=m

For the complete list of changes in this release, see git changelog for kernel 026test014.4.


Patches

diff-cbq-fairness-20020927

Patch from OpenVZ team <devel@openvz.org>:
CBQ fairness fixes

  • repair CBQ fairness in the first hunk
  • restrict cl->quantum in the second one

diff-cpt-emt64-pgoff-20060512

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT][X86_64] cpt_mm lost high bits of page offset

"int" was used to store page offset. It is not enough.
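
The truncation is easy to reproduce in user space. The following is a minimal sketch, not code from cpt_mm.c: the helper names are hypothetical, and a `uint32_t` field stands in for the 32-bit `int` so that the narrowing is well defined in standard C.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from cpt_mm.c. */

/* Buggy pattern: the page offset passes through a 32-bit field,
 * so everything above bit 31 is silently dropped. */
uint64_t pgoff_via_32bit(uint64_t pgoff)
{
    uint32_t stored = (uint32_t)pgoff;
    return stored;
}

/* Fixed pattern: a 64-bit field preserves the whole offset. */
uint64_t pgoff_via_64bit(uint64_t pgoff)
{
    uint64_t stored = pgoff;
    return stored;
}
```

For a page offset of 0x100000001, the 32-bit variant returns 1, while the 64-bit variant returns the original value.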

diff-cpt-export-pmd-huge-20060511

Patch from Pavel Emelianov <xemul@openvz.org>:
Exports pmd_huge for cpt_mm.c. Needed when compiling with CONFIG_HUGETLB_PAGE=y (#61839).

diff-cpt-getdir-fail-20060524

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] fail immediately, when get_dir failed

It used to fail only after the whole batch was complete, flooding the logs.

It is part of a larger problem, noticed in bug #62876. The full fix is not ready yet; for now the behaviour is just less ugly.

diff-cpt-ipv6-20060511

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] Support for ipv6 migration

diff-cpt-ipv6-comp-20060524

Patch from Pavel Emelianov <xemul@openvz.org>:

The part of diff-ve-net-ipv6-comp-20060524 related to cpt. Split out so that both patches can be placed at the correct position in the list.

diff-cpt-mcfilter-20060602

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] checkpoint socket multicast filters

It did not make much sense with the venet device, but it is required with veth, especially with IPv6.

diff-cpt-mm-restore-20060524

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] restore of mm failed without reasons sometimes

We must not fail when we cannot restore anon vma clusters. In the old days we had to fail; the problem has since been solved, but the old safety check was forgotten.

diff-cpt-sigaltstack-20060602

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] Checkpoint tasks doing sigaltstack()/SA_ONSTACK correctly

It is funny: the code was present in early versions of checkpointing. Apparently I removed it in a moment of mental aberration.

diff-cpt-skfilter-20060527

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] support checkpointing of sk filter

diff-cpt-thread-bug-20060527

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] bug in dumping nptl threads

All the threads were collected back-to-back, so we expected them to stay in that order in our internal task list. But if one of the threads forked some children, the order was broken. Quite a silly bug once you know this. It solves bug #63025.

diff-cpt-tty-restore-20060524

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] pty migration 2.6.8->2.6.16 was broken

diff-cpt-ub-leak-20060524

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] Fix ub refcnt leak

diff-cpt-up-read-20060522

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] do not forget to release mm semaphore in error path

diff-ext3-journalcap-20030317

Patch from OpenVZ team <devel@openvz.org>:
Change CAP_SYS_RESOURCE to CAP_SYS_ADMIN in ext3_ioctl

Journal manipulations are forbidden for VE admins with default capabilities (#19625).

diff-ext3-pgfault-20060602

Patch from Andrey Savochkin <saw@openvz.org>:
Reorganization of ext3_prepare_write/ext3_commit_write

This eliminates the possibility of a page fault in between, inside a transaction. Such a fault could cause a GFP_FS allocation, re-entry into ext3 code (possibly with a different superblock and journal), a ranking violation among journalling serialization, mmap_sem and the page lock, and all other kinds of funny consequences. (#22347)

The solution suggested by Chris Mason is to move all the logic including hole instantiation into commit_write.

diff-fairsched-curr-task-20060511

Patch from Andrey Mirkin <amirkin@openvz.org>:
Fix curr_task()/set_curr_task() for fairsched

diff-fairsched-debug-remove-20060511

Patch from Pavel Emelianov <xemul@openvz.org>:

Remove debug printk() from vmigration call

diff-fairsched-dep-20060517

Patch from Pavel Emelianov <xemul@openvz.org>:
Fix of CONFIG_FAIRSCHED/CONFIG_SCHED_VCPU declarations in Kconfig

Consists of two parts:

  1. Move these options from arch-dependent Kconfigs into kernel/Kconfig.fairsched;
  2. Change dependency - FAIRSCHED depends on SCHED_VCPU not vice-versa.

diff-fairsched-lockless-ctx-20060511

Patch from Kirill Korotaev <dev@openvz.org>:

Remove the #error with a warning that I introduced in the fairsched patch.

The IA64 lockless context switch should work fine with the current oncpu concept.

diff-fairsched-ve-20060530

Patch from OpenVZ team <devel@openvz.org>:
Virtualization fixes in fairsched.

This includes capability tuning, some per-ve statistics and /proc/fairsched file with old-format data that may be needed by some utils (vzcpucheck at least).

OpenVZ Bug #176.

diff-flock-return-error-20060605

Patch from Pavel Emelianov <xemul@openvz.org>:
Return error in case flock failed.

If flock_lock_file() failed to allocate a lock with locks_alloc_lock(), "error = 0" was returned. A non-zero error code must be returned instead.
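
The error path can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch itself: `lock_stub`, `alloc_lock_stub()` and `flock_lock_sketch()` are hypothetical stand-ins, and `-ENOMEM` is an assumed choice, since the description only asks for a non-zero value.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real code works on kernel lock
 * structures allocated by locks_alloc_lock(). */
struct lock_stub {
    int in_use;
};

static struct lock_stub the_lock;

/* Returns NULL when simulate_failure is non-zero. */
static struct lock_stub *alloc_lock_stub(int simulate_failure)
{
    return simulate_failure ? NULL : &the_lock;
}

int flock_lock_sketch(int simulate_failure)
{
    int error = 0;
    struct lock_stub *new_fl = alloc_lock_stub(simulate_failure);

    if (new_fl == NULL) {
        /* Assumption: -ENOMEM. Before the fix, error stayed 0 here,
         * so the caller believed the lock had been taken. */
        error = -ENOMEM;
        goto out;
    }

    new_fl->in_use = 1; /* the lock would be inserted here */
out:
    return error;
}
```

With the allocation failing, the sketch returns `-ENOMEM`; on success it returns 0.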

diff-fs-quotcompat-20050921

Patch from Andrey Savochkin <saw@openvz.org>:
This patch implements compatibility quotactls for old quota tools.

diff-fs-revalidate-20060605

Patch from Andrey Savochkin <saw@openvz.org>:
Fixed revalidation for NFS dentries.

The problem was introduced in the mainstream by http://linux.bkbits.net:8080/linux-2.4/cset@1.181?nav=index.html%7Csrc/%7Csrc/fs%7Crelated/fs/namei.c (see also the description at http://www.uwsg.iu.edu/hypermail/linux/kernel/0201.2/0316.html). This patch fixes the non-uniform use of the d_revalidate method in VFS and makes VFS return ESTALE only for the weird NFS cases (#18356).

diff-ia64-tif-freeze-20060511

Patch from Andrey Mirkin <amirkin@openvz.org>:
[IA64] Add TIF_FREEZE flag to ia64.

diff-ipt-nf-dbg-arp-20030317

Patch from OpenVZ team <devel@openvz.org>:
Clean skb->nf_debug before packet re-process (#19592).

diff-jbd-kthread-20041028

Patch from Kirill Korotaev <dev@openvz.org>:
Add the check of the kernel_thread() result for jbd.

This prevents a process hang when mounting ext3 inside a VE (#35206).

diff-merge-2.6.16.18-20060530

Patch from OpenVZ team <devel@openvz.org>:
Merged 2.6.16.18 from /linux/kernel/git/stable/linux-2.6.16.y

diff-ms-allocwarn-20060605

Patch from Dmitry Mishin <dim@openvz.org>:

Suppress messages about page allocation failures in the kernel (#43925).

diff-ms-dcache-race-during-umount

Patch from Neil Brown <neilb@suse.de>:
Replaced OpenVZ version of dcache-race-fix with -mm tree's one.

Original comment from Neil Brown:

The race is that the shrink_dcache_memory shrinker could get called while a filesystem is being unmounted, and could try to prune a dentry belonging to that filesystem.

If it does, then it will call in to iput on the inode while the dentry is no longer able to be found by the umounting process. If iput takes a while, generic_shutdown_super could get all the way though shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to touch it (which some filesystems certainly do), it will lose. The promised 'Self-destruct in 5 seconds' doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry on someone else's dentry. As a down_read_trylock is used, shrink_dcache_memory will no longer try to prune the dentry of a filesystem that is being unmounted, and unmount will not be able to start until any such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it is doing the prune on behalf of so that it can be careful of other filesystems. shrink_dcache_memory isn't called it on behalf of any filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon list out of the superblock, so it can get the s_anon list itself, and can pass the superblock down to prune_dcache.

If prune_dcache finds a dentry that it cannot free, it leaves it where it is (at the tail of the list) and exits, on the assumption that some other thread will be removing that dentry soon. To try to make sure that some work gets done, a limited number of dentries which are untouchable are skipped over while choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>

diff-ms-dst-lock-20060522

Patch from Dmitry Mishin <dim@openvz.org>:

Replace add_timer() with mod_timer() in dst_run_gc() in order to avoid a BUG message.

    CPU1                              CPU2
    dst_run_gc() entered              dst_run_gc() entered
    spin_lock(&dst_lock)              .....
    del_timer(&dst_gc_timer)          fail to get lock
    ....                              mod_timer() <--- puts timer back
    ....                                               in list
    add_timer(&dst_gc_timer) <--- BUG because timer is in list already.

Found during OpenVZ internal testing (#62581).
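
The interleaving above can be replayed with a toy model of a timer. This is only a sketch: the `toy_*` names are hypothetical, there is no real locking or timer list, and a return value of -1 marks the point where the kernel's add_timer() would hit BUG because the timer is already pending.

```c
#include <stdbool.h>

/* Toy model of a kernel timer: only tracks whether it is queued. */
struct toy_timer {
    bool pending;
    unsigned long expires;
};

/* Like del_timer(): deactivate; returns 1 if the timer was pending. */
int toy_del_timer(struct toy_timer *t)
{
    int was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Like add_timer(): only valid on an inactive timer.
 * Returns -1 where the kernel would hit BUG. */
int toy_add_timer(struct toy_timer *t, unsigned long expires)
{
    if (t->pending)
        return -1;
    t->pending = true;
    t->expires = expires;
    return 0;
}

/* Like mod_timer(): valid in either state.
 * Returns 1 if the timer was pending, 0 if it was inactive. */
int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
    int was_pending = t->pending;

    t->pending = true;
    t->expires = expires;
    return was_pending;
}
```

Replaying the sequence (del on CPU1, mod on CPU2, then add on CPU1) makes `toy_add_timer()` report the failure, whereas a final `toy_mod_timer()` simply re-arms the already pending timer.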

diff-ms-nlrace-20040630

Patch from Denis Lunev <den@openvz.org>:

Fixed a netlink race, identified as the cause of simultaneous numothersock, dgramsockbuf and kmempages leaks (#34365).

diff-nmiwd-default-20060605

Patch from OpenVZ team <devel@openvz.org>:
NMI watchdog turned on by default (#11989).

diff-nmiwd-silence-20060605

Patch from Vasily Averin <vvs@openvz.org>:

Allow setting the console log level to the silent level when the NMI watchdog detects a LOCKUP (#12002).

diff-security-ipt-counters-20060516

Patch from Kirill Korotaev <dev@openvz.org>:
This patch fixes buffer size check in do_add_counters().

For IPv4 it was fixed in 2.6.16, this one is for IPv6 and arp_tables.

diff-softirqd-20041008

Patch from Denis Lunev <den@openvz.org>:
New sysctl for enabling or disabling (the default) ksoftirqd

Fairsched with the VCPU scheduler prohibits binding tasks to physical CPUs, so softirq threads must be disabled (#3696, #9243).

diff-swapleak-20060602

Patch from Denis Lunev <den@openvz.org>:
Adds statistics about the place where swap entries can leak.

diff-tcp-sg-20060605

Patch from OpenVZ team <devel@openvz.org>:
Added sysctl net/ipv4/tcp_use_sg to disable scatter/gather IO in tcp

The default value (1) allows scatter/gather IO (#8526).
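
A small user-space helper can check the current setting. This is a sketch: `read_int_sysctl()` is a hypothetical helper, and the path `/proc/sys/net/ipv4/tcp_use_sg` is assumed from the usual mapping of sysctl names onto /proc/sys.

```c
#include <stdio.h>

/* Reads a single integer from a sysctl-style file.
 * Returns fallback if the file is missing or cannot be parsed. */
int read_int_sysctl(const char *path, int fallback)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return fallback;
    if (fscanf(f, "%d", &value) != 1)
        value = fallback;
    fclose(f);
    return value;
}
```

For example, `read_int_sysctl("/proc/sys/net/ipv4/tcp_use_sg", 1)` falls back to the documented default of 1 on kernels that do not carry this patch.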

diff-ubc-net-ipv6-20060511

Patch from Alexey Kuznetsov <alexey@openvz.org>:
UBC related changes for ipv6

diff-ubc-pbc-hash-opt-20060517

Patch from Pavel Emelianov <xemul@openvz.org>:
Optimized pb_hash() function.

The former one shifted the pfn right. As a result, many pages from one UB ended up in the same pb_hash chain, which hurt performance, especially on fork.

This patch spreads pages over the hash more uniformly and thus recovers up to 25% of the fork performance loss compared to vanilla.
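
The clustering effect can be illustrated with a toy comparison. Everything below is an assumption made for illustration: the table size, the shift of 7 and the multiplicative hash are hypothetical and are not the actual old or new pb_hash() implementations.

```c
#include <stdint.h>
#include <string.h>

#define TOY_HASH_BITS 10
#define TOY_HASH_SIZE (1u << TOY_HASH_BITS)
#define TOY_OLD_SHIFT 7 /* hypothetical shift, not the kernel's value */

/* Shift-style hash: neighbouring pfns share a bucket. */
unsigned int toy_hash_shift(unsigned long pfn)
{
    return (unsigned int)((pfn >> TOY_OLD_SHIFT) % TOY_HASH_SIZE);
}

/* Multiplicative hash: neighbouring pfns land far apart. */
unsigned int toy_hash_mult(unsigned long pfn)
{
    return (unsigned int)(((uint32_t)pfn * 2654435761u)
                          >> (32 - TOY_HASH_BITS));
}

/* Length of the longest chain after hashing `count` consecutive
 * pfns starting at `first_pfn`. */
unsigned int toy_longest_chain(unsigned int (*hash)(unsigned long),
                               unsigned long first_pfn,
                               unsigned int count)
{
    unsigned int chain[TOY_HASH_SIZE];
    unsigned int longest = 0;
    unsigned int i;

    memset(chain, 0, sizeof(chain));
    for (i = 0; i < count; i++) {
        unsigned int bucket = hash(first_pfn + i);

        if (++chain[bucket] > longest)
            longest = chain[bucket];
    }
    return longest;
}
```

For 1024 consecutive pfns starting at 0, the shift-style hash fills only 8 buckets with 128 entries each, while the multiplicative hash keeps every chain far shorter. Long chains are what make each lookup, and therefore fork, slow.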

diff-ubc-sk-clone-20060529

Patch from Dmitry Mishin <dim@openvz.org>:
Fixed oops in inet_sock_destruct due to wrong sk_clone error path.

diff-ubc-uballoc-unify-20060517

Patch from Pavel Emelianov <xemul@openvz.org>:

Remove ub_kmalloc/ub_vmalloc/ub_vmalloc_node from ub headers and move them into place where kmalloc/vmalloc/vmalloc_node are declared. In CONFIG_USER_RESOURCE case it is ok to pass __GFP_UBC flag into functions.

OpenVZ Bug #165.

diff-ve-area-access-ipv6-20060511

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Area access check changes for ipv6

diff-ve-core-ipv6-20060511

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Changes in vecalls module to support ipv6

diff-ve-headers-ipv6-20060511

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Changes in ve headers needed for ipv6 virtualization

diff-ve-meminfo-20060515

Patch from Vasily Tarasov <vtaras@openvz.org>:
Adds the ability to set the totalram parameter (/proc/meminfo)

diff-ve-mount-owner-20060530

Patch from Pavel Emelianov <xemul@openvz.org>:
This patch adds owner to mounts.

OpenVZ Bug #160.

diff-ve-net-arp-ndisc-20060515

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Virtualization of ARP/NDISC

Neighbour tables were already encapsulated and managed as a separate structure; the only thing that remained was to allocate them per VE. Quite cute. No useful effect (except that the user can now play with arp/ip neigh), but necessary for future MAC-level switching.

diff-ve-netfilter-ipt-redir-20060516

Patch from Dmitry Mishin <dim@openvz.org>:
Fixed ipt_REDIRECT work inside VEs.

OpenVZ Bug #171.

diff-ve-net-ipv6-20060511

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Core part of ipv6 virtualization

diff-ve-net-ipv6-comp-20060524

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[IPv6] Add missing declarations

diff-ve-net-ipv6-fix-20060511

Patch from Pavel Emelianov <xemul@openvz.org>:
Small compilation fix for ipv6 virtualization

diff-ve-net-ipv6-modular-20060517

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Allow CONFIG_IPV6=m

In this case vzmon becomes dependent on the IPv6 module, but it is not a big deal.

diff-ve-net-ipv6-off-comp-20060530

Patch from Pavel Emelianov <xemul@openvz.org>:
Compilation fix for CONFIG_IPV6=n case

diff-ve-net-ipv6-proto-headers-20060511

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Changes in ipv6 headers needed for ipv6 virtualization.

diff-ve-net-snmp-proc-virt-20060531

Patch from Pavel Emelianov <xemul@openvz.org>:
Virtualize /proc/net/dev_snmp6 entry

This is needed with ipv6 virtualized and turned on (#63318).

diff-ve-net-venet-ipv6-20060511

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Changes in venet module to support ipv6.

diff-ve-net-veth-device-20060531

Patch from Andrey Mirkin <amirkin@openvz.org>:
This patch introduces a virtual ethernet device.

When such a device is created, two network devices are created: one inside the VPS and one in VE0. Names and HW addresses can be specified for both devices.

diff-ve-net-virt-igmp-20060602

Patch from Alexey Kuznetsov <alexey@openvz.org>:
Virtual /proc/net/igmp.

Oops, it is done for IPv6, but IPv4 was forgotten.

diff-ve-proc-tgid-20060518

Patch from Dmitry Mishin <dim@openvz.org>:
Fixed oops in get_tgid_list.

If an external (ve0) process looks up the proc tree of a VE that is on ve_cleanup_list, an oops in get_tgid_list is possible. Fixed.

diff-ve-vpid-leak-20060602

Patch from Alexey Kuznetsov <alexey@openvz.org>:
[PATCH] leakage of vpid_mapping

The problem was that when switching to sparse VPID mappings, processes with non-virtual pids could have entered the VE, for example some stuck process from the VE setup scripts. In this case we created a useless mapping struct, which was never freed, because it referred to a non-virtual pid.

I left a printk() in the code, because we definitely need confirmation that this event really happens. It does not in my tests: so far I have run 400000 checkpoint/restores and 20000 migrations on a VE and found no problems, unfortunately. (#62834)

diff-vzdq-allocnofs-20060511

Patch from Kirill Korotaev <dev@openvz.org>:
Fixes for other self-deadlocks in vzquota.

This patch is an add-on to diff-vzdq-getstat-20060510 and fixes all other places where an allocation with GFP_FS under qmblk->dq_sem is possible.

diff-vzdq-getstat-20060510

Patch from Kirill Korotaev <dev@openvz.org>:

This patch fixes a self-deadlock in vzquota. quota_ugid_getstat() calls copy_to_user(), which can trigger a page fault and get stuck on qmblk->dq_sem. (#62179)
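
The changelog does not say how the patch removes the deadlock. One common way to avoid this class of bug is to snapshot the protected data while holding the semaphore and to copy it out only after releasing it. The toy model below sketches that idea under stated assumptions: all names are hypothetical, the semaphore is a plain flag, and `toy_copy_to_user()` returns -1 where a real page fault would try to take the semaphore again.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NSTATS 4

struct toy_quota {
    int sem_held;          /* 1 while the toy dq_sem is held */
    long stat[TOY_NSTATS]; /* data protected by the semaphore */
};

/* Stand-in for copy_to_user(): a fault here may re-enter quota code,
 * which needs the semaphore. Returns -1 for "would deadlock". */
int toy_copy_to_user(const struct toy_quota *q, long *dst,
                     const long *src, size_t n)
{
    if (q->sem_held)
        return -1;
    memcpy(dst, src, n * sizeof(*src));
    return 0;
}

/* Buggy pattern: copies to user space with the semaphore held. */
int toy_getstat_buggy(struct toy_quota *q, long *ubuf)
{
    int ret;

    q->sem_held = 1;
    ret = toy_copy_to_user(q, ubuf, q->stat, TOY_NSTATS);
    q->sem_held = 0;
    return ret;
}

/* Safer pattern: snapshot under the semaphore, copy out afterwards. */
int toy_getstat_snapshot(struct toy_quota *q, long *ubuf)
{
    long snapshot[TOY_NSTATS];

    q->sem_held = 1;
    memcpy(snapshot, q->stat, sizeof(snapshot));
    q->sem_held = 0;
    return toy_copy_to_user(q, ubuf, snapshot, TOY_NSTATS);
}
```

In this model the buggy variant always reports the would-deadlock condition, while the snapshot variant completes the copy.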

diff-ve-net-veth-context-20060607

Patch from Andrey Mirkin <amirkin@openvz.org>:
Veth device fix.

There was a bug in veth_stop(): unregister_netdev() must be performed in the right context. Plus cosmetic cleanups.

diff-tcp-sg-export-20060605

Patch from Pavel Emelianov <xemul@openvz.org>:

Export sysctl_tcp_use_sg variable.

Without it ipv6 module can't load.