Revision as of 09:57, 21 March 2008 by Kir



  • IPv6 virtualization
  • Virtual ethernet device (veth)
  • Checkpointing of skfilter, fixes in threads migration
  • /proc/meminfo tuning from userspace
  • Vzquota lockup fix
  • UBC optimization, leak fixes.
  • Mount operations restriction.
  • Compilation fixes.
  • Mainstream security fixes (up to
  • Some fixes ported from the stable OpenVZ kernel.

Config changes

Same as 026test012.1 plus:

  • +CONFIG_IPV6=y

For the complete list of changes in this release, see git changelog for kernel 026test014.4.



Patch from OpenVZ team <>:
CBQ fairness fixes

  • repair CBQ fairness in the first hunk
  • restrict cl->quantum in the second one


Patch from Alexey Kuznetsov <>:
[CPT][X86_64] cpt_mm lost high bits of page offset

An "int" was used to store the page offset, which is too narrow on x86_64.
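The loss of high bits can be illustrated with a small userspace sketch. It is not taken from cpt_mm: the function names and MODEL_PAGE_SHIFT are hypothetical, and the narrow storage is modelled with uint32_t so that the wrap-around is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Convert a page-aligned byte offset into a page offset (pgoff). */
uint64_t byte_offset_to_pgoff(uint64_t byte_off)
{
    assert((byte_off & ((1ULL << MODEL_PAGE_SHIFT) - 1)) == 0);
    return byte_off >> MODEL_PAGE_SHIFT;
}

/* Hypothetical model of the bug: the offset passes through 32-bit
 * storage, so everything above bit 31 is dropped. */
uint64_t stored_pgoff_narrow(uint64_t pgoff)
{
    uint32_t narrow = (uint32_t)pgoff;  /* keeps only the low 32 bits */
    return narrow;
}

/* Hypothetical model of the fix: 64-bit storage keeps every bit. */
uint64_t stored_pgoff_wide(uint64_t pgoff)
{
    return pgoff;
}
```

With these definitions, a byte offset of 2^44 + 4096 gives a page offset of 2^32 + 1; the narrow variant stores it as 1, while the wide variant keeps the full value.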


Patch from Pavel Emelianov <>:
Exports pmd_huge for cpt_mm.c. Needed when compiling with CONFIG_HUGETLB_PAGE=y (#61839).


Patch from Alexey Kuznetsov <>:
[CPT] fail immediately, when get_dir failed

It used to fail only after the whole batch had completed, flooding the logs.

This is part of a larger problem noticed in bug #62876. The full fix is not ready yet; this change just makes the behaviour less ugly.


Patch from Alexey Kuznetsov <>:
[CPT] Support for ipv6 migration


Patch from Pavel Emelianov <>:

Part of diff-ve-net-ipv6-comp-20060524 related to cpt. Split out so that both patches sit in the correct place in the patch list.


Patch from Alexey Kuznetsov <>:
[CPT] checkpoint socket multicast filters

It did not make much sense with the venet device, but it is required with veth, especially with IPv6.


Patch from Alexey Kuznetsov <>:
[CPT] restore of mm failed without reasons sometimes

We must not fail when we cannot restore anon vma clusters. In the old days we had to fail; the underlying problem has since been solved, but the old safety check was forgotten.


Patch from Alexey Kuznetsov <>:
[CPT] Checkpoint tasks doing sigaltstack()/SA_ONSTACK correctly

It is funny: the code was present in early versions of checkpointing. Apparently I removed it in a moment of mental aberration.


Patch from Alexey Kuznetsov <>:
[CPT] support checkpointing of sk filter


Patch from Alexey Kuznetsov <>:
[CPT] bug in dumping nptl threads

All the threads were collected back-to-back, so we expected them to stay in that order in our internal task list. But if one of the threads forked some children, the order was broken. Quite a silly bug once you know this. It solves bug #63025.


Patch from Alexey Kuznetsov <>:
[CPT] pty migration 2.6.8->2.6.16 was broken


Patch from Alexey Kuznetsov <>:
[CPT] Fix ub refcnt leak


Patch from Alexey Kuznetsov <>:
[CPT] do not forget to release mm semaphore in error path


Patch from OpenVZ team <>:
Change CAP_SYS_RESOURCE to CAP_SYS_ADMIN in ext3_ioctl

Journal manipulations are forbidden for VE admins with default capabilities (#19625).


Patch from Andrey Savochkin <>:
Reorganization of ext3_prepare_write/ext3_commit_write

This eliminates the possibility of a page fault in between, inside a transaction. Such a fault could cause a GFP_FS allocation and re-entry into ext3 code, possibly with a different superblock and journal, a ranking violation between journalling serialization, mmap_sem and the page lock, and all other kinds of funny consequences. (#22347)

The solution suggested by Chris Mason is to move all the logic including hole instantiation into commit_write.


Patch from Andrey Mirkin <>:
Fix curr_task()/set_curr_task() for fairsched


Patch from Pavel Emelianov <>:

Remove debug printk() from vmigration call


Patch from Pavel Emelianov <>:
Fix of CONFIG_FAIRSCHED/CONFIG_SCHED_VCPU declarations in Kconfig

Consists of two parts:

  1. Move these options from arch-dependent Kconfigs into kernel/Kconfig.fairsched;
  2. Change dependency - FAIRSCHED depends on SCHED_VCPU not vice-versa.


Patch from Kirill Korotaev <>:

Remove the #error (with warning) that I introduced in the fairsched patch.

The IA64 lockless context switch should work fine with the current oncpu concept.


Patch from OpenVZ team <>:
Virtualization fixes in fairsched.

This includes capability tuning, some per-ve statistics and /proc/fairsched file with old-format data that may be needed by some utils (vzcpucheck at least).

OpenVZ Bug #176.


Patch from Pavel Emelianov <>:
Return error in case flock failed.

If flock_lock_file() failed to allocate a lock with locks_alloc_lock(), "error = 0" was returned. A non-zero error code must be returned instead.
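The error-path pattern can be sketched in userspace as follows. This is only a model, not the kernel code: lock_model, lock_model_alloc(), set_lock_old() and set_lock_fixed() are hypothetical names, and the choice of -ENOMEM as the error code is an assumption, since the description only asks for a non-zero value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct lock_model { int type; };

/* Hypothetical allocator; alloc_fails simulates memory pressure. */
struct lock_model *lock_model_alloc(bool alloc_fails)
{
    return alloc_fails ? NULL : calloc(1, sizeof(struct lock_model));
}

/* Old behaviour: error starts at 0 and the failure path never updates it. */
int set_lock_old(bool alloc_fails)
{
    int error = 0;
    struct lock_model *new_fl = lock_model_alloc(alloc_fails);

    if (new_fl == NULL)
        goto out;            /* bug: leaves with error == 0 */
    /* ... the lock would be inserted into the lock list here ... */
    free(new_fl);
out:
    return error;
}

/* Fixed behaviour: assume failure until the lock has really been set. */
int set_lock_fixed(bool alloc_fails)
{
    int error = -ENOMEM;     /* assumed error code */
    struct lock_model *new_fl = lock_model_alloc(alloc_fails);

    if (new_fl == NULL)
        goto out;
    /* ... the lock would be inserted into the lock list here ... */
    free(new_fl);
    error = 0;
out:
    return error;
}

/* Usage: a failed allocation is invisible to the old caller only. */
void flock_model_demo(void)
{
    assert(set_lock_old(true) == 0);
    assert(set_lock_fixed(true) == -ENOMEM);
}
```

Initialising the error variable to the failure value and clearing it only on success keeps every early exit safe by default.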


Patch from Andrey Savochkin <>:
This patch implements compatibility quotactls for old quota tools.


Patch from Andrey Savochkin <>:
Fixed revalidation for NFS dentries.

The problem was introduced in the mainstream by (see also the description at ). This patch fixes the non-uniform use of the d_revalidate method in VFS and makes VFS return ESTALE only for the weird NFS cases (#18356).


Patch from Andrey Mirkin <>:
[IA64] Add TIF_FREEZE flag to ia64.


Patch from OpenVZ team <>:
Clean skb->nf_debug before packet re-process (#19592).


Patch from Kirill Korotaev <>:
Add the check of the kernel_thread() result for jbd.

This prevents a process hang while mounting ext3 inside a VE (#35206).


Patch from OpenVZ team <>:
Merged from /linux/kernel/git/stable/linux-2.6.16.y


Patch from Dmitry Mishin <>:

Suppress messages about page allocation failures in the kernel (#43925).


Patch from Neil Brown <>:
Replaced OpenVZ version of dcache-race-fix with -mm tree's one.

Original comment from Neil Brown:

The race is that the shrink_dcache_memory shrinker could get called while a filesystem is being unmounted, and could try to prune a dentry belonging to that filesystem.

If it does, then it will call in to iput on the inode while the dentry is no longer able to be found by the umounting process. If iput takes a while, generic_shutdown_super could get all the way through shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to touch it (which some filesystems certainly do), it will lose. The promised 'Self-destruct in 5 seconds' doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry on someone else's dentry. As a down_read_trylock is used, shrink_dcache_memory will no longer try to prune the dentry of a filesystem that is being unmounted, and unmount will not be able to start until any such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it is doing the prune on behalf of so that it can be careful of other filesystems. shrink_dcache_memory isn't called on behalf of any filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon list out of the superblock, so it can get the s_anon list itself, and can pass the superblock down to prune_dcache.

If prune_dcache finds a dentry that it cannot free, it leaves it where it is (at the tail of the list) and exits, on the assumption that some other thread will be removing that dentry soon. To try to make sure that some work gets done, a limited number of dentries which are untouchable are skipped over while choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <>
Cc: Kirill Korotaev <>
Cc: Olaf Hering <>
Cc: Balbir Singh <>

Signed-off-by: Neil Brown <>
Signed-off-by: Balbir Singh <>
Signed-off-by: Andrew Morton <>


Patch from Dmitry Mishin <>:

Replace add_timer() with mod_timer() in dst_run_gc in order to avoid a BUG message.

    CPU1                                  CPU2
    dst_run_gc() entered                  dst_run_gc() entered
    spin_lock(&dst_lock)                  .....
    del_timer(&dst_gc_timer)              fail to get lock
    ....                                  mod_timer() <--- puts timer back
    ....                                                   in list
    add_timer(&dst_gc_timer) <--- BUG because timer is in list already.

Found during OpenVZ internal testing (#62581).
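The interleaving in the diagram can be replayed with a small single-threaded model. This is a sketch, not the kernel timer code: timer_model and the model_* functions are hypothetical stand-ins that track only whether the timer is queued, and the initial queued state is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a kernel timer: only the queued state matters here. */
struct timer_model {
    bool pending;
    unsigned long expires;
};

/* Models del_timer(): dequeue and report whether it was queued. */
int model_del_timer(struct timer_model *t)
{
    assert(t != NULL);
    bool was_pending = t->pending;
    t->pending = false;
    return was_pending;
}

/* Models add_timer(): returns -1 where the kernel would hit BUG
 * because the timer is already queued, 0 otherwise. */
int model_add_timer(struct timer_model *t, unsigned long expires)
{
    assert(t != NULL);
    if (t->pending)
        return -1;
    t->pending = true;
    t->expires = expires;
    return 0;
}

/* Models mod_timer(): (re)queues in any state; returns 1 if the
 * timer was already queued, 0 if it was not. */
int model_mod_timer(struct timer_model *t, unsigned long expires)
{
    assert(t != NULL);
    bool was_pending = t->pending;
    t->pending = true;
    t->expires = expires;
    return was_pending;
}

/* Replays the diagram. Returns 0 if CPU1's final call is safe and
 * -1 if it would trigger the BUG. */
int replay_dst_gc_race(bool cpu1_uses_mod_timer)
{
    struct timer_model gc_timer = { .pending = true, .expires = 100 };

    model_del_timer(&gc_timer);        /* CPU1, holding dst_lock */
    model_mod_timer(&gc_timer, 200);   /* CPU2, failed to get the lock */

    if (cpu1_uses_mod_timer) {
        model_mod_timer(&gc_timer, 300);
        return 0;
    }
    return model_add_timer(&gc_timer, 300);
}
```

In this model replay_dst_gc_race(false) reports the BUG path and replay_dst_gc_race(true) does not, because mod_timer() accepts a timer in either state.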


Patch from Denis Lunev <>:

Fixed a netlink race, identified as the cause of simultaneous numothersock, dgramsockbuf and kmempages leaks (#34365).


Patch from OpenVZ team <>:
NMI watchdog turned on by default (#11989).


Patch from Vasily Averin <>:

Allow setting the console log level to the silent level when the NMI watchdog detects a LOCKUP (#12002).


Patch from Kirill Korotaev <>:
This patch fixes buffer size check in do_add_counters().

For IPv4 it was fixed in 2.6.16, this one is for IPv6 and arp_tables.


Patch from Denis Lunev <>:
New sysctl for enabling/disabling (default) ksoftirqd

Fairsched with the vcpu scheduler prohibits binding tasks to physical CPUs, so softirq threads must be disabled (#3696, #9243).


Patch from Denis Lunev <>:
Adds statistics about the place where swap entries can leak.


Patch from OpenVZ team <>:
Added sysctl net/ipv4/tcp_use_sg to disable scatter/gather IO in tcp

Default value (1) allows scatter/gather IO (#8526)


Patch from Alexey Kuznetsov <>:
UBC related changes for ipv6


Patch from Pavel Emelianov <>:
Optimized pb_hash() function.

The former one shifted the pfn right. As a result, many pages from one UB ended up in a single pb_hash chain, which slowed things down, especially on fork.

This patch spreads pages over the hash more uniformly and thus recovers up to 25% of the fork performance loss compared to vanilla.
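The effect of a shift-based hash on chain length can be sketched as below. Everything here is an assumption made for illustration: the table size, the shift amount, the premise that pages of one UB tend to have neighbouring pfns, and the multiplicative hash used for comparison, which is a generic technique and not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PB_MODEL_BITS    10                    /* assumed table size: 1024 */
#define PB_MODEL_BUCKETS (1u << PB_MODEL_BITS)
#define PB_MODEL_SHIFT   7                     /* assumed shift amount */

/* Shift-based model: neighbouring pfns share a bucket. */
unsigned pb_hash_shift_model(uint32_t pfn)
{
    return (pfn >> PB_MODEL_SHIFT) & (PB_MODEL_BUCKETS - 1);
}

/* Multiplicative (Fibonacci) hashing, used here only for comparison. */
unsigned pb_hash_mult_model(uint32_t pfn)
{
    return (uint32_t)(pfn * 2654435769u) >> (32 - PB_MODEL_BITS);
}

typedef unsigned (*pb_hash_fn)(uint32_t pfn);

/* Longest chain after hashing `count` consecutive pfns from `first`. */
unsigned longest_chain(pb_hash_fn hash, uint32_t first, uint32_t count)
{
    unsigned chains[PB_MODEL_BUCKETS];
    unsigned longest = 0;

    memset(chains, 0, sizeof(chains));
    for (uint32_t i = 0; i < count; i++) {
        unsigned bucket = hash(first + i);

        assert(bucket < PB_MODEL_BUCKETS);
        if (++chains[bucket] > longest)
            longest = chains[bucket];
    }
    return longest;
}
```

Calling longest_chain(pb_hash_shift_model, 0, 128) puts all 128 pages into one chain, while the multiplicative variant spreads the same pages over many buckets, so each lookup walks a much shorter list.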


Patch from Dmitry Mishin <>:
Fixed oops in inet_sock_destruct due to wrong sk_clone error path.


Patch from Pavel Emelianov <>:

Remove ub_kmalloc/ub_vmalloc/ub_vmalloc_node from ub headers and move them into place where kmalloc/vmalloc/vmalloc_node are declared. In CONFIG_USER_RESOURCE case it is ok to pass __GFP_UBC flag into functions.

OpenVZ Bug #165.


Patch from Alexey Kuznetsov <>:
Area access check changes for ipv6


Patch from Alexey Kuznetsov <>:
Changes in vecalls module to support ipv6


Patch from Alexey Kuznetsov <>:
Changes in ve headers needed for ipv6 virtualization


Patch from Vasily Tarasov <>:
Adds possibility to set totalram parameter (/proc/meminfo)


Patch from Pavel Emelianov <>:
This patch adds owner to mounts.

OpenVZ Bug #160.


Patch from Alexey Kuznetsov <>:
Virtualization of ARP/NDISC

Neighbour tables were already encapsulated and managed as a separate structure; the only thing that remained was to allocate them per VE. Quite cute. No useful effect (except that the user can now play with arp/ip neigh), but necessary for future MAC-level switching.


Patch from Dmitry Mishin <>:
Fixed ipt_REDIRECT work inside VEs.

OpenVZ Bug #171.


Patch from Alexey Kuznetsov <>:
Core part of ipv6 virtualization


Patch from Alexey Kuznetsov <>:
[IPv6] Add missing declarations


Patch from Pavel Emelianov <>:
Small compilation fix for ipv6 virtualization


Patch from Alexey Kuznetsov <>:

In this case vzmon becomes dependent on the IPv6 module, but it is not a big deal.


Patch from Pavel Emelianov <>:
Compilation fix for CONFIG_IPV6=n case


Patch from Alexey Kuznetsov <>:
Changes in ipv6 headers needed for ipv6 virtualization.


Patch from Pavel Emelianov <>:
Virtualize /proc/net/dev_snmp6 entry

This is needed with ipv6 virtualized and turned on (#63318).


Patch from Alexey Kuznetsov <>:
Changes in venet module to support ipv6.


Patch from Andrey Mirkin <>:
This patch introduces a virtual ethernet device.

When such a device is created, two network devices are created: one inside the VPS and one in VE0. Names and HW addresses can be specified for both devices.


Patch from Alexey Kuznetsov <>:
Virtual /proc/net/igmp.

Oops, it is done for IPv6, but IPv4 was forgotten.


Patch from Dmitry Mishin <>:
Fixed oops in get_tgid_list.

If an external (VE0) process looks up the proc tree of a VE that is in ve_cleanup_list, an oops in get_tgid_list is possible. Fixed.


Patch from Alexey Kuznetsov <>:
[PATCH] leakage of vpid_mapping

The problem was that when switching to sparse VPID mappings, we could have processes with non-virtual pids that had entered the VE. For example, it could be some stuck process from the VE setup scripts. In this case we created a useless mapping struct, which was never freed because it referred to a non-virtual pid.

I left a printk() in the code, because we definitely need confirmation that this event really happens. It does not in my tests: so far I have run 400000 checkpoint/restores and 20000 migrations on a VE and found no problems, unfortunately. (#62834)


Patch from Kirill Korotaev <>:
Fixes of other selfdeadlocks in vzquota.

This patch is an add-on to diff-vzdq-getstat-20060510 and fixes all other places where an allocation with GFP_FS under qmblk->dq_sem is possible.


Patch from Kirill Korotaev <>:

This patch fixes a self-deadlock in vzquota. quota_ugid_getstat() calls copy_to_user(), which can trigger a page fault and get stuck on qmblk->dq_sem. (#62179)
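One common way to avoid this kind of self-deadlock is to snapshot the data while holding the semaphore and to copy it to the user buffer only after the semaphore is released. The sketch below models that idea in userspace; whether the actual patch is structured this way is an assumption, and quota_model, model_copy_to_user(), getstat_old() and getstat_fixed() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NSTATS 4

struct quota_model {
    bool dq_sem_held;          /* stands in for qmblk->dq_sem */
    int  stats[MODEL_NSTATS];
};

/* Models copy_to_user(): the copy may fault, and the fault path may
 * need the semaphore. Returns -1 where the real code could deadlock. */
int model_copy_to_user(const struct quota_model *q, int *dst,
                       const int *src, size_t n)
{
    assert(n <= MODEL_NSTATS);
    if (q->dq_sem_held)
        return -1;
    memcpy(dst, src, n * sizeof(*src));
    return 0;
}

/* Old pattern: copy to the user buffer while holding the semaphore. */
int getstat_old(struct quota_model *q, int *user_buf)
{
    int ret;

    q->dq_sem_held = true;
    ret = model_copy_to_user(q, user_buf, q->stats, MODEL_NSTATS);
    q->dq_sem_held = false;
    return ret;
}

/* Assumed fix: snapshot under the semaphore, copy after releasing it. */
int getstat_fixed(struct quota_model *q, int *user_buf)
{
    int snapshot[MODEL_NSTATS];

    q->dq_sem_held = true;
    memcpy(snapshot, q->stats, sizeof(snapshot));
    q->dq_sem_held = false;
    return model_copy_to_user(q, user_buf, snapshot, MODEL_NSTATS);
}
```

In this model getstat_old() always reports the deadlock-prone path, while getstat_fixed() fills the user buffer with the semaphore already dropped.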


Patch from Andrey Mirkin <>:
Veth device fix.

There was a bug in veth_stop(): unregister_netdev() must be performed in the right context. Plus cosmetic cleanups.


Patch from Pavel Emelianov <>:

Export sysctl_tcp_use_sg variable.

Without it the ipv6 module can't load.