== Changes ==
* UBC optimisations
* CPT fixes and updates from stable branch.
* Dynamic VCPU control
* VE mass stop speedup
* Loopback statistics
* Compilation fixes
* sysfs ptmx virtualization.
* Mainstream update up to 2.6.16.29.
* Code cleanups from sparse.
<includeonly>[[{{PAGENAME}}/changes#Patches|{{Long changelog message}}]]</includeonly><noinclude>
=== Patches ===
==== diff-cpt-annoying-printk ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] remove annoying printk
In 2.6.9 printk("=") in refrigerator() is commented out.
We should remove printk(">\n") in cpt. The code with comment
is not removed, but commented out to remember that we have to
return this, if the printk in refrigerator() is uncommented.
</div>
==== diff-cpt-asmlinkage ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] asmlinkage attribute was forgotten
This fixes CPT with CONFIG_REGPARAM compiled
</div>
==== diff-cpt-caps-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[CPT] capabilities check fixes
* namespace->sem is replaced with namespace_sem;
* task->used_math is replaced with tsk_used_math().
</div>
==== diff-cpt-checks-20060908 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch adds checking for unsupported CPT features.
</div>
==== diff-cpt-clone-zombie ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] restoring threads with tsk->fs==NULL
If an NPTL thread is ptraced, it does not die immediately
and we can arrive at the state:
<pre>
parent
|
main_thread -----> thread1 [ptraced]
in TASK_ZOMBIE in TASK_ZOMBIE
</pre>
To restore such a configuration we do kernel_thread(CLONE_SIGNAL)
in the context of main_thread. But if it has exited, it has tsk->fs == NULL
and the kernel oopses.
The suggested fix is very simple: we just attach a temporary fs_struct
from the init task of the VE. Also, we have to delay initialization of
tsk->group_exit, otherwise the kernel will not allow us to clone.
This fix is pragmatic.
A better fix would be to restructure restore so that zombification is delayed
until the last stage of restore, i.e. we could restore the whole tree
of alive processes with all the attributes of an alive task (fs, mm etc).
After that is complete, we could make one more pass and collect garbage,
killing zombie tasks and clearing fs, mm etc. It would be cleaner
and safer, but requires too many changes.
Bug #65219.
</div>
==== diff-cpt-core-misc-2 ====
<div class="change">
Removed hunks after optimisation patches
</div>
==== diff-cpt-func-declaration-fix-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[CPT] Don't leave function argument list empty - use 'void'
</div>
==== diff-cpt-ifindex-renumber-2.6.16-20060908 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:
This patch adds renumbering of netdev->ifindex values during restore.
We can do this because the network is suspended. All manipulations are
protected with rtnl_lock().
</div>
==== diff-cpt-iptables-path ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch fixes iptables save/restore on SUSE.
Bug #62837.
</div>
==== diff-cpt-kernel-dpath-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Export and unstatic __d_path() call for CPT capability checks.
</div>
==== diff-cpt-mm-eagain-20060817 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
In tests we can see message: "mm_struct is referenced outside"
After that message checkpoint fails.
It seems that this situation is legal, so checkpoint could be restarted.
So we return -EAGAIN to be able to restart checkpoint.
</div>
==== diff-cpt-net-ifindex-cleanup-20060908 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch removes renumbering of ifindexes of venet and loopback
devices on restore.
</div>
==== diff-cpt-net-lock-20060727 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
The network device list was not protected while checkpointing.
This patch adds necessary protection.
</div>
==== diff-cpt-ptraceme-20060908 ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] SMP race in detecting state of ptraced processes
When suspending a VE, we test the state of processes while they are
still running. This is not a bug: we have to check for invalid state
before checkpointing; the real state is saved after the processes are
scheduled out.
The impact is that we can see a process in a bad state, e.g. stopped
without any reason. This is also not a bug, but it results in random
checkpointing failures. The only way to fix this is to order updates
of state variables. The order is correct almost everywhere.
</div>
==== diff-cpt-restore-mnt-flags-20060831 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:
Mount point's mnt_flags (noexec,nosuid,nodev) were omitted and
not restored correctly.
This patch should be applied together with the patch for bind mounts;
otherwise we should do the following:
<ol>
<li>Remove the check for bind mounts in the do_remount() function
<li>Change the procedure for restoring bind mounts as follows:
<pre>
do_mount(bind);
do_remount(mnt_flags).
</pre>
</ol>
</div>
==== diff-cpt-rst-dir ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] do not keep open cwd while restore
From the viewpoint of CPT, cwd/root are very similar to an open
file: each is just a dentry/mnt pair. Normally, when opening a file
we store it and its inode in a special object cache to resolve opening
of the same inode when some of its aliases (dentries) are deleted.
But this is useless for directories, which can never be hardlinked,
and it consumes numfile UBC, so that restore can fail easily.
So, do not store the cwd/root file unless it is deleted. This does not
solve the problem of a restoring VE hitting numfile, but relieves it a lot.
Now we can temporarily increase the numfile limit by 2 during cpt/rst and
everything should be OK.
</div>
==== diff-cpt-rst-sigdfl-20060830 ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:
[CPT] save/restore even SIG_DFL handlers
Linux has a funny feature: when SA_ONESHOT signal resets
handler, flags are not set to default. And LTP tests verify
this pathology.
</div>
==== diff-cpt-suid-dumpable ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] restore mm->dumpable correctly
mm->dumpable is not boolean in >=2.6.9, but tri-state.
Just save and restore raw value.
</div>
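The point about saving the raw value can be illustrated with a minimal sketch. The function names are hypothetical and this is not the CPT code; it only shows that collapsing a tri-state field to a boolean loses the third value, assuming the three values are 0, 1 and 2.

<source lang="c">
/* Hypothetical helper: save the field as a boolean.
 * Any non-zero value, including 2, is stored as 1. */
static int save_dumpable_bool(int dumpable)
{
	return dumpable ? 1 : 0;
}

/* Hypothetical helper: save the raw value so that all three
 * states survive a checkpoint/restore round trip. */
static int save_dumpable_raw(int dumpable)
{
	return dumpable;
}
</source>

With the boolean variant, a value of 2 comes back as 1 after restore; the raw variant returns it unchanged.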
==== diff-cpt-suspend-cleanup ====
<div class="change">
Patch from Kirill Korotaev <dev@openvz.org>:<br/>
Fix of compilation of diff-cpt-suspend-cleanup.
</div>
==== diff-cpt-susp-printk-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[CPT] Remove printk("|\n") from refrigerator.
(#55914)
</div>
==== diff-cpt-tcp-bind-bug-20060831 ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] tcp sockets were bind()ed incorrectly during restore
This case was totally missed. Fortunately, this happens rarely.
If checkpoint happens after some listening socket was closed,
but it left behind some children (including timewait buckets),
restore fails to bind them, unless the service used SO_REUSEADDR.
Stress checkpointing of LTP tests did not catch this earlier
only because... I repaired the tests not to fail upon exhaustion
of port space some time ago. Before that they failed with obvious
and harmless diagnosis long before the first binding conflict happened.
</div>
==== diff-cpt-ve-features-20060815 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:
The feature set was not saved in CPT, so VEs based on a SUSE template could
fail after restore (VE_FEATURE_SYSFS was lost). Save the feature set in a place
that was not used before (the cpt_os_version and cpt_os_features fields in the
image header).
</div>
==== diff-cpt-veth-support-20060616 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
[CPT] This patch adds veth support in CPT
</div>
==== diff-cpt-wait-task-cleanup-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[CPT] Remove ifdefs around wait_task_inactive()
</div>
==== diff-cpt-x86_64-debuginfo ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] fix compilation with CONFIG_DEBUG_INFO
Just #undef it.
</div>
==== diff-cpt-x8664-setpriority ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] process priority was restored incorrectly on x86_64
Ugly type casting bug. u32 was implicitly cast to long
and on 64bit archs negative nice values were rejected as
huge positive ones.
</div>
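The class of bug described above can be reproduced in a few lines of userspace C. This is a sketch under stated assumptions, not the CPT code: the function names are hypothetical, int64_t stands in for a 64-bit long, and the range [-20, 19] is the usual nice range. The actual fix applied by the patch is not shown in the description.

<source lang="c">
#include <stdint.h>

/* Range check on a 64-bit value, modelling a check done on a long. */
static int nice_in_range(int64_t nice)
{
	return nice >= -20 && nice <= 19;
}

/* Buggy pattern: the saved u32 is zero-extended, so a stored -5
 * becomes 4294967291 and fails the range check. */
static int check_saved_nice_buggy(uint32_t saved)
{
	int64_t nice = saved;

	return nice_in_range(nice);
}

/* One possible correction: reinterpret the 32 bits as signed
 * before widening, so the sign is preserved. */
static int check_saved_nice_fixed(uint32_t saved)
{
	int64_t nice = (int32_t)saved;

	return nice_in_range(nice);
}
</source>

On a 32-bit long the widening step does not exist, which is consistent with the bug only showing up on 64-bit architectures.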
==== diff-dbg-show-top-slabs-20060727 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Show info about the largest kmem caches in OOM killer and SysRq-M handler.
</div>
==== diff-dbg-show-top-slabs-fix-20060727 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Show top slabs functionality comp fixes:
* lock w/o irqsave and flags;
* correct loop counter;
* names: objsize -> buffer_size.
</div>
==== diff-fairsched-cpuinfo-20060710 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch virtualizes /proc/cpuinfo.
Added sysctl to scale or not cpu frequency inside VE.
</div>
==== diff-fairsched-cpuinfo-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Add prototype for ve_scale_khz() in vsched.h (comp)
</div>
==== diff-fairsched-dynvcpus-20060623 ====
<div class="change">
Patch from Kirill Korotaev <dev@openvz.org>:<br/>
This patch adds new fairsched syscalls which allow changing the number of
VCPUs inside a VE dynamically, on the fly.
TODO:
* per FS-node task list
* do_fairsched_vcpus: adjust rate
* __migrate_task doesn't return any error code and can fail
* empty flag in vcpu_del / synchronize optimization
* finish diff-cpuinfo
* /proc file with vcpus field?
</div>
==== diff-fairsched-iowait-20060525 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
This patch fixes iowait_time statistics for both VE0 and VEs.
* removes redundant nr_iowait field in VE_CPU_STATS (bug noticed by Matt Loschert)
* after schedule, a task may be activated on another processor.
Port on 2.6.16 by Xemul.
</div>
==== diff-fairsched-iowait-fix-20060809 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Compilation fix for nr_iowait_ve() modifications.
</div>
==== diff-fairsched-ppcfix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] fixes the mistype and the formatting in powerpc's show_regs().
</div>
==== diff-fairsched-ppc-syscalls-20060830 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] adds fairsched syscalls for powerpc
</div>
==== diff-fairsched-sparse-fixes-20060915 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Cleanups in fairsched code found by sparse
* rq->push_vcpu = NULL;
* __user attribute in sysctl handler argument.
</div>
==== diff-merge-2.6.16.29-20060916 ====
<div class="change">
Patch from OpenVZ team <devel@openvz.org>:<br/>
Merged 2.6.16.29 from /linux/kernel/git/stable/linux-2.6.16.y
</div>
==== diff-ms-bind-mount-flags-20060816 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:
This patch adds support of 3 mount flags to bind mount
Now we can do bind mounts with noexec, nosuid and nodev options
w/o need to do remount.
</div>
==== diff-ms-nf-compat-non-x86-20060831 ====
<div class="change">
Patch from Patrick McHardy <kaber@trash.net>:<br/>
[NETFILTER] x_tables: fix compat related crash on non-x86
When iptables userspace adds an ipt_standard_target, it calculates the size
of the entire entry as:
<source lang="c">
sizeof(struct ipt_entry) + XT_ALIGN(sizeof(struct ipt_standard_target))
</source>
ipt_standard_target looks like this:
<source lang="c">
struct xt_standard_target
{
	struct xt_entry_target target;
	int verdict;
};
</source>
xt_entry_target contains a pointer, so when compiled for 64 bit the
structure gets an extra 4 byte of padding at the end. On 32 bit
architectures where iptables aligns to 8 byte it will also have 4
byte padding at the end because it is only 36 bytes large.
The compat_ipt_standard_fn in the kernel adjusts the offsets by
<source lang="c">
sizeof(struct ipt_standard_target) -
sizeof(struct compat_ipt_standard_target),
</source>
which will always result in 4, even if the structure from userspace
was already padded to a multiple of 8. On x86 this works out by
accident because userspace only aligns to 4, on all other
architectures this is broken and causes incorrect adjustments to
the size and following offsets.
Thanks to Linus for lots of debugging help and testing.
Signed-off-by: Patrick McHardy <kaber@trash.net><br/>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
</div>
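The alignment arithmetic in the patch description above can be checked with a small sketch. align_up is a hypothetical stand-in for the XT_ALIGN macro, and the 36-byte size is the figure quoted in the description for 32-bit architectures; nothing here is taken from the kernel sources.

<source lang="c">
#include <stddef.h>

/* Hypothetical stand-in for XT_ALIGN: round size up to a multiple
 * of align, where align must be a power of two. */
static size_t align_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Padding that userspace has already appended to a structure of
 * the given size when it aligns to the given boundary. */
static size_t userspace_padding(size_t size, size_t align)
{
	return align_up(size, align) - size;
}
</source>

For the 36-byte figure, aligning to 4 adds no padding and aligning to 8 adds 4 bytes, which is the difference between the x86 case and the other architectures mentioned in the description.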
==== diff-powerpc-tif-freeze-20060905 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] adds needed TIF_FREEZE define to powerpc
</div>
==== diff-softirqd-sparse-20041008 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Added __user attribute to sysctl handler's args in softirqd disabling code
Found by sparse.
</div>
==== diff-ubc-dcachecom-20060417 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
This patch fixes currently incorrect comments about locking in dcache.
</div>
==== diff-ubc-dcacheopt-20060906 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Dcachesize accounting optimization.
The accounting becomes conditional, and dentries start to be accounted only
when a given fraction of normal zone is consumed by dcache. On switching
accounting on and off, all dentries are walked in stop_machine and charged
to ub0 or uncharged.
Port for 2.6.16 by Pavel Emelianov <xemul@openvz.org>
</div>
==== diff-ubc-fileopt-20060504 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Main part of file accounting optimization.
* files are charged by quants;
* pre-charged but not used amount is kept in task_beancounter.
</div>
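Charging by quants with a per-task pre-charged reserve can be sketched as follows. This is a single-threaded illustration, not the UBC implementation: the structure names, the QUANT value and the fallback to an exact charge near the limit are all assumptions made for the example.

<source lang="c">
/* Hypothetical shared counter with a limit. */
struct demo_beancounter {
	unsigned long held;
	unsigned long limit;
};

/* Hypothetical per-task reserve of already charged units. */
struct demo_task_bc {
	unsigned long precharged;
};

#define QUANT 8UL	/* arbitrary quant size for the example */

/* Charge n units. The fast path consumes the per-task reserve and
 * never touches the shared counter; the slow path charges the
 * shortfall rounded up to whole quants and keeps the surplus.
 * Returns 0 on success, -1 if the limit would be exceeded. */
static int demo_charge(struct demo_beancounter *bc,
		       struct demo_task_bc *tbc, unsigned long n)
{
	unsigned long need, take;

	if (tbc->precharged >= n) {
		tbc->precharged -= n;
		return 0;
	}

	need = n - tbc->precharged;
	take = (need + QUANT - 1) / QUANT * QUANT;
	if (bc->held + take > bc->limit) {
		/* Not enough room for whole quants: try the exact amount. */
		take = need;
		if (bc->held + take > bc->limit)
			return -1;
	}

	bc->held += take;
	tbc->precharged = tbc->precharged + take - n;
	return 0;
}
</source>

After the first slow-path charge, the next few charges are served from the reserve without updating the shared counter, which is where the saving comes from.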
==== diff-ubc-fileopt-2-20060504 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Additional optimizations of file and kmemsize accounting, fixes.
* files are now charged to kmemsize explicitly, not through SLAB_UBC;
* certain amount of numfile and their kmemsize is precharged at fork;
* poll tables of small size are not charged at all;
* get_beancounter_batch and put_beancounter_batch are introduced to adjust refcounts at precharge/uncharge time, in batches, instead of at each allocation/deallocation.
</div>
==== diff-ubc-fileopt-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Take kmem memory usage for file_cachep directly from cachep.
</div>
==== diff-ubc-filkmemopt-20060907 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:
More file/kmemsize accounting fixes related to charges/uncharges
to wrong beancounters, as seen when testing optimisation.
</div>
==== diff-ubc-gfp-type-20060915 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[UBC] Use gfp_t type where appropriate in ub_mem.c
Found by sparse.
</div>
==== diff-ubc-kmemopt-20060504 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Start of kmemsize accounting optimization.
* kmemsize is accounted by quants;
* pre-charged amounts are kept in task_beancounter for faster and lockless charge/uncharge operations.
</div>
==== diff-ubc-kmemopt-2-20060907 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
File and kmemsize accounting optimization fixes and improvements.
* missing uncharge added;
* a lot of likely/unlikely added;
* files are really charged into kmemsize;
* the problem of atomicity of per-task field is resolved by shifting irq_disable/enable around kmemsize charge calls.
</div>
==== diff-ubc-kmemopt-caches-20060504 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Another small but important optimization of kmemsize charges.
The maintenance of SLAB_UBC infrastructure is costly, so
kmalloc caches were duplicated, one for !SLAB_UBC allocations
and one for SLAB_UBC ones. Deallocations in the former avoid
the extra work of checking whether the object was charged.
</div>
==== diff-ubc-kmemopt-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Typo in mm/slab.c after kmem optimisation patch port.
</div>
==== diff-ubc-kmem-ppcfix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] fixes the following compilation issue on ppc platform
<pre>
In file included from include/asm/tlb.h:20,
from arch/powerpc/platforms/pseries/lpar.c:37:
include/asm/pgalloc.h:97: error: conflicting types for '__pte_alloc'
include/linux/mm.h:819: error: previous declaration of '__pte_alloc' was
here
make[2]: *** [arch/powerpc/platforms/pseries/lpar.o] Error 1
</pre>
</div>
==== diff-ubc-net-locking-20060727 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
UBC socket buffers accounting locking fix.
All sock beancounters are stored in the list, starting at top beancounter,
and thus top's lock must be used to protect the list.
</div>
==== diff-ubc-net-sparse-cleanup-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[UBC] Cleanups in networking accounting
* remove unused gfp var from sock_alloc_send_skb2
* gfp_t type in ub_skb_alloc_bc()
</div>
==== diff-ubc-net-tcpsndbuf-charge-fix-20060721 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
Network buffers (un)charging logic is
# work with top beancounter
# update all the rest with (un)charge_beancounter_notop
In ub_sock_tcp_chargesend() it was broken (#65495)
</div>
==== diff-ubc-net-wait-mem-fix-20060823 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
Return the sk_stream_wait_memory() prototype to its original state to make
the InfiniBand driver (and any other caller) compile.
Places that use the new version call __sk_stream_wait_memory().
</div>
==== diff-ubc-notopinl-20060907 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:
Make (un)charge_xxx_notop functions inline
to avoid call and IRQ disabling for top beancounters.
Spotted in profiles by Den.
</div>
==== diff-ubc-nrfiles-memset-after-charge-fix-20060911 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Memset file to 0 before charging it to prevent f_ub erasing.
</div>
==== diff-ubc-nrfiles-opt-fix-20060911 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Fix of nrfiles accounting.
Since file_cachep is not SLAB_UBC after Andrey's optimisations
slab_ub(file) will BUG_ON inside slab_ub_ref.
Use file->f_ub instead.
</div>
==== diff-ubc-nrfiles-rcu-race-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[UBC] Fix UB_NUMFILE accounting optimisation leak
In 2.6.16 files are put via RCU, so ub_file_uncharge() is called
in IRQ context. Thus non-atomic decrement of file_precharged must
be done with IRQs disabled.
</div>
==== diff-ubc-ppc-syscalls-20060830 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] fix ubc syscalls declaration for powerpc
</div>
==== diff-ubc-putwarn-20060525 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
This patch prints more sensible warning on bad refcounter in
__put_beancounter.
</div>
==== diff-ubc-skbufopt-20060512 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Various changes in socket buffer accounting.
* likely/unlikely added;
* internal code organization improved;
* skb->sk never follows for netlink sockets (it's almost always wrong);
* ub_wcharged and optimizations should never be used for netlink sockets.
</div>
==== diff-ubc-skbufopt-2-20060907 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:
This patch removes skb accounting speed-up for UNIX sockets.
It doesn't work (kfree_skb is called in a different socket's context).
Along with this, charge severity fixed in tcp_chargepage (#63650)
</div>
==== diff-ubc-syscalls-ppcfix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] asm-powerpc/unistd.h mistype fix
</div>
==== diff-ubc-tcppage-20060525 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:
This patch fixes an apparent bug in accounting in ub_sock_tcp_chargepage.
Should help problems at DefenderHosting.
</div>
==== diff-ubc-tcprcvopt-20060502 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Optimization of tcprcvbuf accounting.
Keep pre-charged amount in per-socket forw_space.
</div>
==== diff-ubc-tcpsndopt-20060906 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Tcpsndbuf optimization.
* Keep more in per-socket poll_reserve, do not hurry to return to beancounter if limits are high enough;
* Certain unification and streamlining of charge/uncharge functions;
minor: severity renamed to ub_severity, to keep this name in proper
namespace.
</div>
==== diff-ubc-twcount-20060907 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org>:<br/>
Per-UB limitation to the number of TCP timewait buckets.
This is done to prevent them from consuming all of a VE's kernel memory.
Unfortunately, a virtualized sysctl can't help, as TW buckets live on after
the actual VE death, so a counter on the UB is used.
So, the number of TW buckets is limited by
* number of buckets allowed for a UB
* the fraction of kernel memory limit (in 1024th)
which one is reached first (#61789)
Ported on 2.6.16 by Xemul.
</div>
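The two-part limit described above can be sketched as a single check. This is an illustration only: the structure, its field names and the per-bucket size argument are hypothetical, and the way the real code measures timewait memory is not given in the description.

<source lang="c">
/* Hypothetical per-UB limits for timewait buckets. */
struct demo_tw_limits {
	unsigned long max_buckets;	/* bucket count allowed for the UB */
	unsigned long kmem_limit;	/* kernel memory limit, in bytes */
	unsigned long kmem_fraction;	/* share for TW buckets, in 1024ths */
};

/* Return 1 if one more timewait bucket may be created, 0 otherwise.
 * Whichever of the two limits is reached first wins. */
static int demo_tw_bucket_allowed(const struct demo_tw_limits *lim,
				  unsigned long nr_buckets,
				  unsigned long bucket_size)
{
	/* Divide first so that the multiplication cannot overflow. */
	unsigned long mem_cap = (lim->kmem_limit >> 10) * lim->kmem_fraction;

	if (nr_buckets >= lim->max_buckets)
		return 0;
	if ((nr_buckets + 1) * bucket_size > mem_cap)
		return 0;
	return 1;
}
</source>

With a generous bucket count and a small fraction the memory cap triggers first; with a small bucket count the count triggers first.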
==== diff-ubc-twcount-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Fix ub_timewait_check() to get kmem_cache objuse directly
and from correct slab.
</div>
==== diff-ubc-user-attribute-20060815 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[UBC] Added __user attribute to UBC syscalls arguments
Found by sparse.
</div>
==== diff-ubc-vmpages-ppcfix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] adding MAP_EXECPRIO define for powerpc
</div>
==== diff-ubc-writespace-20060616 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org>:<br/>
Deadlock on beancounter lock.
sk_stream_write_space() sends signal to a task, so it can take a beancounter
lock. ub_tcp_snd_wakeup()/ub_sock_snd_wakeup() was called with a lock held.
</div>
==== diff-ve-kstat-sparse-20060815 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Use gfp_t type in __alloc_collect_stats()
Found by sparse.
</div>
==== diff-ve-memleak-fib-hash-20060828 ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[PATCH] memory leakage in fib_hash
FIB hash tables and zone structs were never freed.
Each time, when VE is stopped, they leak.
vzctl chkpnt/restore tests bring down a system with 4G of RAM quite soon.
Of course, vzctl start/stop is not so fast to bring down a system with
decent amount of RAM, but hundreds of thousands of slab entries are still
well visible.
The patch solves leakage in size-128 and most of leakage in size-64.
We still leak two objects in size-64 and 6 entries in size-32.
</div>
==== diff-ve-multi-cleanup-20060824 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
Try to cleanup each VE in a separate thread.
This allows simultaneous stop of many VEs at once (#60673)
</div>
==== diff-venet-devprintk-20060719 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Better print message on promiscuous mode change by ve_printk (possible DoS?)
</div>
==== diff-ve-net-dev-sysctl-20060821 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
This patch allows VE owner to use net.ipv4.conf.<net_device>.xxx sysctls.
Bug #66842.
</div>
==== diff-ve-net-fib-leak-fix-20060830 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Fix memory leak in case of CONFIG_VE_NETDEV=n
Do not create fib rules if we're not going to use them.
</div>
==== diff-ve-net-fib-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
void argument in declarations of fib_rules_create()/destroy()
</div>
==== diff-ve-net-loop-stat-20060821 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
Virtualized loopback_stats
Bug #66571.
</div>
==== diff-ve-net-mtu-20060828 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
MTU manipulations on VE's devices
* removed mtu restore logic for moved devices
* added possibility to set mtu > 1500 for veth devices (#66836)
</div>
==== diff-ve-net-rtcache-20060719 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:
Fix of broken virtualization of /proc/net/rt_cache.
Bug #65528.
</div>
==== diff-ve-net-rtflush-20060719 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Fix for accidentally broken /proc/sys/net/ipv4/route/flush.
Fixes permissions as well.
</div>
==== diff-ve-net-veth-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[VETH] Add __user attribute to the 2nd copy_from_user()'s argument
Found by sparse.
</div>
==== diff-ve-nf-allocsize-20060727 ====
<div class="change">
Patch from Vasily Tarasov <vtaras@openvz.org>:
Since size can change in ipt_flush_table()
xt_free_table_info() will fail to free memory then.
{{Bug|191}}.<br/>
Bug #65721.
Port on 2.6.16 by Xemul.
</div>
==== diff-ve-nf-compat-ppc64-20060831 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
[PPC] enabled usage of ip_tables compat layer on ppc64
</div>
==== diff-ve-portrange-20060907 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org>:
This patch virtualizes ip_local_port_range sysctl to allow specification
of different port range for auto-binding inside VE.
</div>
==== diff-ve-sysfs-ptmxadd-20060907 ====
<div class="change">
Patch from Vasily Tarasov <vtaras@openvz.org>:<br/>
Add /sys/class/tty/ptmx device
It's necessary, 'cause otherwise udev doesn't create /dev/ptmx
{{Bug|243}}.
Ported patch from Umka by Vasily.
</div>
==== diff-ve-sysfs-ptmx-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Add prototypes for init/fini_ve_tty_class() calls.
</div>
==== diff-ve-vecalls-sparse-20060915 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Cleanups in vecalls.c and vzcalluser.h
* C99 syntax in structures init;
* __user attribute where appropriate;
* pass NULL as pointer arg, not 0.
Also define an empty __user macro for userspace in vzcalluser.h
Found by sparse.
</div>
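The kind of cleanup listed above can be shown with a short sketch. The structure and names are hypothetical and not taken from vecalls.c; the example only demonstrates C99 designated initializers and passing NULL rather than 0 for a pointer, which is the style sparse expects.

<source lang="c">
#include <stddef.h>

/* Hypothetical operations table. */
struct demo_ops {
	const char *name;
	int (*handler)(void *arg);
	void *data;
};

/* Returns 1 when called without an argument. */
static int demo_handler(void *arg)
{
	return arg == NULL;
}

/* C99 designated initializers (".field = value") instead of the old
 * GNU "field: value" form, and NULL instead of 0 for the pointer. */
static struct demo_ops demo = {
	.name    = "demo",
	.handler = demo_handler,
	.data    = NULL,
};
</source>

Calling demo.handler(demo.data) returns 1 here, since the data pointer was initialised to NULL.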
==== diff-ve-venet-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
Small venet cleanups
* C99 syntax in structures declarations
* __user attribute
Found by sparse.
</div>
==== diff-ve-veowner-sparse-20060915 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
Cleanups in veowner.c
* use C99 syntax in struct fields initialization;
* use NULL instead of 0 for pointer arg.
Found by sparse.
</div>
==== diff-ve-vpsdumpable-20060710 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org>:
This patch fixes a small potential information leak,
i.e. here we should protect against a core dump of a VE0 process not inside a VE,
but inside a VE filesystem.
So, let's prevent core dumps of such processes altogether.
</div>
==== diff-ve-vzwdog-fix-20060908 ====
<div class="change">
Patch from Vasily Averin <vvs@openvz.org>:<br/>
The /proc/interrupts file should be closed if kernel_thread() fails
Bug #68096.
</div>
==== diff-ve-vzwdog-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
vfs_read() wants the 2nd argument to have __user attribute
Found by sparse.
</div>
==== diff-vzdq-fmt-quota-20060608 ====
<div class="change">
Patch from Vasily Tarasov <vtaras@openvz.org>:<br/>
[VZDQ] Oops because vzquota format operations are not implemented.
If the usual quota is launched, it uses the usual vfs_quota_on, which uses format
operations == NULL, and this causes an oops.
{{Bug|184}}.
</div>
==== diff-vzdq-nougid-compile-20060917 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[VZDQ] Compilation fix for CONFIG_VZ_QUOTA_UGID=n case
* ifdefs in a couple of places;
* moved some code out of compiled-out file;
* 'ifdef' instead of 'if defined'.
{{Bug|222}}.
</div>
==== diff-vzdq-quotaoff-EIO-20060705 ====
<div class="change">
Patch from Vasily Tarasov <vtaras@openvz.org>:<br/>
Turns off quota in spite of errors while syncing inodes.
Bug #65186.
</div>
==== diff-vzdq-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[VZDQ] Cleanups in vzquota code
* C99 syntax in structures initialization
* __user attribute where appropriate
Found by sparse.
</div>
==== diff-vzwdog-flat-mem-map-fix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
Fixes vzwdog compilation in case CONFIG_FLAT_NODE_MEM_MAP is not set.
</div>
==== diff-vzwdog-irq-20060807 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Restore showing IRQ information in vzwdog.
</div>
</noinclude>
* UBC optimisations
* CPT fixes and updates from stable branch.
* Dynamic VCPU control
* VE mass stop speedup
* Loopback statistics
* Compilations fixes
* sysfs ptmx virtualization.
* Mainstream update up to 2.6.16.29.
* Code cleanups from sparse.
<includeonly>[[{{PAGENAME}}/changes#Patches|{{Long changelog message}}]]</includeonly><noinclude>
=== Patches ===
==== diff-cpt-annoying-printk ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] remove annoying printk
In 2.6.9 printk("=") in refrigerator() is commented out.
We should remove printk(">\n") in cpt. The code with comment
is not removed, but commented out to remember that we have to
return this, if the printk in refrigerator() is uncommented.
</div>
==== diff-cpt-asmlinkage ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] asmlinkage attribute was forgotten
This fixes CPT with CONFIG_REGPARAM compiled
</div>
==== diff-cpt-caps-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[CPT] capabilities check fixes
* namespace->sem is replaced with namespace_sem;
* task->used_math is replaced with tsk_used_math().
</div>
==== diff-cpt-checks-20060908 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch adds checking for unsupported CPT features.
</div>
==== diff-cpt-clone-zombie ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] restoring threads with tsk->fs==NULL
If a nptl thread is ptraced, it does not die immediately
and we can arrive to the state:
<pre>
parent
|
main_thread -----> thread1 [ptraced]
in TASK_ZOMBIE in TASK_ZOMBIE
</pre>
To restore such a configuration we call kernel_thread(CLONE_SIGNAL)
in the context of main_thread. But if main_thread has exited, it has tsk->fs == NULL
and the kernel oopses.
The suggested fix is very simple: we just attach a temporary fs_struct
from the init task of the VE. Also, we have to delay initialization of
tsk->group_exit, otherwise the kernel will not allow us to clone.
This fix is pragmatic.
A better fix would be to restructure restore so that zombification is delayed
until the last stage of restore. I.e. we could restore the whole tree
of alive processes with all the attributes of an alive task (fs, mm etc).
After that is complete, we could make one more pass and collect garbage,
killing zombie tasks and clearing fs, mm etc. It would be cleaner
and safer, but requires too many changes.
Bug #65219.
</div>
==== diff-cpt-core-misc-2 ====
<div class="change">
Removed hunks after optimisation patches
</div>
==== diff-cpt-func-declaration-fix-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[CPT] Don't leave function argument list empty - use 'void'
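As background: in C before C23, an empty parameter list in a declaration leaves the arguments unspecified, so the compiler cannot reject a call that passes arguments by mistake. A minimal sketch of the preferred form (the function name is hypothetical and not taken from the patch):

<source lang="c">
/* Hypothetical helper: "(void)" states that the function takes no
 * arguments, so a call such as cpt_example_count(1) fails to compile. */
int cpt_example_count(void)
{
    return 3;
}
</source>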
</div>
==== diff-cpt-ifindex-renumber-2.6.16-20060908 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch adds renumbering of netdev->ifindex values during restore.
We can do this because the network is suspended. All manipulations are
protected with rtnl_lock().
</div>
==== diff-cpt-iptables-path ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch fixes iptables save/restore on SUSE.
Bug #62837.
</div>
==== diff-cpt-kernel-dpath-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Export __d_path() and make it non-static for CPT capability checks.
</div>
==== diff-cpt-mm-eagain-20060817 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
In tests we can see the message "mm_struct is referenced outside",
after which the checkpoint fails.
This situation appears to be legal, so the checkpoint can simply be restarted.
So we return -EAGAIN to be able to restart checkpoint.
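A caller could handle this roughly as follows. This is only a sketch: try_checkpoint() is a hypothetical stand-in for one checkpoint attempt, and the retry limit is an assumption, not something the patch defines.

<source lang="c">
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for one checkpoint attempt: it fails with
 * -EAGAIN while *busy_rounds is positive, then succeeds. */
int try_checkpoint(int *busy_rounds)
{
    if (*busy_rounds > 0) {
        (*busy_rounds)--;
        return -EAGAIN;
    }
    return 0;
}

/* Retry only on -EAGAIN, at most max_attempts times; any other
 * result (success or a different error) is returned immediately. */
int checkpoint_with_retry(int *busy_rounds, int max_attempts)
{
    int err = -EAGAIN;

    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts && err == -EAGAIN; i++)
        err = try_checkpoint(busy_rounds);
    return err;
}
</source>

Bounding the number of attempts keeps a persistently referenced mm_struct from turning the retry into an endless loop.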
</div>
==== diff-cpt-net-ifindex-cleanup-20060908 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch removes renumbering of ifindexes of venet and loopback
devices on restore.
</div>
==== diff-cpt-net-lock-20060727 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
The network device list was not protected while checkpointing.
This patch adds necessary protection.
</div>
==== diff-cpt-ptraceme-20060908 ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] SMP race in detecting state of ptraced processes
When suspending a VE, we test the state of processes while they are
still running. It is not a bug: we have to check for invalid states
before checkpointing; the real state is saved after the processes are scheduled
out.
The impact is that we can see a process in a bad state, e.g. stopped
without any reason. That is also not a bug, but it results in random
checkpointing failures. The only way to fix this is to order the updates
of the state variables. The order is correct almost everywhere.
</div>
==== diff-cpt-restore-mnt-flags-20060831 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
Mount point's mnt_flags (noexec, nosuid, nodev) were omitted and
not restored correctly.
This patch should be applied together with the patch for bind mounts;
otherwise we would have to do the following:
<ol>
<li>Remove the check for bind mounts in the do_remount() function
<li>Change the procedure for restoring bind mounts as follows:
<pre>
do_mount(bind);
do_remount(mnt_flags).
</pre>
</ol>
</div>
==== diff-cpt-rst-dir ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] do not keep open cwd while restore
From the viewpoint of CPT, cwd/root are very similar to an open
file: each is just a dentry/mnt pair. Normally, when opening a file
we store it and its inode in a special object cache, so that opening
the same inode can be resolved when some of its aliases (dentries) are deleted.
But this is useless for directories, which can never be hardlinked,
and it consumes numfile UBC, so restore can easily fail.
So, do not store the cwd/root file unless it is deleted. This does not
solve the problem of a restored VE hitting numfile, but relieves it a lot.
Now we can temporarily increase the numfile limit by 2 during cpt/rst and
everything should be OK.
</div>
==== diff-cpt-rst-sigdfl-20060830 ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] save/restore even SIG_DFL handlers
Linux has a funny feature: when an SA_ONESHOT signal resets the
handler, the flags are not reset to default. And LTP tests verify
this pathology.
</div>
==== diff-cpt-suid-dumpable ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] restore mm->dumpable correctly
mm->dumpable is not boolean in >=2.6.9, but tri-state.
Just save and restore raw value.
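The difference can be illustrated in plain C. The sketch assumes the three states are encoded as 0, 1 and 2 (the text above only says "tri-state"), and both function names are hypothetical:

<source lang="c">
#include <assert.h>

/* Saving the field as a boolean collapses every non-zero state to 1. */
int save_as_bool(int dumpable)
{
    return dumpable != 0;
}

/* Saving the raw value keeps all three states distinct. */
int save_raw(int dumpable)
{
    assert(dumpable >= 0 && dumpable <= 2);
    return dumpable;
}
</source>

With the boolean variant a value of 2 comes back as 1 after restore, which is the kind of loss the patch avoids.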
</div>
==== diff-cpt-suspend-cleanup ====
<div class="change">
Patch from Kirill Korotaev <dev@openvz.org>:<br/>
Fix of compilation of diff-cpt-suspend-cleanup.
</div>
==== diff-cpt-susp-printk-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[CPT] Remove printk("|\n") from refrigerator.
(#55914)
</div>
==== diff-cpt-tcp-bind-bug-20060831 ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] tcp sockets were bind()ed incorrectly during restore
This case was totally missed. Fortunately, this happens rarely.
If checkpoint happens after some listening socket was closed,
but it left behind some children (including timewait buckets),
restore fails to bind them, unless the service used SO_REUSEADDR.
Stress checkpointing of LTP tests did not catch this earlier
only because... I repaired the tests not to fail upon exhaustion
of port space some time ago. Before that they failed with obvious
and harmless diagnosis long before the first binding conflict happened.
</div>
==== diff-cpt-ve-features-20060815 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
The feature set was not saved in CPT, so VEs based on the SUSE template could
fail after restore (VE_FEATURE_SYSFS was lost). Save the feature set in fields
that were not used before (the cpt_os_version and cpt_os_features fields in
the image header).
</div>
==== diff-cpt-veth-support-20060616 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
[CPT] This patch adds veth support in CPT
</div>
==== diff-cpt-wait-task-cleanup-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[CPT] Remove ifdefs around wait_task_inactive()
</div>
==== diff-cpt-x86_64-debuginfo ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] fix compilation with CONFIG_DEBUG_INFO
Just #undef it.
</div>
==== diff-cpt-x8664-setpriority ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[CPT] process priority was restored incorrectly on x86_64
Ugly type casting bug: a u32 was implicitly cast to long,
and on 64-bit archs negative nice values were rejected as
huge positive ones.
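The effect is easy to reproduce in userspace. The sketch below uses fixed-width types in place of the kernel's u32 and long, and the function names are hypothetical:

<source lang="c">
#include <stdint.h>

/* Buggy widening: the unsigned 32-bit value is zero-extended, so a
 * stored -5 turns into 4294967291. */
int64_t widen_unsigned(uint32_t raw)
{
    return (int64_t)raw;
}

/* Fixed widening: reinterpret the bits as a signed 32-bit value first,
 * so the sign is extended (two's complement conversion assumed, which
 * holds on mainstream compilers). */
int64_t widen_signed(uint32_t raw)
{
    return (int64_t)(int32_t)raw;
}
</source>

On a 32-bit architecture long is as wide as u32, so the same conversion happened to preserve the value and the bug stayed hidden.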
</div>
==== diff-dbg-show-top-slabs-20060727 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Show info about the largest kmem caches in OOM killer and SysRq-M handler.
</div>
==== diff-dbg-show-top-slabs-fix-20060727 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Compilation fixes for the show-top-slabs functionality:
* lock w/o irqsave and flags;
* correct loop counter;
* names: objsize -> buffer_size.
</div>
==== diff-fairsched-cpuinfo-20060710 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch virtualizes /proc/cpuinfo.
Added sysctl to scale or not cpu frequency inside VE.
</div>
==== diff-fairsched-cpuinfo-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Add prototype for ve_scale_khz() in vsched.h (comp)
</div>
==== diff-fairsched-dynvcpus-20060623 ====
<div class="change">
Patch from Kirill Korotaev <dev@openvz.org>:<br/>
This patch adds new fairsched syscalls which allow changing the number of
VCPUs inside a VE dynamically, on the fly.
TODO:
* per FS-node task list
* do_fairsched_vcpus: adjust rate
* __migrate_task doesn't return any error code and can fail
* empty flag in vcpu_del / synchronize optimization
* finish diff-cpuinfo
* /proc file with vcpus field?
</div>
==== diff-fairsched-iowait-20060525 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
This patch fixes iowait_time statistics for both VE0 and VEs.
* removes redundant nr_iowait field in VE_CPU_STATS (bug noticed by Matt Loschert)
* after schedule, a task may be activated on another processor.
Ported to 2.6.16 by Xemul.
</div>
==== diff-fairsched-iowait-fix-20060809 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Compilation fix for nr_iowait_ve() modifications.
</div>
==== diff-fairsched-ppcfix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] fixes a typo and the formatting in powerpc's show_regs().
</div>
==== diff-fairsched-ppc-syscalls-20060830 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] adds fairsched syscalls for powerpc
</div>
==== diff-fairsched-sparse-fixes-20060915 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Cleanups in fairsched code found by sparse
* rq->push_vcpu = NULL;
* __user attribute in sysctl handler argument.
</div>
==== diff-merge-2.6.16.29-20060916 ====
<div class="change">
Patch from OpenVZ team <devel@openvz.org>:<br/>
Merged 2.6.16.29 from /linux/kernel/git/stable/linux-2.6.16.y
</div>
==== diff-ms-bind-mount-flags-20060816 ====
<div class="change">
Patch from Andrey Mirkin <amirkin@openvz.org>:<br/>
This patch adds support for three mount flags to bind mounts.
Now we can do bind mounts with the noexec, nosuid and nodev options
without needing a remount.
</div>
==== diff-ms-nf-compat-non-x86-20060831 ====
<div class="change">
Patch from Patrick McHardy <kaber@trash.net>:<br/>
[NETFILTER] x_tables: fix compat related crash on non-x86
When iptables userspace adds an ipt_standard_target, it calculates the size
of the entire entry as:
<source lang="c">
sizeof(struct ipt_entry) + XT_ALIGN(sizeof(struct ipt_standard_target))
</source>
ipt_standard_target looks like this:
<source lang="c">
struct xt_standard_target
{
	struct xt_entry_target target;
	int verdict;
};
</source>
xt_entry_target contains a pointer, so when compiled for 64 bit the
structure gets an extra 4 byte of padding at the end. On 32 bit
architectures where iptables aligns to 8 byte it will also have 4
byte padding at the end because it is only 36 bytes large.
The compat_ipt_standard_fn in the kernel adjusts the offsets by
<source lang="c">
sizeof(struct ipt_standard_target) -
sizeof(struct compat_ipt_standard_target),
</source>
which will always result in 4, even if the structure from userspace
was already padded to a multiple of 8. On x86 this works out by
accident because userspace only aligns to 4, on all other
architectures this is broken and causes incorrect adjustments to
the size and following offsets.
Thanks to Linus for lots of debugging help and testing.
Signed-off-by: Patrick McHardy <kaber@trash.net><br/>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
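The padding arithmetic in the message can be checked with a small sketch. The helper names are hypothetical and stand in for the alignment done by XT_ALIGN; they are not the kernel's code.

<source lang="c">
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (a power of two). */
size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Padding that alignment adds after a structure of the given size. */
size_t tail_padding(size_t size, size_t align)
{
    return align_up(size, align) - size;
}
</source>

For the 36-byte structure mentioned above, 8-byte alignment gives 4 bytes of tail padding while 4-byte alignment gives none, which is why a fixed adjustment of 4 is only right when userspace aligns to 4.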
</div>
==== diff-powerpc-tif-freeze-20060905 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] adds needed TIF_FREEZE define to powerpc
</div>
==== diff-softirqd-sparse-20041008 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Added __user attribute to sysctl handler's args in softirqd disabling code
Found by sparse.
</div>
==== diff-ubc-dcachecom-20060417 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
This patch fixes currently incorrect comments about locking in dcache.
</div>
==== diff-ubc-dcacheopt-20060906 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Dcachesize accounting optimization.
The accounting becomes conditional, and dentries start to be accounted only
when a given fraction of normal zone is consumed by dcache. On switching
accounting on and off, all dentries are walked in stop_machine and charged
to ub0 or uncharged.
Port for 2.6.16 by Pavel Emelianov <xemul@openvz.org>
</div>
==== diff-ubc-fileopt-20060504 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Main part of file accounting optimization.
* files are charged by quants;
* pre-charged but not used amount is kept in task_beancounter.
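The idea of charging by quants can be sketched as follows. The structures are simplified stand-ins, the quantum size is an assumption, and the real code also handles uncharging and locking, which are omitted here.

<source lang="c">
#include <assert.h>

#define FILE_QUANT 8  /* assumed quantum size, not the patch's value */

struct beancounter { long held; long limit; };  /* shared counter */
struct task_bc {                                /* per-task state */
    struct beancounter *ub;
    long file_precharged;
};

/* Charge one file. Returns 0 on success, -1 if the limit is hit. */
int charge_file(struct task_bc *t)
{
    assert(t && t->ub);
    if (t->file_precharged == 0) {
        /* Slow path: take a whole quantum from the shared counter
         * (the place where real code would need locking). */
        if (t->ub->held + FILE_QUANT > t->ub->limit)
            return -1;
        t->ub->held += FILE_QUANT;
        t->file_precharged = FILE_QUANT;
    }
    /* Fast path: only task-local state is touched. */
    t->file_precharged--;
    return 0;
}
</source>

With a quantum of 8, only one charge in eight touches the shared counter. The price is that a beancounter can appear fuller than it really is, by up to one quantum per task.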
</div>
==== diff-ubc-fileopt-2-20060504 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Additional optimizations of file and kmemsize accounting, fixes.
* files are now charged to kmemsize explicitly, not through SLAB_UBC;
* certain amount of numfile and their kmemsize is precharged at fork;
* poll tables of small size are not charged at all;
* get_beancounter_batch and put_beancounter_batch are introduced to adjust refcounts at precharge/uncharge time, in batches, instead of at each allocation/deallocation.
</div>
==== diff-ubc-fileopt-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Take kmem memory usage for file_cachep directly from cachep.
</div>
==== diff-ubc-filkmemopt-20060907 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
More file/kmemsize accounting fixes related to charges/uncharges
to wrong beancounters, as seen when testing optimisation.
</div>
==== diff-ubc-gfp-type-20060915 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[UBC] Use gfp_t type where appropriate in ub_mem.c
Found by sparse.
</div>
==== diff-ubc-kmemopt-20060504 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Start of kmemsize accounting optimization.
* kmemsize is accounted by quants;
* pre-charged amounts are kept in task_beancounter for faster and lockless charge/uncharge operations.
</div>
==== diff-ubc-kmemopt-2-20060907 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
File and kmemsize accounting optimization fixes and improvements.
* missing uncharge added;
* a lot of likely/unlikely added;
* files are really charged into kmemsize;
* the problem of atomicity of per-task field is resolved by shifting irq_disable/enable around kmemsize charge calls.
</div>
==== diff-ubc-kmemopt-caches-20060504 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Another small but important optimization of kmemsize charges.
The maintenance of SLAB_UBC infrastructure is costly, so
kmalloc caches were duplicated, one for !SLAB_UBC allocations
and one for SLAB_UBC ones. Deallocations in the former avoid
the extra work of checking whether the object was charged.
</div>
==== diff-ubc-kmemopt-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Typo in mm/slab.c after kmem optimisation patch port.
</div>
==== diff-ubc-kmem-ppcfix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] fixes the following compilation issue on ppc platform
<pre>
In file included from include/asm/tlb.h:20,
from arch/powerpc/platforms/pseries/lpar.c:37:
include/asm/pgalloc.h:97: error: conflicting types for '__pte_alloc'
include/linux/mm.h:819: error: previous declaration of '__pte_alloc' was
here
make[2]: *** [arch/powerpc/platforms/pseries/lpar.o] Error 1
</pre>
</div>
==== diff-ubc-net-locking-20060727 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
UBC socket buffers accounting locking fix.
All sock beancounters are stored in the list, starting at top beancounter,
and thus top's lock must be used to protect the list.
</div>
==== diff-ubc-net-sparse-cleanup-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[UBC] Cleanups in networking accounting
* remove unused gfp var from sock_alloc_send_skb2
* gfp_t type in ub_skb_alloc_bc()
</div>
==== diff-ubc-net-tcpsndbuf-charge-fix-20060721 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Network buffer (un)charging logic is:
# work with the top beancounter
# update all the rest with (un)charge_beancounter_notop
In ub_sock_tcp_chargesend() it was broken (#65495).
</div>
==== diff-ubc-net-wait-mem-fix-20060823 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Return the sk_stream_wait_memory() prototype to its original state to make
the InfiniBand driver (and any other caller) compile.
Places that use the new version call __sk_stream_wait_memory().
</div>
==== diff-ubc-notopinl-20060907 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Make (un)charge_xxx_notop functions inline
to avoid call and IRQ disabling for top beancounters.
Spotted in profiles by Den.
</div>
==== diff-ubc-nrfiles-memset-after-charge-fix-20060911 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Memset file to 0 before charging it to prevent f_ub erasing.
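The ordering problem can be shown with a simplified stand-in for struct file. All names except f_ub are hypothetical, and charging is reduced to recording the owner.

<source lang="c">
#include <assert.h>
#include <string.h>

struct beancounter { int id; };
struct file_stub {              /* simplified stand-in for struct file */
    struct beancounter *f_ub;
    int f_flags;
};

/* Stand-in for charging: records the owning beancounter in the object. */
void charge_file_stub(struct file_stub *f, struct beancounter *ub)
{
    f->f_ub = ub;
}

/* Correct order: clear the object first, then charge it, so that
 * f_ub survives initialization. */
void init_file_stub(struct file_stub *f, struct beancounter *ub)
{
    assert(f && ub);
    memset(f, 0, sizeof(*f));
    charge_file_stub(f, ub);
}
</source>

Doing the memset after the charge would wipe f_ub, and a later uncharge would no longer know which beancounter to credit.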
</div>
==== diff-ubc-nrfiles-opt-fix-20060911 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Fix of nrfiles accounting.
Since file_cachep is not SLAB_UBC after Andrey's optimisations
slab_ub(file) will BUG_ON inside slab_ub_ref.
Use file->f_ub instead.
</div>
==== diff-ubc-nrfiles-rcu-race-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[UBC] Fix UB_NUMFILE accounting optimisation leak
In 2.6.16 files are put via RCU, so ub_file_uncharge() is called
in IRQ context. Thus non-atomic decrement of file_precharged must
be done with IRQs disabled.
</div>
==== diff-ubc-ppc-syscalls-20060830 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] fix ubc syscalls declaration for powerpc
</div>
==== diff-ubc-putwarn-20060525 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
This patch prints more sensible warning on bad refcounter in
__put_beancounter.
</div>
==== diff-ubc-skbufopt-20060512 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Various changes in socket buffer accounting.
* likely/unlikely added;
* internal code organization improved;
* skb->sk never follows for netlink sockets (it's almost always wrong);
* ub_wcharged and optimizations should never be used for netlink sockets.
</div>
==== diff-ubc-skbufopt-2-20060907 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
This patch removes the skb accounting speed-up for UNIX sockets.
It doesn't work (kfree_skb is called in a different socket's context).
Along with this, the charge severity is fixed in tcp_chargepage (#63650).
</div>
==== diff-ubc-syscalls-ppcfix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] asm-powerpc/unistd.h mistype fix
</div>
==== diff-ubc-tcppage-20060525 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
This patch fixes an apparent accounting bug in ub_sock_tcp_chargepage.
Should help with the problems at DefenderHosting.
</div>
==== diff-ubc-tcprcvopt-20060502 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Optimization of tcprcvbuf accounting.
Keep pre-charged amount in per-socket forw_space.
</div>
==== diff-ubc-tcpsndopt-20060906 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Tcpsndbuf optimization.
* Keep more in per-socket poll_reserve, do not hurry to return to beancounter if limits are high enough;
* Certain unification and streamlining of charge/uncharge functions;
* Minor: severity renamed to ub_severity, to keep this name in the proper namespace.
</div>
==== diff-ubc-twcount-20060907 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org>:<br/>
Per-UB limitation to the number of TCP timewait buckets.
This prevents them from consuming all of a VE's kernel memory.
Unfortunately, a virtualized sysctl can't help, as TW buckets outlive
the VE itself, so a counter on the UB is used.
So, the number of TW buckets is limited by
* the number of buckets allowed for a UB
* a fraction of the kernel memory limit (in 1024ths)
whichever is reached first (#61789).
Ported on 2.6.16 by Xemul.
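The double limit can be sketched as a single predicate. The function and parameter names are hypothetical, and the way memory use is measured is an assumption; only the "count or fraction in 1024ths, whichever comes first" rule is taken from the description above.

<source lang="c">
#include <assert.h>
#include <stdbool.h>

/* May one more timewait bucket be created? Both limits must hold.
 * max_kmem_fraction is expressed in 1024ths of kmem_limit. */
bool tw_bucket_allowed(unsigned long tw_count, unsigned long max_tw_count,
                       unsigned long tw_mem, unsigned long kmem_limit,
                       unsigned long max_kmem_fraction)
{
    assert(max_kmem_fraction <= 1024);
    if (tw_count >= max_tw_count)
        return false;
    /* Divide before multiplying to avoid overflow; this rounds the
     * threshold down slightly, which errs on the safe side. */
    return tw_mem < kmem_limit / 1024 * max_kmem_fraction;
}
</source>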
</div>
==== diff-ubc-twcount-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Fix ub_timewait_check() to get kmem_cache objuse directly
and from correct slab.
</div>
==== diff-ubc-user-attribute-20060815 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[UBC] Added __user attribute to UBC syscalls arguments
Found by sparse.
</div>
==== diff-ubc-vmpages-ppcfix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
[PPC] adding MAP_EXECPRIO define for powerpc
</div>
==== diff-ubc-writespace-20060616 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org>:<br/>
Deadlock on beancounter lock.
sk_stream_write_space() sends signal to a task, so it can take a beancounter
lock. ub_tcp_snd_wakeup()/ub_sock_snd_wakeup() was called with a lock held.
</div>
==== diff-ve-kstat-sparse-20060815 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Use gfp_t type in __alloc_collect_stats()
Found by sparse.
</div>
==== diff-ve-memleak-fib-hash-20060828 ====
<div class="change">
Patch from Alexey Kuznetsov <alexey@openvz.org>:<br/>
[PATCH] memory leakage in fib_hash
FIB hash tables and zone structs were never freed.
They leak each time a VE is stopped.
vzctl chkpnt/restore tests bring down a system with 4G of RAM quite soon.
Of course, vzctl start/stop is not fast enough to bring down a system with
a decent amount of RAM, but hundreds of thousands of slab entries are still
clearly visible.
The patch solves leakage in size-128 and most of leakage in size-64.
We still leak two objects in size-64 and 6 entries in size-32.
</div>
==== diff-ve-multi-cleanup-20060824 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Try to clean up each VE in a separate thread.
This allows many VEs to be stopped simultaneously (#60673).
</div>
==== diff-venet-devprintk-20060719 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
It is better to print the message on promiscuous mode change with ve_printk (possible DoS?)
</div>
==== diff-ve-net-dev-sysctl-20060821 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
This patch allows VE owner to use net.ipv4.conf.<net_device>.xxx sysctls.
Bug #66842.
</div>
==== diff-ve-net-fib-leak-fix-20060830 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Fix memory leak in case of CONFIG_VE_NETDEV=n
Do not create fib rules if we're not going to use them.
</div>
==== diff-ve-net-fib-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
void argument in declarations of fib_rules_create()/destroy()
</div>
==== diff-ve-net-loop-stat-20060821 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
Virtualized loopback_stats
Bug #66571.
</div>
==== diff-ve-net-mtu-20060828 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
MTU manipulations on VE's devices
* removed mtu restore logic for moved devices
* added the possibility to set mtu > 1500 for veth devices (#66836)
</div>
==== diff-ve-net-rtcache-20060719 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Fix of broken virtualization of /proc/net/rt_cache.
Bug #65528.
</div>
==== diff-ve-net-rtflush-20060719 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Fix for accidentally broken /proc/sys/net/ipv4/route/flush.
Fixes permissions as well.
</div>
==== diff-ve-net-veth-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[VETH] Add __user attribute to the 2nd copy_from_user()'s argument
Found by sparse.
</div>
==== diff-ve-nf-allocsize-20060727 ====
<div class="change">
Patch from Vasily Tarasov <vtaras@openvz.org>:<br/>
The size can change in ipt_flush_table(), in which case
xt_free_table_info() fails to free the memory.
{{Bug|191}}.<br/>
Bug #65721.
Port on 2.6.16 by Xemul.
</div>
==== diff-ve-nf-compat-ppc64-20060831 ====
<div class="change">
Patch from Dmitry Mishin <dim@openvz.org>:<br/>
[PPC] enabled usage of ip_tables compat layer on ppc64
</div>
==== diff-ve-portrange-20060907 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org>:<br/>
This patch virtualizes ip_local_port_range sysctl to allow specification
of different port range for auto-binding inside VE.
</div>
==== diff-ve-sysfs-ptmxadd-20060907 ====
<div class="change">
Patch from Vasily Tarasov <vtaras@openvz.org>:<br/>
Add /sys/class/tty/ptmx device
It's necessary, 'cause otherwise udev doesn't create /dev/ptmx
{{Bug|243}}.
Ported patch from Umka by Vasily.
</div>
==== diff-ve-sysfs-ptmx-fix-20060908 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Add prototypes for init/fini_ve_tty_class() calls.
</div>
==== diff-ve-vecalls-sparse-20060915 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Cleanups in vecalls.c and vzcalluser.h
* C99 syntax in structures init;
* __user attribute where appropriate;
* pass NULL as pointer arg, not 0.
Also define an empty __user macro for userspace in vzcalluser.h
Found by sparse.
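The three cleanups and the empty userspace macro can be illustrated together. This is a sketch only: the structure and function names are hypothetical and are not taken from vecalls.c or vzcalluser.h.

<source lang="c">
#include <stddef.h>

/* Outside the kernel there is no sparse address-space annotation,
 * so __user is defined as empty to let shared headers compile. */
#ifndef __user
#define __user
#endif

struct example_ops {                    /* hypothetical structure */
    const char *name;
    int (*handler)(void __user *arg);
};

int example_handler(void __user *arg)
{
    return arg == NULL ? 0 : 1;
}

/* C99 designated initializers name each field instead of relying on
 * declaration order. */
const struct example_ops example = {
    .name    = "example",
    .handler = example_handler,
};
</source>

Passing NULL rather than 0 for the pointer argument, as in example.handler(NULL), is what sparse expects and makes the intent clear to readers.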
</div>
==== diff-ve-venet-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Small venet cleanups
* C99 syntax in structures declarations
* __user attribute
Found by sparse.
</div>
==== diff-ve-veowner-sparse-20060915 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
Cleanups in veowner.c
* use C99 syntax in struct fields initialization;
* use NULL instead of 0 for pointer arg.
Found by sparse.
</div>
==== diff-ve-vpsdumpable-20060710 ====
<div class="change">
Patch from Denis Lunev <den@openvz.org>:<br/>
This patch fixes a small potential information leak:
here we should protect not against a core dump of a VE0 process inside a VE,
but against a core dump of a VE0 process inside a VE filesystem.
So, let's prevent core dumps of such processes altogether.
</div>
==== diff-ve-vzwdog-fix-20060908 ====
<div class="change">
Patch from Vasily Averin <vvs@openvz.org>:<br/>
The /proc/interrupts file should be closed if kernel_thread() fails.
Bug #68096.
</div>
==== diff-ve-vzwdog-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
vfs_read() wants the 2nd argument to have __user attribute
Found by sparse.
</div>
==== diff-vzdq-fmt-quota-20060608 ====
<div class="change">
Patch from Vasily Tarasov <vtaras@openvz.org>:<br/>
[VZDQ] Oops because vzquota format operations are not implemented.
If ordinary quota is started, it uses the ordinary vfs_quota_on(), which uses
the format operations (== NULL), and this causes an oops.
{{Bug|184}}.
</div>
==== diff-vzdq-nougid-compile-20060917 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[VZDQ] Compilation fix for CONFIG_VZ_QUOTA_UGID=n case
* ifdefs in a couple of places;
* moved some code out of compiled-out file;
* 'ifdef' instead of 'if defined'.
{{Bug|222}}.
</div>
==== diff-vzdq-quotaoff-EIO-20060705 ====
<div class="change">
Patch from Vasily Tarasov <vtaras@openvz.org>:<br/>
Turns off quota in spite of errors while syncing inodes.
Bug #65186.
</div>
==== diff-vzdq-sparse-20060916 ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:<br/>
[VZDQ] Cleanups in vzquota code
* C99 syntax in structures initialization
* __user attribute where appropriate
Found by sparse.
</div>
==== diff-vzwdog-flat-mem-map-fix-20060828 ====
<div class="change">
Patch from Kir Kolyshkin <kir@openvz.org>:<br/>
Fixes vzwdog compilation in case CONFIG_FLAT_NODE_MEM_MAP is not set.
</div>
==== diff-vzwdog-irq-20060807 ====
<div class="change">
Patch from Andrey Savochkin <saw@openvz.org>:<br/>
Restore showing IRQ information in vzwdog.
</div>
</noinclude>