== Changes ==
* CPT fixes
* compat iptables fixes
* dead state tasks leak fixed
* Fixes for broken teamspeak application
* ext3 fixes
* Microcode update fix
* UBC dcache fix
* OOM killer fix
* Fixes for compilation with gcc4
* Mainstream bridges security fix
* /proc/cpuinfo MHz output fix.
* Security fix for local port range.
* Old vzdq detached inode warning fix.
* Other fixes.
=== Configs ===
* +<code>CONFIG_EXT2_FS_XATTR=y</code>
* +<code>CONFIG_EXT2_FS_POSIX_ACL=y</code>
* +<code>CONFIG_EXT3_FS_POSIX_ACL=y</code>
* +<code>CONFIG_REISERFS_FS_XATTR=y</code>
* +<code>CONFIG_REISERFS_FS_POSIX_ACL=y</code>
* +<code>CONFIG_FS_POSIX_ACL=y</code>
* +<code>CONFIG_NFS_V3_ACL=y</code>
* +<code>CONFIG_NFSD_V2_ACL=y</code>
* +<code>CONFIG_NFSD_V3_ACL=y</code>
* +<code>CONFIG_KOBJECT_UEVENT=y</code>
=== Compatibility ===
* In-kernel sysfs/uevent layer is now updated to be compatible with FC5 and SLES10 userland.
<includeonly>[[{{PAGENAME}}/changes#Patches|{{Long changelog message}}]]</includeonly><noinclude>
=== Patches ===
==== diff-vzdq-dbg-detached-20061113 ====
<div class="change">
Patch from Alexey Dobriyan:
Debug patch for resolving OVZ bugs {{b|341}}, {{b|116}}, {{b|177}}.
* Extend ->origin into array for previous origin tracking.
* Print last two origins in debugging message.
* Also print i_mode, i_op, i_fop, ... of the offending inode.
</div>
==== diff-ms-sched-migrate-ints ====
<div class="change">
Patch from Kirill:
move_task_off_dead_cpu() requires interrupts to be disabled,
while migrate_dead() calls it with enabled interrupts.
Added appropriate comments to functions and added
BUG_ON(!irqs_disabled()) into double_rq_lock() and
double_lock_balance() which are the real source of such bugs.
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
</div>
==== diff-ms-oomhang-20061113 ====
<div class="change">
Patch from Denis:<br/>
This patch fixes the following problem:
if a process is being killed inside __alloc_pages by OOM, it can't
exit until it frees some space, which is currently impossible. The patch
allows such a process to dig into reserves.
Bugs #71604, #71179.
</div>
==== diff-ubc-oomkill-b-20061113 ====
<div class="change">
Patch from Alexey:
gcc complains about scanned being used uninitialized and it's right.
</div>
==== diff-ubc-vmrss-fix-20060921 ====
<div class="change">
Patch from Pavel:
Fix negative vm_rss accounting.
Bug #71680.
</div>
==== diff-ve-vecalls-cleanup-20061108 ====
<div class="change">
Patch from Andrey:
Some file systems used inside a VE do not have a mount point (nfs, fuse).
We do not need to perform checks and umount for them.
Existing check is relaxed and umount is performed conditionally.
</div>
==== diff-ve-netdev-move-20061107 ====
<div class="change">
Patch from Andrey:
When moving a network device from VE0, we must check that the VE doesn't have
a device with the same name (e.g. we can create a veth device named eth0
inside the VE and then try to move the eth0 device from VE0 to this VE).
</div>
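The name-collision check can be modelled in a few lines. This is a minimal userspace sketch, not the kernel code: the dictionaries standing in for per-VE device tables and the <code>move_netdev</code> helper are hypothetical names introduced only for illustration.

```python
def move_netdev(name, src_devices, dst_devices):
    """Move a device between two per-VE device tables (hypothetical model).

    Refuses the move when the destination already has a device with the
    same name, which is the kind of check the patch describes.
    """
    if name not in src_devices:
        raise KeyError(f"no such device in source VE: {name}")
    if name in dst_devices:
        raise FileExistsError(f"destination VE already has a device named {name}")
    dst_devices[name] = src_devices.pop(name)


# VE0 owns two devices; the target VE already created its own "eth0".
ve0 = {"eth0": "physical NIC", "eth1": "physical NIC"}
ve101 = {"eth0": "veth created inside the VE"}

try:
    move_netdev("eth0", ve0, ve101)
    collided = False
except FileExistsError:
    collided = True   # the move is rejected and both tables stay untouched

move_netdev("eth1", ve0, ve101)   # no name clash, so this move succeeds
```

Rejecting the move before touching either table keeps the failure side-effect free, which is the property the sketch is meant to show.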
==== diff-ms-kobject-uevents-20061027 ====
<div class="change">
Patch from Roman Chechnev:
Implementation of userspace events
Summary patches and changes:
* export of SEQNUM to userspace (creates /sys/kernel)
* kobject: adjust hotplug_seqnum increment to keep userspace and kernel agreeing.
* kobject: fix build error if CONFIG_HOTPLUG is not enabled.
* kobject: hotplug_seqnum is not 64 bits on all platforms, so fix it.
* ksyms: don't implement /sys/kernel/hotplug_seqnum if CONFIG_HOTPLUG is not enabled.
* Implementation of userspace events through a netlink socket
* kobject_uevent warning fix
* kobject_uevent: fix init ordering
* kevent: standardize on the event types
* kobject: add CONFIG_DEBUG_KOBJECT
* kevent: add block mount and umount support
* kobject: add add_hotplug_env_var()
* kevent: add __bitwise kobject_action to help the compiler check for misusages
* Make kobject_hotplug() work even if the kobject's kset doesn't implement any
* kobject_uevent warning fix
* hotplug: prevent skips in sequence number from happening
* kobject: fix hotplug bug with seqnum
* take me home, hotplug_path[]
* Move hotplug_path[] out of kmod.[ch] to kobject_uevent.[ch]
* kevent: fix build error if CONFIG_KOBJECT_UEVENT is not selected.
* USB: use add_hotplug_env_var() in core/usb.c
* Use add_hotplug_env_var() in firmware loader
* fix unnecessary increment in firmware_class_hotplug() and USB core
* drivers/usb/core/usb.c: add MODALIAS env var to hotplug
* usb: class driver pass dev_t to the class core
* PCI: add MODALIAS to hotplug event for pci devices
* PCI: Remove newline from pci MODALIAS variable
* PCI: move pci core to use add_hotplug_env_var()
* add the physical device and the bus to the hotplug environment
* add the driver name to the hotplug environment
* driver core: allow struct bin_attributes in class devices
* class_simple: pass dev_t to the class core
* class core: export MAJOR/MINOR to the hotplug env
* block: genhd: terminate, set to next free slot, shrink available space
* avoid problems with kobject_set_name and name with %
* Hotplug: Make dev->bus checking consistent
* Driver core: add driver symlink to device
* add the bus name to the hotplug environment
* Driver core: add "bus" symlink to class/block devices
* add sysfs attr to re-emit device hotplug event
</div>
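For userland consumers, a uevent arrives over the netlink socket as one datagram. The sketch below parses such a datagram, assuming the layout used by mainline kernels: an <code>action@devpath</code> header followed by NUL-separated <code>KEY=VALUE</code> pairs. Whether this backport emits exactly that layout is an assumption, and the sample payload is made up for illustration.

```python
def parse_uevent(payload: bytes):
    """Split one uevent datagram into (header, environment dict).

    Assumes the mainline layout: "action@devpath" followed by
    NUL-separated KEY=VALUE fields.
    """
    fields = [f for f in payload.split(b"\0") if f]
    if not fields:
        raise ValueError("empty uevent payload")
    header = fields[0].decode()
    env = {}
    for field in fields[1:]:
        key, sep, value = field.decode().partition("=")
        if sep:                      # ignore fields without '='
            env[key] = value
    return header, env


# Made-up sample message, not captured from a real kernel.
sample = (b"add@/class/net/eth0\0ACTION=add\0DEVPATH=/class/net/eth0\0"
          b"SUBSYSTEM=net\0SEQNUM=1024\0")
header, env = parse_uevent(sample)
```

The <code>SEQNUM</code> value is what lets userland detect missed or reordered events, which is why several of the listed changes deal with keeping it consistent.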
==== diff-ubc-precharge-irqs ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
[UBC] Fix UB_NUMFILE accounting optimisation leak
In 2.6.16 files are put via RCU, so ub_file_uncharge() is called
in IRQ context. Thus non-atomic decrement of file_precharged must
be done with IRQs disabled.
</div>
==== diff-ubc-precharged-kmemsize ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
[UBC] Don't allow precharged files exhaust kmemsize
When a file is put, it may be added to the precharged value of some
task, thus holding UB_NUMFILE and UB_KMEMSIZE resources.
The problem is that files do not start uncharging until
ub_barrier_farnr() is hit for UB_NUMFILE. For ub0,
ub_barrier_farnr() can happen only after hitting the kmemsize
barrier. Thus the kmemsize resource gets completely exhausted.
On 2.6.9 this problem is not easily reproducible, as files
are usually put in the context of the closing task. On 2.6.16
files are put via RCU and thus in another task's context.
[http://forum.openvz.org/index.php?t=msg&th=1243&#msg_6933 http://forum.openvz.org/index.php?t=msg&th=1243&#msg_6933]
</div>
==== diff-ubc-ub0-precharge-init ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
[UBC] Set correct precharge values for init_task.
Otherwise file freeing will happen in "swapper" context
and will spoil all statistics due to "negative" unsigned
long value.
{{bug|322}}.
</div>
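The "negative" unsigned long value mentioned above is ordinary modular wraparound. A small worked example, assuming a 64-bit <code>unsigned long</code> (32 bits on i386); the helper name is hypothetical:

```python
ULONG_BITS = 64  # assumption: 64-bit unsigned long; use 32 for i386


def ulong_sub(a, b, bits=ULONG_BITS):
    """Subtract the way C does for unsigned long: modulo 2**bits."""
    return (a - b) % (1 << bits)


# Uncharging one file from a counter that was never precharged.
precharged = 0
after_uncharge = ulong_sub(precharged, 1)
```

Here <code>after_uncharge</code> is 2**64 - 1, a huge positive number rather than -1, which is how a single unbalanced decrement can distort every statistic derived from the counter.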
==== diff-mainstream-elfzerobss-20050908 ====
<div class="change">
Patch from mainstream:
[PATCH] binfmt_elf: clearing bss may fail
So we discover that Borland's Kylix application builder emits weird elf
files which describe a non-writeable bss segment.
So remove the clear_user() check at the place where we zero out the bss. I
don't _think_ there are any security implications here (plus we've never
checked that clear_user() return value, so whoops if it is a problem).
Signed-off-by: Pavel Machek <pavel@suse.cz><br/>
Signed-off-by: Andrew Morton <akpm@osdl.org><br/>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
{{bug|332}}.
</div>
==== diff-cpt-rst-deleted-owner-b-20061103 ====
<div class="change">
Patch from Andrey:
The wrong structure with inode attributes was used in fixup_file_content().
We must be sure that we won't break the file system, thus we do not set inode mode
attributes from the S_IFMT mask.
Fix for bugs:<br/>
Bug #71135.<br/>
Bug #71161.
</div>
==== diff-cpt-rst-deleted-owner-b-20061102 ====
<div class="change">
Patch from Andrey:
Due to a silly mistake, the wrong inode mode was set on restore (cpt_mode was used
instead of cpt_i_mode).
This patch should be used instead of diff-cpt-rst-deleted-owner-b-20061030.
Fix for bugs:<br/>
Bug #71135.<br/>
Bug #71161.
</div>
==== diff-ve-proc-entry-threads-20061025 ====
<div class="change">
Patch from Kostja:
don't create /proc/<PID> dentry/inode for threads that are not group leaders.
Sets /proc/<TGID>/task/<PID> dentry as a task->proc_dentry for such threads.
Bug #69536.
This should eliminate the problem of growing number of processes in X state.
</div>
==== diff-ubc-debug-atomic-20061020 ====
<div class="change">
Patch from Pavel:
Optimized kmemsize accounting calls ub_slab_(ub)charge
with IRQs disabled, but debugging code isn't aware of it...
Bug #70694.
</div>
==== diff-ve-veth-del-fix-b-20061024 ====
<div class="change">
Patch from Andrey:
Context should be set from device (dev->owner_env) even for devices from VE0.
</div>
==== diff-cpt-external-file ====
<div class="change">
Patch from Alexey:
[CPT] fail when VE refers to an invisible file
Checkpointing used to ignore EINVAL returned by d_path().
It was workaround for tmpfs shmem files, which use detached mounts.
But this means that real invisible paths are detected too late:
checkpointing succeeds and restore fails, which is not acceptable.
When d_path() fails, check that it is shmem file.
If it is not, fail immediately.
</div>
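The decision rule can be written down as a small filter. This is only a model of the rule as described: the tuple layout and the <code>files_blocking_checkpoint</code> name are hypothetical, and the file names are invented.

```python
def files_blocking_checkpoint(files):
    """Return the names of files that must make checkpointing fail.

    Each item is (name, d_path_failed, is_shmem). A failed path lookup is
    tolerated only for shmem files, which live on detached mounts.
    """
    return [name for name, d_path_failed, is_shmem in files
            if d_path_failed and not is_shmem]


files = [
    ("/var/log/messages", False, False),  # visible path, lookup succeeded
    ("SYSV00000000", True, True),         # shmem file: lookup failure is expected
    ("(outside VE)", True, False),        # genuinely invisible path
]
blocking = files_blocking_checkpoint(files)
```

Failing at checkpoint time when <code>blocking</code> is non-empty moves the error to the point where the VE is still running and nothing has been lost.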
==== diff-ms-nf-compat-copy-err-20061030 ====
<div class="change">
Patch from Dmitry:
Fixed matches modules refcount in case of error in compat_copy_from_user()
</div>
==== diff-ms-lost-routes ====
<div class="change">
Patch from mainstream:
[IPV4]: Fix lost routes in fn_hash netlink dumps.
Spotted by itkes@fat.imed.msu.ru, the fn_hash_dump_bucket() main
loop does not increment 'i' properly, and thus routes will not
be listed, when the test 'i < s_i' passes.
The bug was added when the code was converted over to
hlist_for_each_entry() by yours truly.
Signed-off-by: David S. Miller <davem@davemloft.net>
</div>
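The bug pattern is easy to reproduce outside the kernel. The sketch below models a resumable dump that must skip the first <code>s_i</code> entries; the function names are hypothetical and the lists stand in for the hash bucket.

```python
def dump_bucket_buggy(entries, s_i):
    """Resume a dump at index s_i, with the counter bug described above."""
    dumped = []
    i = 0
    for entry in entries:
        if i < s_i:
            continue        # bug: i is not advanced, so the test never stops passing
        dumped.append(entry)
        i += 1
    return dumped


def dump_bucket(entries, s_i):
    """Resume a dump at index s_i, advancing the counter on every entry."""
    dumped = []
    i = 0
    for entry in entries:
        if i >= s_i:
            dumped.append(entry)
        i += 1              # fix: count skipped entries too
    return dumped


routes = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "0.0.0.0/0"]
lost = dump_bucket_buggy(routes, 2)
listed = dump_bucket(routes, 2)
```

With the buggy loop, any resume point greater than zero yields an empty result, which matches the symptom of routes missing from the dump.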
==== diff-*-gcc4-compile* ====
diff-ide-taskfile-gcc4-20061106<br/>
diff-i2c-gcc4-compilefix-20051103<br/>
diff-acenic-gcc4-compile-20061107<br/>
diff-adp94xx-gcc4-compile-20061107<br/>
diff-adp94xx-gcc4-compile2-20061107<br/>
diff-adp94xx-gcc4-compile3-20061107<br/>
diff-ia64-sn-gcc4-compile-20061108:<br/>
<div class="change">
Patches from Kir Kolyshkin <kir@openvz.org>:
Various gcc4-related compilation fixes.
</div>
==== diff-cpt-external-pgids ====
<div class="change">
Patch from Alexey:
[CPT] do not checkpoint/restore global process groups
The patch is three-fold:
1. Do not try to allocate process groups/sessions, unless they
are not virtual. This is fix for bug #71825.
However, it is too late to detect failure.
2. Do not checkpoint VE, if it contains references to external process
groups/session ids. It is _destructive_ part. It definitely will
prevent migration of some commonly used configurations, when
some deficient daemon (sort of qmail) forgets to daemonize itself
and it is started by vzctl exec.
Workaround is possible in theory at level of vzctl, if it makes
the second fork and setsid() after VE_ENTER. It is not impossible,
because entered process is not required to be child of vzctl, actual
reaping and waiting is done not by wait4(), but with control pipe.
Another way is to use clone(CLONE_PARENT), but it is also tricky.
3. Do the same checks before migration started to prevent failure due to #2 after rsync phase.
</div>
==== diff-sysctl-nums-20061121 ====
<div class="change">
Patch from Alexandr Andreev:
fix misprint (error) in sysctl numbers.
</div>
==== diff-cpt-conntrack-version-2 ====
<div class="change">
Patch from Alexey:
[CPT] checkpoint/restore conntrack on 2.6.9
Compared to 2.6.8, 2.6.9 tracks much more information about
TCP connections. We need to reserve additional space for it.
Right now it still does not make much sense to add an additional
attribute, because it is already known that in 2.6.16..19
conntrack differs even more.
So, we just change the format of the whole record and try to fill missing
fields with some reasonable values when migrating from 2.6.8.
</div>
==== diff-security-bridges-fdb-entries ====
<div class="change">
Patch from mainstream:
bridge: fix possible overflow in get_fdb_entries (CVE-2006-5751)
Make sure to properly clamp maxnum to avoid overflow (CVE-2006-5751).
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
</div>
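Why an unclamped count is dangerous can be shown with plain arithmetic. The sketch assumes 4 KiB pages, a 16-byte entry and a one-page limit; all three are illustrative assumptions, since the changelog only says that maxnum is clamped.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages
ENTRY_SIZE = 16    # assumption: stand-in for the size of one forwarding-table entry


def buffer_size_32bit(count, entry_size=ENTRY_SIZE):
    """Buffer size as 32-bit unsigned arithmetic would compute it."""
    return (count * entry_size) & 0xFFFFFFFF


def clamp_maxnum(maxnum, page_size=PAGE_SIZE, entry_size=ENTRY_SIZE):
    """Limit the requested entry count so that the buffer fits in one page."""
    return min(maxnum, page_size // entry_size)


requested = 0x10000001                      # attacker-chosen entry count
wrapped = buffer_size_32bit(requested)      # product wraps around to a tiny size
safe_count = clamp_maxnum(requested)
safe_size = buffer_size_32bit(safe_count)
```

Without the clamp, the size computation wraps to 16 bytes while the loop that fills the buffer still believes it may write <code>requested</code> entries. With the clamp, count and size agree again.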
==== diff-ext3-retries4 ====
<div class="change">
Patch from Andrey with fixes from Dmitry Monakhov:
In journal=ordered or journal=data mode, a retry in ext3_prepare_write()
breaks the requirements of journaling of data with respect to metadata.
The fix is to call commit_write to commit allocated zero blocks before
the retry.
Author: Andrey Savochkin <saw@sw.ru><br/>
Signed-Off-By: Kirill Korotaev <dev@openvz.org>
</div>
==== diff-ext3-pgfault11b ====
<div class="change">
Patch from Dmitry (dmonakhov@), modified by Kirill:<br/>
This patch fixes issues introduced by diff-ext3-pgfault11 patch<br/>
* remove unused variables<br/>
* fix incorrect recursion detection
Bug #71881.
</div>
==== diff-sysrq-debugging-c-20061129 ====
<div class="change">
Patch from Alexandr Andreev:
SysRq debugger memory dumping enhancements:<br/>
* 32/64 bit architectures support (80 chars per line);<br/>
* skipping lines with zero bytes;
</div>
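The zero-line skipping can be illustrated with a userspace hex dump. This is a sketch only: "lines with zero bytes" is read here as lines made up entirely of zero bytes, and the output format is an assumption, not the SysRq debugger's real one.

```python
def dump_lines(data, width=16):
    """Format data as hex-dump lines, omitting lines that are all zero bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        if not any(chunk):
            continue                 # nothing but zeros on this line: skip it
        lines.append(f"{offset:08x}: {chunk.hex(' ')}")
    return lines


# 48 bytes of memory where only the middle line holds a non-zero byte.
memory = bytes(16) + b"\x01" + bytes(15) + bytes(16)
lines = dump_lines(memory)
```

Keeping the offset at the start of every printed line means skipped regions remain visible as gaps in the addresses.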
==== diff-security-ptrace-race-20061129 ====
<div class="change">
Patch from Alexey Dobriyan:
Fix of deadlock in ptrace_attach().
References:
fix: commit f5b40e363ad6041a96e3da32281d8faa191597b9<br/>
Fix ptrace_attach()/ptrace_traceme()/de_thread() race
fix in fix:
commit f358166a9405e4f1d8e50d8f415c26d95505b6de<br/>
ptrace_attach: fix possible deadlock schenario with irqs<br/>
ptrace_traceme cleanup:<br/>
commit 6b9c7ed84837753a436415097063232422e29a35<br/>
[PATCH] use ptrace_get_task_struct in various places
write_can_lock() part was dropped since it doesn't exist in 2.6.9.
it was replaced with schedule().
PTRACE_TRACEME chunk was projected by hand on i386, x86_64, sparc,
sparc64, ppc, ppc64 due to ptrace_traceme() cleanup done after 2.6.9.
If more archs need coverage, yell at me.
Bug #72235.<br/>
Bug #61233.
</div>
==== diff-ms-microcode-size-20061124 ====
<div class="change">
Patch from mainstream, ported by Kostja(khorenko@):
fix removes the microcode's size check on x86.
Should be applied to 2.6.9-x series, 2.6.18.
Bug #72356.
[http://linux.bkbits.net:8080/linux-2.6/gnupatch@451a9864jKk9Jk5CliEBE9pB86swnw http://linux.bkbits.net:8080/linux-2.6/gnupatch@451a9864jKk9Jk5CliEBE9pB86swnw]
(which applies after clean-up patch
[http://linux.bkbits.net:8080/linux-2.6/gnupatch@451a9861MOurp8RhosNAAOMcAIh9bw http://linux.bkbits.net:8080/linux-2.6/gnupatch@451a9861MOurp8RhosNAAOMcAIh9bw])
<pre>
# ChangeSet
# 2006/09/27 08:26:18-07:00 shaohua.li@intel.com
# [PATCH] x86 microcode: don't check the size
#
# IA32 manual says if micorcode update's size is 0, then the size is
# default size (2048 bytes). But this doesn't suggest all microcode
# update's size should be above 2048 bytes to me. We actually had a
# microcode update whose size is 1024 bytes. The patch just removed the
# check.
#
# Signed-off-by: Shaohua Li <shaohua.li@intel.com>
# Cc: Tigran Aivazian <tigran@veritas.com>
# Signed-off-by: Andrew Morton <akpm@osdl.org>
# Signed-off-by: Linus Torvalds <torvalds@osdl.org>
</pre>
</div>
==== diff-ubc-dcacheleak-20061129 ====
<div class="change">
Patch from Denis (den@) based on idea from Pavel:
This patch fixes a dcache leak caused by a race between dcache_charge[_forced] and
dcache_uncharge.
The idea: do not trust dentry_bc after count state change.
Bug #72051.
</div>
==== diff-ve-oom-newgeneration-20061128 ====
<div class="change">
Patch from Denis (den@):
This patch calculates OOM generations directly.
The counter is increased when MM of process killed by OOM is
finally destroyed.
Bug #71980.
</div>
==== diff-vzdq-dbg-detached-b-20061113 ====
<div class="change">
Patch from Alexey Dobriyan:
At one place qlnk origin is not set which may hide useful info later.
</div>
==== diff-ve-tty-clone-charges-20061127 ====
<div class="change">
Patch from Alexandr Andreev:
* account tty structures to kmemsize<br/>
* setup driver->refcount correctly. Doesn't affect anything.
</div>
==== diff-ms-nf-ipt-checks-20061124 ====
<div class="change">
Patch from Dmitry (dim@):
Issue found by Patrick McHardy. After the checks were reordered, the target and match
checks that verify they can be used for this hook always return true, because the
e->comefrom field is not yet initialized. So the order is restored and the necessary
checks are moved into mark_source_chains(). For compats this issue has existed from the beginning.
</div>
==== diff-cpt-veenter-vpid ====
<div class="change">
Patch from Alexey (alexey@):
[PATCH] VE_ENTER switches to virtual pid
When the PID of the process is used by other processes as their PGID/SID,
we cannot do this. Otherwise, we can safely switch to a virtual pid.
The difference from the previous version is one line: do_env_enter() can be
done when the process already has a virtual pid. (This sounds crazy, but
this is what happens with checkpointing. :-)).
</div>
==== diff-fairsched-cpumhz-20061122 ====
<div class="change">
Patch from Alexey Dobriyan:
ve_scale_khz() ignores the number of virtual cpus in the node leading to
strange results in /proc/cpuinfo:
<pre>
0.5 * 4 * 1000MHz
------------------- => 500MHz (but it should be 666 MHz)
3
</pre>
Also, initialize ->vcpus of fairsched init node to something sensible to
avoid division by zero. ->vcpus was not explicitly initialized at
startup.
Bug #71984.
</div>
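The figure above can be checked with integer arithmetic. The sketch reads it as: a rate of 0.5 of the node's power, four physical CPUs at 1000 MHz each, and three virtual CPUs. That reading, and the function and parameter names, are assumptions made for illustration.

```python
def scaled_khz(cpu_khz, ncpus, rate_num, rate_den, vcpus):
    """Per-VCPU frequency: the VE's share of total CPU power split over its VCPUs."""
    if vcpus <= 0:
        # mirrors the second half of the fix: never divide by an unset VCPU count
        raise ValueError("vcpus must be positive")
    return cpu_khz * ncpus * rate_num // (rate_den * vcpus)


# 0.5 * 4 * 1000 MHz / 3, computed in kHz to keep the arithmetic integral.
khz = scaled_khz(cpu_khz=1_000_000, ncpus=4, rate_num=1, rate_den=2, vcpus=3)
mhz = khz // 1000
```

The result is 666 MHz, the value the patch description says /proc/cpuinfo should report.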
==== diff-ve-vcpu-stats-20061206 ====
<div class="change">
Patch from Alexey Dobriyan:
VE has simple idle time collection logic (per VCPU ->strt_idle_time, ->idle_time).
For ->idle_time to be incremented, ->strt_idle_time must not be 0. It becomes non-zero when
the very first task is scheduled on the VCPU. Before that, all VCPU statistics are
zeroed out because of
<code>
ve = kzalloc(sizeof(struct ve_struct));
</code>
including ->strt_idle_time.
All this leads to surprising /proc/stat and, as a consequence, top(1) output:
<pre>
# vzctl exec 140 cat /proc/stat
cpu 83 0 150 65654 173 0 0
cpu0 66 0 98 64839 167 0 0
cpu1 15 0 47 369 6 0 0
cpu2 0 0 4 446 0 0 0
cpu3 0 0 0 0 0 0 0 <===
cpu4 0 0 0 0 0 0 0 <===
cpu5 0 0 0 0 0 0 0 <===
cpu6 0 0 0 0 0 0 0 <===
cpu7 0 0 0 0 0 0 0 <===
</pre>
When user, system and nice times are 0%, it's OK. But when idle time is
_also_ 0%, it's surprising.
The solution is to start idle_time collecting state machine when VCPU is
added.
As a nice side effect, when you start a VE with 2 VCPUs and later add a 3rd, its
idle time will start ticking from the moment of addition.
{{bug|366}}.
</div>
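The idle-time state machine can be modelled in userspace to show the effect of the fix. The field names follow the description above; the class, its methods and the tick values are hypothetical.

```python
class VcpuStat:
    """Toy model of the per-VCPU idle accounting described above."""

    def __init__(self):
        self.strt_idle_time = 0   # zeroed, as after kzalloc()
        self.idle_time = 0

    def start_idle(self, now):
        self.strt_idle_time = now

    def account_idle(self, now):
        # idle_time only advances once the state machine has been started
        if self.strt_idle_time != 0:
            self.idle_time += now - self.strt_idle_time
            self.strt_idle_time = now


def add_vcpu(now, start_state_machine):
    """Create a VCPU; the fix corresponds to start_state_machine=True."""
    vcpu = VcpuStat()
    if start_state_machine:
        vcpu.start_idle(now)
    return vcpu


before_fix = add_vcpu(now=100, start_state_machine=False)
after_fix = add_vcpu(now=100, start_state_machine=True)
for tick in (150, 200):           # no task is ever scheduled on either VCPU
    before_fix.account_idle(tick)
    after_fix.account_idle(tick)
```

In this model the unfixed VCPU reports zero idle time forever, like cpu3 to cpu7 in the output above, while the fixed one has accumulated 100 ticks since it was added.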
==== diff-vzdq-detached-inode-20061201 ====
<div class="change">
Patch from Alexey Dobriyan, modified by Kirill:
When a file is opened and unlinked before vzquotaon, but
generic_delete_inode() is called after vzquotaon, a scary message appears:
<pre>
VZDQ: detached inode not in creation, orig 5, dev dm-0, inode 73828039, fs ext3
current 18761 (httpd), VE 102, time 77463.605075
[<ed79c296>] vzquota_det_qmblk_recalc+0x256/0x270 [vzdquota]
[<ed79c302>] vzquota_inode_qmblk_recalc+0x52/0x70 [vzdquota]
[<ed79c573>] vzquota_inode_data+0xb3/0xf0 [vzdquota]
[<ed79c449>] vzquota_inode_init_call+0x19/0x80 [vzdquota]
[<021fb940>] ext3_delete_inode+0x0/0x120
[<ed79e47f>] vzquota_initialize+0xf/0x20 [vzdquota]
[<0219d983>] generic_delete_inode+0x173/0x190
[<021996f6>] dput_recursive+0x56/0x230
[<0217f333>] __fput+0x123/0x1b0
[<0217d4c2>] filp_close+0x52/0xa0
[<0217d57a>] sys_close+0x6a/0xa0
</pre>
However, there is no need to scare the admin:<br/>
1) the inode is in I_FREEING state.<br/>
2) vzquota never saw the inode before, thus it doesn't know what to do with it.<br/>
3) the inode was unlinked outside of vzquota's area of interest, otherwise
-EBUSY would have been returned on quotaon.
So, do nothing and let the inode silently die.
Many thanks to Alexey Kuznetsov for spelling out reliable testcase.
{{bug|116}}<br/>
{{bug|177}}<br/>
{{bug|341}}<br/>
Bug #60532.<br/>
Bug #61431.<br/>
Bug #55275.
</div>
==== diff-ve-sockstat-20061201 ====
<div class="change">
Patch from Denis:
Hide the content of /proc/net/sockstat rather than the file itself to
keep sysstat from crashing.
Bug #72587.
</div>
==== diff-ubc-dcachestop-20061204 ====
<div class="change">
Patch from Denis:<br/>
This patch fixes turning off dcache accounting. ub_dentry_walk does not
guarantee the order in which it meets dentry leaves and nodes. So, just set
d_inuse to -1 and uncharge all at once.
Bug #72730.
</div>
==== diff-ms-getport-20061205 ====
<div class="change">
Patch from Denis:
tcp_v4_get_port fixed. Treat local_port_range[0] > local_port_range[1]
as local_port_range[1] == local_port_range[0].
Bug #72736.
</div>
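The rule for an inverted range is simple enough to state as code. A minimal sketch of the normalisation only, with a hypothetical helper name; it does not model the port search itself.

```python
def effective_port_range(local_port_range):
    """Normalise a (low, high) pair the way the fix describes.

    An inverted range (low > high) is treated as if high were equal to
    low, i.e. a range containing exactly one port.
    """
    low, high = local_port_range
    if low > high:
        high = low
    return low, high


normal = effective_port_range((32768, 61000))
inverted = effective_port_range((61000, 32768))
```

Collapsing the range to a single port keeps the size of the search interval non-negative, so later arithmetic on <code>high - low</code> stays well defined.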
==== diff-ext3-pgfault11c ====
<div class="change">
Patch from Dmitry Monakhov:
ext3_journal_stop() should be called after ext3_prepare_failure()
unconditionally. i.e. always stop.
</div>
<includeonly>[[{{PAGENAME}}/changes#Patches|{{Long changelog message}}]]</includeonly><noinclude>
=== Patches ===
==== diff-vzdq-dbg-detached-20061113 ====
<div class="change">
Patch from Alexey Dobriyan:
Debug patch for resolving OVZ bugs {{b|341}}, {{b|116}}, {{b|177}}.
* Extend ->origin into array for previous origin tracking.
* Print last two origins in debugging message.
* Also print i_mode, i_op, i_fop, ... of the offending inode.
</div>
==== diff-ms-sched-migrate-ints ====
<div class="change">
Patch from Kirill:
move_task_off_dead_cpu() requires interrupts to be disabled,
while migrate_dead() calls it with enabled interrupts.
Added appropriate comments to functions and added
BUG_ON(!irqs_disabled()) into double_rq_lock() and
double_lock_balance() which are the real source of such bugs.
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
</div>
==== diff-ms-oomhang-20061113 ====
<div class="change">
Patch from Denis:<br/>
This patch fixes the following problem:
if a process is being killed inside __alloc_pages by OOM, it can't
exit till it frees some space, which is impossible for now. The patch
allows to dig into reserves for such a process.
Bugs #71604, #71179.
</div>
==== diff-ubc-oomkill-b-20061113 ====
<div class="change">
Patch from Alexey:
gcc complains about scanned being used uninitialized and it's right.
</div>
==== diff-ubc-vmrss-fix-20060921 ====
<div class="change">
Patch from Pavel:
Fix negative vm_rss accounting.
Bug #71680.
</div>
==== diff-ve-vecalls-cleanup-20061108 ====
<div class="change">
Patch from Andrey:
Some file system used inside VE do not have mount point (nfs, fuse).
We do not need to perform checks and umount for them.
Existing check is relaxed and umount is performed conditionally.
</div>
==== diff-ve-netdev-move-20061107 ====
<div class="change">
Patch from Andrey:
When we moving network device from VE0 we must check that VE doesn't have
device with the same name (e.g. we can create veth device with name eth0
inside VE and try to move eth0 device from VE0 to this VE).
</div>
==== diff-ms-kobject-uevents-20061027 ====
<div class="change">
Patch from Roman Chechnev:
Implementation of userspace events
Summary patches and changes:
* export of SEQNUM to userspace (creates /sys/kernel)
* kobject: adjust hotplug_seqnum increment to keep userspace and kernel agreeing.
* kobject: fix build error if CONFIG_HOTPLUG is not enabled.
* kobject: hotplug_seqnum is not 64 bits on all platforms, so fix it.
* ksyms: don't implement /sys/kernel/hotplug_seqnum if CONFIG_HOTPLUG is not enabled.
* Implemetation of userspace events through a netlink socket
* kobject_uevent warning fix
* kobject_uevent: fix init ordering
* kevent: standardize on the event types
* kobject: add CONFIG_DEBUG_KOBJECT
* kevent: add block mount and umount support
* kobject: add add_hotplug_env_var()
* kevent: add __bitwise kobject_action to help the compiler check for misusages
* Make kobject_hotplug() work even if the kobject's kset doesn't implement any
* kobject_uevent warning fix
* hotplug: prevent skips in sequence number from happening
* kobject: fix hotplug bug with seqnum
* take me home, hotplug_path[]
* Move hotplug_path[] out of kmod.[ch] to kobject_uevent.[ch]
* kevent: fix build error if CONFIG_KOBJECT_UEVENT is not selected.
* USB: use add_hotplug_env_var() in core/usb.c
* Use add_hotplug_env_var() in firmware loader
* fix unnecessary increment in firmware_class_hotplug() and USB core
* drivers/usb/core/usb.c: add MODALIAS env var to hotplug
* usb: class driver pass dev_t to the class core
* PCI: add MODALIAS to hotplug event for pci devices
* PCI: Remove newline from pci MODALIAS variable
* PCI: move pci core to use add_hotplug_env_var()
* add the physical device and the bus to the hotplug environment
* add the driver name to the hotplug environment
* driver core: allow struct bin_attributes in class devices
* class_simple: pass dev_t to the class core
* class core: export MAJOR/MINOR to the hotplug env
* block: genhd: terminate, set to next free slot, shrink available space
* avoid problems with kobject_set_name and name with %
* Hotplug: Make dev->bus checking consistent
* Driver core: add driver symlink to device
* add the bus name to the hotplug environment
* Driver core: add "bus" symlink to class/block devices
* add sysfs attr to re-emit device hotplug event
</div>
==== diff-ubc-precharge-irqs ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
[UBC] Fix UB_NUMFILE accounting optimisation leak
In 2.6.16 files are put via RCU, so ub_file_uncharge() is called
in IRQ context. Thus non-atomic decrement of file_precharged must
be done with IRQs disabled.
</div>
==== diff-ubc-precharged-kmemsize ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
[UBC] Don't allow precharged files exhaust kmemsize
When file is put it may be added to precharged value to some
task thus holding UB_NUMFULE and UB_KMEMSIZE resources.
The problem is that files do not start uncharging till
ub_barrier_farnr() is hit for UB_NUMFILE. For ub0
ub_barrier_farnr() can happen only after hitting kmemsize
barrier. Thus kmemsize reurce gets completely exhausted.
On 2.6.9 this problem is not easyli reproducible as files
are put in the context of closing task usually. On 2.6.16
files are put via RCU and thus - in other task's context.
[http://forum.openvz.org/index.php?t=msg&th=1243&#msg_6933 http://forum.openvz.org/index.php?t=msg&th=1243&#msg_6933]
</div>
==== diff-ubc-ub0-precharge-init ====
<div class="change">
Patch from Pavel Emelianov <xemul@openvz.org>:
[UBC] Set correct precharge values for init_task.
Otherwise file freeing will happen in "swapper" context
and will spoil all statistics due to "negative" unsigned
long value.
{{bug|322}}.
</div>
==== diff-mainstream-elfzerobss-20050908 ====
<div class="change">
Patch from mainstream:
[PATCH] binfmt_elf: clearing bss may fail
So we discover that Borland's Kylix application builder emits weird elf
files which describe a non-writeable bss segment.
So remove the clear_user() check at the place where we zero out the bss. I
don't _think_ there are any security implications here (plus we've never
checked that clear_user() return value, so whoops if it is a problem).
Signed-off-by: Pavel Machek <pavel@suse.cz><br/>
Signed-off-by: Andrew Morton <akpm@osdl.org><br/>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
{{bug|332}}.
</div>
==== diff-cpt-rst-deleted-owner-b-20061103 ====
<div class="change">
Patch from Andrey:
Wrong structure with inode attributes were used in fixup_file_content().
We must be sure that we won't broke file system, thus we do not set inode mode
attributes from S_IFMT mask.
Fix for bugs:<br/>
Bug #71135.<br/>
Bug #71161.
</div>
==== diff-cpt-rst-deleted-owner-b-20061102 ====
<div class="change">
Patch from Andrey:
Due to silly mistake wrong inode mode were set on restore (cpt_mode were used
instead of cpt_i_mode).
This patch should be used instead of diff-cpt-rst-deleted-owner-b-20061030.
Fix for bugs:<br/>
Bug #71135.<br/>
Bug #71161.
</div>
==== diff-ve-proc-entry-threads-20061025 ====
<div class="change">
Patch from Kostja:
don't create /proc/<PID> dentry/inode for threads that are not group leaders.
Sets /proc/<TGID>/task/<PID> dentry as a task->proc_dentry for such threads.
Bug #69536.
This should eliminate the problem of growing number of processes in X state.
</div>
==== diff-ubc-debug-atomic-20061020 ====
<div class="change">
Patch from Pavel:
Optimized kmemsize accounting calls ub_slab_(ub)charge
with IRQs disabled, but debugging code isn't aware of it...
Bug #70694.
</div>
==== diff-ve-veth-del-fix-b-20061024 ====
<div class="change">
Patch from Andrey:
Context should be set from device (dev->owner_env) even for devices from VE0.
</div>
==== diff-cpt-external-file ====
<div class="change">
Patch from Alexey:
[CPT] fail when VE refers to an invisible file
Checkpointing used to ignore EINVAL returned by d_path().
It was workaround for tmpfs shmem files, which use detached mounts.
But this means that real invisible paths are detected too late:
checkpointing succeds and restore fails, which is not acceptable.
When d_path() fails, check that it is shmem file.
If it is not, fail immediately.
</div>
==== diff-ms-nf-compat-copy-err-20061030 ====
<div class="change">
Patch from Dmitry:
Fixed matches modules refcount in case of error in compat_copy_from_user()
</div>
==== diff-ms-lost-routes ====
<div class="change">
Patch from mainstream:
[IPV4]: Fix lost routes in fn_hash netlink dumps.
Spotted by itkes@fat.imed.msu.ru, the fn_hash_dump_bucket() main
loop does not increment 'i' properly, and thus routes will not
be listed, when the test 'i < s_i' passes.
The bug was added when the code was converted over to
hlist_for_each_entry() by your's truly.
Signed-off-by: David S. Miller <davem@davemloft.net>
</div>
==== diff-*-gcc4-compile* ====
diff-ide-taskfile-gcc4-20061106<br/>
diff-i2c-gcc4-compilefix-20051103<br/>
diff-acenic-gcc4-compile-20061107<br/>
diff-adp94xx-gcc4-compile-20061107<br/>
diff-adp94xx-gcc4-compile2-20061107<br/>
diff-adp94xx-gcc4-compile3-20061107<br/>
diff-ia64-sn-gcc4-compile-20061108:<br/>
<div class="change">
Patches from Kir Kolyshkin <kir@openvz.org>:
Various gcc4-related compilation fixes.
</div>
==== diff-cpt-external-pgids ====
<div class="change">
Patch from Alexey:
[CPT] do not checkpoint/restore global process groups
The patch is three-fold:
1. Do not try to allocate process groups/sessions, unless they
are not virtual. This is fix for bug #71825.
However, it is too late to detect failure.
2. Do not checkpoint VE, if it contains references to extenal process
groups/session ids. It is _destructive_ part. It definitely will
prevent migration of some commonly used configurations, when
some deficient daemon (sort of qmail) forgets to daemonize itself
and it is started by vzctl exec.
Workaround is possible in theory at level of vzctl, if it makes
the second fork and setsid() after VE_ENTER. It is not impossible,
because entered process is not required to be child of vzctl, actual
reaping and waiting is done not by wait4(), but with control pipe.
Another way is to use clone(CLONE_PARENT), but it is also tricky.
3. Do the same checks before migration started to prevent failure due to #2 after rsync phase.
</div>
==== diff-sysctl-nums-20061121 ====
<div class="change">
Patch from Alexandr Andreev:
fix misprint (error) in sysctl numbers.
</div>
==== diff-cpt-conntrack-version-2 ====
<div class="change">
Patch from Alexey:
[CPT] checkpoint/restore conntrack on 2.6.9
Compared to 2.6.8, 2.6.9 tracks much more information about
TCP connections. We need to reserve additional space for it.
Right now it still does not make much sense to add an additional
attribute, because it is already known that in 2.6.16..19
conntrack is even more different.
So, we just change the format of the whole record and try to fill the missing
fields with some reasonable values when migrating from 2.6.8.
</div>
==== diff-security-bridges-fdb-entries ====
<div class="change">
Patch from mainstream:
bridge: fix possible overflow in get_fdb_entries (CVE-2006-5751)
Make sure to properly clamp maxnum to avoid overflow (CVE-2006-5751).
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
</div>
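The clamping idea can be sketched in user space as follows. This is a minimal illustration, not the kernel code: the entry layout <code>struct fdb_entry_model</code>, the buffer size <code>BUF_BYTES</code> and both function names are hypothetical stand-ins. The point is that the caller-supplied count is bounded before it is multiplied by the entry size, so the product cannot wrap around.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real entry type used by the bridge code. */
struct fdb_entry_model {
    unsigned char mac[6];
    unsigned char port;
    unsigned char flags;
    unsigned int  age;
};

/* Assumed upper bound on the buffer handed back to the caller. */
#define BUF_BYTES 4096u

/* Bound the requested entry count so that count * sizeof(entry) fits. */
size_t clamp_maxnum(size_t maxnum)
{
    const size_t limit = BUF_BYTES / sizeof(struct fdb_entry_model);

    return maxnum > limit ? limit : maxnum;
}

/* Size of the allocation after clamping; never exceeds BUF_BYTES. */
size_t alloc_size(size_t maxnum)
{
    return clamp_maxnum(maxnum) * sizeof(struct fdb_entry_model);
}
```

Without the clamp, a very large <code>maxnum</code> multiplied by the entry size could overflow and yield a small allocation that is then overrun.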
==== diff-ext3-retries4 ====
<div class="change">
Patch from Andrey with fixes from Dmitry Monakhov:
In journal=ordered or journal=data mode, a retry in ext3_prepare_write()
breaks the requirements of journaling of data with respect to metadata.
The fix is to call commit_write to commit the allocated zero blocks before
the retry.
Author: Andrey Savochkin <saw@sw.ru><br/>
Signed-Off-By: Kirill Korotaev <dev@openvz.org>
</div>
==== diff-ext3-pgfault11b ====
<div class="change">
Patch from Dmitry (dmonakhov@), modified by Kirill:<br/>
This patch fixes issues introduced by diff-ext3-pgfault11 patch<br/>
* remove unused variables<br/>
* fix incorrect recursion detection
Bug #71881.
</div>
==== diff-sysrq-debugging-c-20061129 ====
<div class="change">
Patch from Alexandr Andreev:
SysRq debugger memory dumping enhancements:<br/>
* 32/64 bit architectures support (80 chars per line);<br/>
* skipping lines with zero bytes;
</div>
==== diff-security-ptrace-race-20061129 ====
<div class="change">
Patch from Alexey Dobriyan:
Fix of deadlock in ptrace_attach().
References:
fix: commit f5b40e363ad6041a96e3da32281d8faa191597b9<br/>
Fix ptrace_attach()/ptrace_traceme()/de_thread() race
fix in fix:
commit f358166a9405e4f1d8e50d8f415c26d95505b6de<br/>
ptrace_attach: fix possible deadlock schenario with irqs<br/>
ptrace_traceme cleanup:<br/>
commit 6b9c7ed84837753a436415097063232422e29a35<br/>
[PATCH] use ptrace_get_task_struct in various places
The write_can_lock() part was dropped since it doesn't exist in 2.6.9.
It was replaced with schedule().
The PTRACE_TRACEME chunk was projected by hand on i386, x86_64, sparc,
sparc64, ppc, ppc64 due to the ptrace_traceme() cleanup done after 2.6.9.
If more archs need coverage, yell at me.
Bug #72235.<br/>
Bug #61233.
</div>
==== diff-ms-microcode-size-20061124 ====
<div class="change">
Patch from mainstream, ported by Kostja(khorenko@):
The fix removes the microcode size check on x86.
It should be applied to the 2.6.9-x series and to 2.6.18.
Bug #72356.
[http://linux.bkbits.net:8080/linux-2.6/gnupatch@451a9864jKk9Jk5CliEBE9pB86swnw http://linux.bkbits.net:8080/linux-2.6/gnupatch@451a9864jKk9Jk5CliEBE9pB86swnw]
(which applies after clean-up patch
[http://linux.bkbits.net:8080/linux-2.6/gnupatch@451a9861MOurp8RhosNAAOMcAIh9bw http://linux.bkbits.net:8080/linux-2.6/gnupatch@451a9861MOurp8RhosNAAOMcAIh9bw])
<pre>
# ChangeSet
# 2006/09/27 08:26:18-07:00 shaohua.li@intel.com
# [PATCH] x86 microcode: don't check the size
#
# IA32 manual says if micorcode update's size is 0, then the size is
# default size (2048 bytes). But this doesn't suggest all microcode
# update's size should be above 2048 bytes to me. We actually had a
# microcode update whose size is 1024 bytes. The patch just removed the
# check.
#
# Signed-off-by: Shaohua Li <shaohua.li@intel.com>
# Cc: Tigran Aivazian <tigran@veritas.com>
# Signed-off-by: Andrew Morton <akpm@osdl.org>
# Signed-off-by: Linus Torvalds <torvalds@osdl.org>
</pre>
</div>
==== diff-ubc-dcacheleak-20061129 ====
<div class="change">
Patch from Denis (den@) based on idea from Pavel:
This patch fixes a dcache leak caused by a race between dcache_charge[_forced] and
dcache_uncharge.
The idea: do not trust dentry_bc after a count state change.
Bug #72051.
</div>
==== diff-ve-oom-newgeneration-20061128 ====
<div class="change">
Patch from Denis (den@):
This patch calculates OOM generations directly.
The counter is increased when the MM of a process killed by OOM is
finally destroyed.
Bug #71980.
</div>
==== diff-vzdq-dbg-detached-b-20061113 ====
<div class="change">
Patch from Alexey Dobriyan:
In one place the qlnk origin is not set, which may hide useful info later.
</div>
==== diff-ve-tty-clone-charges-20061127 ====
<div class="change">
Patch from Alexandr Andreev:
* account tty structures to kmemsize<br/>
* set up driver->refcount correctly. Doesn't affect anything.
</div>
==== diff-ms-nf-ipt-checks-20061124 ====
<div class="change">
Patch from Dmitry (dim@):
Issue found by Patrick McHardy. After the checks were reordered, the target and match
checks that verify they can be used for this hook always return true, because the
e->comefrom field is not yet initialized. So the order is restored and the necessary checks are moved
into mark_source_chains(). For compats this issue has existed from the beginning.
</div>
==== diff-cpt-veenter-vpid ====
<div class="change">
Patch from Alexey (alexey@):
[PATCH] VE_ENTER switches to virtual pid
When the PID of the process is used by other processes as their PGID/SID,
we cannot do this. Otherwise, we can safely switch to a virtual pid.
The difference from the previous version is one line: do_env_enter() can be
done when the process already has a virtual pid. (This sounds crazy, but
this is what happens with checkpointing. :-)).
</div>
==== diff-fairsched-cpumhz-20061122 ====
<div class="change">
Patch from Alexey Dobriyan:
ve_scale_khz() ignores the number of virtual cpus in the node, leading to
strange results in /proc/cpuinfo:
<pre>
0.5 * 4 * 1000MHz
------------------- => 500MHz (but it should be 666 MHz)
3
</pre>
Also, initialize ->vcpus of fairsched init node to something sensible to
avoid division by zero. ->vcpus was not explicitly initialized at
startup.
Bug #71984.
</div>
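The arithmetic in the example above can be written out as a small helper. This is a sketch only, not the body of ve_scale_khz(): the function name <code>scaled_mhz</code>, the parameter names and the reading of 0.5 as a CPU share given as a fraction are assumptions, as is the fallback chosen for a zero divisor.

```c
/*
 * Hypothetical helper reproducing the arithmetic from the example:
 *   share * physical_cpus * cpu_mhz / vcpus
 * The share is passed as a fraction (rate_num / rate_den) to stay in
 * integer arithmetic.
 */
unsigned long scaled_mhz(unsigned long rate_num, unsigned long rate_den,
                         unsigned long phys_cpus, unsigned long cpu_mhz,
                         unsigned long vcpus)
{
    /* Assumption: with no usable divisor, report the unscaled value. */
    if (vcpus == 0 || rate_den == 0)
        return cpu_mhz;

    return rate_num * phys_cpus * cpu_mhz / (rate_den * vcpus);
}
```

With a share of 1/2, 4 physical CPUs at 1000 MHz and 3 virtual CPUs this yields 666; the explicit zero check mirrors the division-by-zero concern mentioned in the patch description.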
==== diff-ve-vcpu-stats-20061206 ====
<div class="change">
Patch from Alexey Dobriyan:
VE has simple idle time collection logic (per VCPU ->strt_idle_time, ->idle_time).
For ->idle_time to be incremented, ->strt_idle_time must not be 0. It becomes non-zero when
the very first task is scheduled on the VCPU. Before that, all VCPU statistics are
zeroed out because of
<code>
ve = kzalloc(sizeof(struct ve_struct));
</code>
including ->strt_idle_time.
All this leads to surprising /proc/stat and, as a consequence, top(1) output:
<pre>
# vzctl exec 140 cat /proc/stat
cpu 83 0 150 65654 173 0 0
cpu0 66 0 98 64839 167 0 0
cpu1 15 0 47 369 6 0 0
cpu2 0 0 4 446 0 0 0
cpu3 0 0 0 0 0 0 0 <===
cpu4 0 0 0 0 0 0 0 <===
cpu5 0 0 0 0 0 0 0 <===
cpu6 0 0 0 0 0 0 0 <===
cpu7 0 0 0 0 0 0 0 <===
</pre>
When user, system and nice times are 0%, it's OK. But when idle time is
_also_ 0%, it's surprising.
The solution is to start the idle_time collection state machine when a VCPU is
added.
As a nice side effect, when you start a VE with 2 VCPUs and later add a third, its
idle time will start ticking from the moment of addition.
{{bug|366}}.
</div>
==== diff-vzdq-detached-inode-20061201 ====
<div class="change">
Patch from Alexey Dobriyan, modified by Kirill:
When a file is opened and unlinked before vzquotaon, but
generic_delete_inode() is called after vzquotaon, a scary message appears:
<pre>
VZDQ: detached inode not in creation, orig 5, dev dm-0, inode 73828039, fs ext3
current 18761 (httpd), VE 102, time 77463.605075
[<ed79c296>] vzquota_det_qmblk_recalc+0x256/0x270 [vzdquota]
[<ed79c302>] vzquota_inode_qmblk_recalc+0x52/0x70 [vzdquota]
[<ed79c573>] vzquota_inode_data+0xb3/0xf0 [vzdquota]
[<ed79c449>] vzquota_inode_init_call+0x19/0x80 [vzdquota]
[<021fb940>] ext3_delete_inode+0x0/0x120
[<ed79e47f>] vzquota_initialize+0xf/0x20 [vzdquota]
[<0219d983>] generic_delete_inode+0x173/0x190
[<021996f6>] dput_recursive+0x56/0x230
[<0217f333>] __fput+0x123/0x1b0
[<0217d4c2>] filp_close+0x52/0xa0
[<0217d57a>] sys_close+0x6a/0xa0
</pre>
However, there is no need to scare the admin:<br/>
1) the inode is in I_FREEING state.<br/>
2) vzquota never saw the inode before, thus it doesn't know what to do with it.<br/>
3) the inode was unlinked outside of the vzquota area of interest, otherwise
-EBUSY would have been returned on quotaon.
So, do nothing and let the inode silently die.
Many thanks to Alexey Kuznetsov for spelling out a reliable test case.
{{bug|116}}<br/>
{{bug|177}}<br/>
{{bug|341}}<br/>
Bug #60532.<br/>
Bug #61431.<br/>
Bug #55275.
</div>
==== diff-ve-sockstat-20061201 ====
<div class="change">
Patch from Denis:
Hide the content of /proc/net/sockstat rather than the file itself to
keep sysstat from crashing.
Bug #72587.
</div>
==== diff-ubc-dcachestop-20061204 ====
<div class="change">
Patch from Denis:<br/>
This patch fixes turning off dcache accounting. ub_dentry_walk does not
guarantee the order in which it meets dentry leaves and nodes. So, just set
d_inuse to -1 and uncharge all at once.
Bug #72730.
</div>
==== diff-ms-getport-20061205 ====
<div class="change">
Patch from Denis:
tcp_v4_get_port fixed. Treat local_port_range[0] > local_port_range[1]
as local_port_range[1] == local_port_range[0].
Bug #72736.
</div>
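The rule stated above can be sketched as a small normalization step. This is an illustration under assumptions, not the tcp_v4_get_port() code: <code>struct port_range</code> and <code>normalize_port_range</code> are hypothetical names, and the two arguments stand for local_port_range[0] and local_port_range[1].

```c
/* Hypothetical pair standing in for local_port_range[0] and [1]. */
struct port_range {
    int low;
    int high;
};

/*
 * If the configured lower bound exceeds the upper bound, behave as if
 * the upper bound were equal to the lower one, so the range always
 * contains at least one port.
 */
struct port_range normalize_port_range(int range0, int range1)
{
    struct port_range r = { range0, range1 };

    if (range0 > range1)
        r.high = range0;    /* treat range[1] as equal to range[0] */
    return r;
}
```

After normalization, <code>high - low + 1</code> is always positive, so code that picks a port by taking a value modulo the range size no longer sees a zero or negative size.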
==== diff-ext3-pgfault11c ====
<div class="change">
Patch from Dmitry Monakhov:
ext3_journal_stop() should be called after ext3_prepare_failure()
unconditionally, i.e. always stop.
</div>