Changes

Download/kernel/rhel4/023stab043.2/changes

16,105 bytes added, 14:45, 20 March 2008
created
== Changes ==
* Rebased on RH 42.0.8-EL
* Compat fixes for x86_64 (inode numbers and sys_stime)
* CPT: UBC preservation is fixed, and limits are now increased during restore
* UBC othersockbuf over-optimization fix
* Updated 3ware driver to version 2.26.05.006
* NFS in VE fix
* No more panic on oops
* sys_waitid virtualization fixes
* Lots of other small fixes and debug patches

=== Config changes ===
Same as {{Kernel link|rhel4|023stab040.1}}, plus:

* +<code>CONFIG_SCSI_QLA4XXX_FAILOVER=y</code>
* +<code>CONFIG_NETPOLL_TRAP=y</code>
* +<code>CONFIG_NETDUMP=m</code>
* +<code>CONFIG_EXPORTFS=m</code>
* +<code>CONFIG_NFS_ACL_SUPPORT=m</code>
<dl>
==== diff-ubc-nootheropt-20070206 ====
<div class="change">
Patch from Andrey (saw@), modified by Evgeniy:<br/>
Fix for over-optimization of OTHERSOCKBUF accounting.

Those sockets are not protected by the socket lock.

The bug was provoked by the optimization of charging/uncharging othersockbufs:
diff-ubc-tcpsndopt-20060429

In brief, the idea is the following: the optimization assumes that a socket
is always locked by lock_sock and is therefore protected from being used by more
than one user simultaneously. This assumption is wrong for datagram
sockets (for example PF_UNIX ones), which are not locked in the majority of
cases. This provokes a race condition between two users of the same dgram socket.
TCP sockets are always locked (or can be made so), which
prevents races.

Bug #70974.<br/>
Bug #74089.
</div>

==== diff-cpt-pagein-swapoff-fix ====
<div class="change">
Patch from Andrey:<br/>
Errors can occur during rst_swapoff() and sys_swapoff().

In case of -EINVAL we do not need to perform cleanup. In all other cases we
should do it.

Move the cleanup into a separate function and perform it in a loop until success or
-EINVAL. Clear the TIF_SIGPENDING flag if signals are pending to make sure
that sys_swapoff() won't be interrupted, and restore the flag on exit if it was
cleared.

Bug #74725.
</div>

==== diff-cpt-ubc-adjust-on-restore ====
<div class="change">
Patch from Andrey:

During restore we can exceed UBC limits, because the restore process
uses more resources. The limits are therefore increased during restore.

Bug #71159.
</div>

==== diff-cpt-ubc-change-image-format-b ====
<div class="change">
Patch from Andrey:<br/>
Change ubc image format to remove magic numbers like 6 and 12.
</div>

==== diff-cpt-ubc-change-image-format ====

<div class="change">
Patch from Andrey:

Change the order of UBC parameters in the image file. Resource
pairs (ub_parms and ub_store) are now stored as one unit:<br/>
KMEMSIZE parms, KMEMSIZE store, LOCKEDPAGES parms, LOCKEDPAGES store, ...

The previous format was:<br/>
KMEMSIZE parms, LOCKEDPAGES parms, ..., KMEMSIZE store, LOCKEDPAGES store, ...

With the new format it is simpler to increase the number of UBC resources.
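
The difference between the two layouts can be sketched as index arithmetic. This is an illustration only: <code>ubc_part</code>, <code>old_slot</code> and <code>new_slot</code> are hypothetical names, not identifiers from the patch, and the image is treated as a flat array of slots.

```c
#include <assert.h>
#include <stddef.h>

/* Which half of a resource pair: limits (ub_parms) or saved state (ub_store). */
enum ubc_part { UBC_PARMS = 0, UBC_STORE = 1 };

/* Previous layout: all parms first, then all stores.
 * A slot position depends on the total resource count n_res. */
static size_t old_slot(size_t res, enum ubc_part part, size_t n_res)
{
    assert(res < n_res);
    return (size_t)part * n_res + res;
}

/* New layout: parms and store of one resource are adjacent.
 * A slot position no longer depends on how many resources exist. */
static size_t new_slot(size_t res, enum ubc_part part)
{
    return 2 * res + (size_t)part;
}
```

In the previous layout every store slot moves when a resource is added; in the new layout existing slots keep their positions and new resources are simply appended.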
</div>

==== diff-ms-fs-preparewrite-eh-20070202 ====
<div class="change">
Patch from Kirill:

The original patch used in OVZ/VZ was
diff-ms-fs-preparewrite-eh-20061005.

Unfortunately, it was broken by RH when it was committed to the RHEL4 update
(linux-2.6.13-buffer.patch). __block_prepare_write() error handling
is done incorrectly there, since IO initiated on some of the buffers
must be waited for to complete (wait_on_buffer).

Fix it with this incremental patch, which makes the VZ code the same
as it has been for a long time.
</div>

==== diff-cpt-pgin-alloc-index-fix ====
<div class="change">
Patch from Andrey:

1. Index of lazy page was checked incorrectly:
<pre>
- if (page_nr &gt; PAGE_SIZE/sizeof(struct pgin_desc*)) {
+ if (page_nr &gt;= PGINDIR_SIZE/sizeof(struct pagein_desc*)) {
</pre>
so we could try to access outside of array boundaries and oops.

Bug #74455.<br/>
Bug #75539.

2. Current lazy migration is limited to 512 MB on x86-64.
Increase the table size to be able to store up to 2097152 lazy pages (8 GB).
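
Both points can be checked with a small user-space sketch. The helper names are hypothetical, and the 4096-byte page size is an assumption (the usual x86-64 value); the actual table layout in the patch is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Off-by-one check: an index equal to the slot count is accepted,
 * although the last valid index is nr_slots - 1. */
static int index_ok_buggy(size_t idx, size_t nr_slots)
{
    return !(idx > nr_slots);
}

/* Corrected check: reject every index outside [0, nr_slots). */
static int index_ok_fixed(size_t idx, size_t nr_slots)
{
    return !(idx >= nr_slots);
}

/* Amount of memory covered by nr_pages pages of page_size bytes. */
static unsigned long long covered_bytes(unsigned long long nr_pages,
                                        unsigned long long page_size)
{
    return nr_pages * page_size;
}
```

With 2097152 pages of 4096 bytes, <code>covered_bytes</code> yields 2^33 bytes, which matches the 8 GB figure above.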
</div>

==== diff-ms-kmap-pte0-20070207 ====
<div class="change">
Patch from Vasily:<br/>
Fixes kmap PTE0 leakage: pte_unmap() was missing on the error path in install_page().

Bug #75560.
</div>

==== diff-arch-4gb-pgdctor ====
<div class="change">
Patch from Kirill:

During the 4GB split port to 2.6.18 it was found that the
2.6.9 kernel incorrectly inserts a not yet initialized pgd
into pgd_list. This is wrong; initialize it first.
</div>

==== diff-cpt-iter-pfn-fix ====
<div class="change">
Patch from Andrey:

The pfn index was checked incorrectly during lookup/alloc,
so we could go outside the array boundaries and oops.

Related to the same bugs with lazy migration:<br/>
Bug #74455.<br/>
Bug #75539.
</div>

==== diff-cpt-pte-unmap-lost-20070205 ====
<div class="change">
Patch from Pavel, found by Vasiliy:

When porting to the new mm locking, one unmap+unlock was lost.
Found while investigating (but does not fix):<br/>
Bug #75448.
</div>

==== diff-cpt-ubc-save-restore-fix ====
<div class="change">

Patch from Andrey:<br/>
UBC values were saved and restored incorrectly:

<pre>
for (i = 0; i &lt; UB_RESOURCES; i++)
dump_one_bc_parm(v-&gt;cpt_parms, bc-&gt;ub_parms, 0);
</pre>

Only KMEMSIZE values were saved and restored in this case.

1. Do not restore UBC if we get an image from a previous version.

2. cpt_parms has space for 32x2 resources; however, only the
first UB_RESOURCES * 2 entries are used, i.e. not 24 of 32 and 24 of 32.
Keep this for compatibility.
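
The loop bug quoted above can be reproduced in a simplified form. This is a sketch under assumptions: <code>struct parm</code> is a stand-in for the real UBC parameter structure, and <code>save_buggy</code>/<code>save_fixed</code> are hypothetical names rather than functions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one UBC parameter record. */
struct parm {
    unsigned long barrier;
    unsigned long limit;
};

/* Buggy shape: the loop counter is never used, so every
 * iteration copies element 0 (KMEMSIZE) again. */
static void save_buggy(struct parm *dst, const struct parm *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[0] = src[0];
}

/* Corrected shape: the i-th iteration copies the i-th resource. */
static void save_fixed(struct parm *dst, const struct parm *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```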
</div>

==== diff-cpt-check-image-version ====
<div class="change">
Patch from Andrey:<br/>
Add check for image version.

Allow restoring only images from a 2.6.9 kernel.
Only the following combinations are currently supported:<br/>
2.6.8/2.6.9 &lt;-&gt; 2.6.9+plus patch<br/>
2.6.9 -&gt; 2.6.16+ (this patch disables this combination)

</div>

==== diff-ms-ext3-quota-drop ====
<div class="change">
Patch from Dmitry (dmonakhov@):<br/>
Backported from mainstream v2.6.13<br/>
[PATCH] ext3: drop quota references before releasing inode

We must drop references to quota structures before releasing the inode.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;<br/>
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;<br/>
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;

commit ab6862e6dab813ecde9ae7da506188dc1e9f11bb
</div>

==== diff-ms-ext2-quota-drop ====
<div class="change">
Patch from Dmitry (dmonakhov@):<br/>
Backported from mainstream v2.6.13<br/>
[PATCH] ext2: drop quota reference before releasing inode

We must drop references to quota structures before releasing the inode.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;<br/>
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;<br/>
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;

commit c7e9a52ef0089492bba457dfb8eba1a54e19f24a
</div>

==== diff-simfs-reiserfs-statfs-20070117 ====
<div class="change">
Patch from Evgeny:

1. When DISK_QUOTA is switched off in /etc/vz/vz.config,
sim_statfs takes kstatfs from the underlying fs.
reiserfs does not initialize f_ffree (free inodes) and f_files in kstatfs,
so we need to zero out the kstatfs structure before asking reiserfs.

2. reiserfs used to initialize f_ffree to -1 (in 2.4.x).
This was an exception among filesystems and could be used
to determine that the fs is reiserfs. In 2.6.x f_ffree is not
initialized by reiserfs at all,
so we need to distinguish reiserfs another way: use fsmagic.

{{bug|199}}.
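
A user-space sketch of both points follows. Everything here is illustrative: <code>struct kstatfs_sketch</code>, <code>fake_reiserfs_statfs</code> and <code>query_lower_fs</code> are hypothetical names, and the magic value is the reiserfs superblock magic as commonly defined in kernel headers, which should be checked against the tree in use.

```c
#include <assert.h>
#include <string.h>

/* Assumed value of the reiserfs superblock magic. */
#define SKETCH_REISERFS_MAGIC 0x52654973L

/* Reduced stand-in for struct kstatfs. */
struct kstatfs_sketch {
    long f_type;
    long f_files;
    long f_ffree;
};

typedef void (*statfs_fn)(struct kstatfs_sketch *buf);

/* Models a filesystem that sets its type but leaves the inode
 * counters untouched, as described for reiserfs in 2.6.x. */
static void fake_reiserfs_statfs(struct kstatfs_sketch *buf)
{
    buf->f_type = SKETCH_REISERFS_MAGIC;
}

/* Zero the buffer before asking the lower fs, then identify
 * reiserfs by magic instead of by f_ffree == -1.
 * Returns 1 if the lower fs reports the reiserfs magic. */
static int query_lower_fs(statfs_fn lower, struct kstatfs_sketch *buf)
{
    assert(lower != NULL && buf != NULL);
    memset(buf, 0, sizeof(*buf));
    lower(buf);
    return buf->f_type == SKETCH_REISERFS_MAGIC;
}
```

Zeroing first guarantees that fields the lower filesystem skips read as 0 rather than as stale stack contents.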
</div>

==== diff-ms-security-pt-interp-20070124 ====
<div class="change">
Patch from Alexey Dobriyan:

Proposed patch to fix #5 in<br/>
[http://www.isec.pl/vulnerabilities/isec-0017-binfmt_elf.txt http://www.isec.pl/vulnerabilities/isec-0017-binfmt_elf.txt]<br/>
aka<br/>
[http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-1073 http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-1073]

To reproduce, do
<ul>
<li>grab poc at the end of advisory.</li>
<li>add line "eph.p_memsz = 4096;" after "eph.p_filesz = 4096;"<br/>
where first "4096" is something equal to or greater than 4096.</li>
<li>./poc /usr/bin/sudo &amp;&amp; ls -l</li>
</ul>

Here I get:

<pre>
-rw------- 1 ad ad 102400 2007-01-15 19:17 core
---s--x--x 2 root root 101820 2007-01-15 19:15 /usr/bin/sudo
</pre>

Check for MAY_READ as binfmt_misc.c does.
</div>

==== diff-emt64-vsyscall-b-20060718 ====
<div class="change">
Patch from Vasily:

Fixed initialization of the sysctl_vsyscall variable:
currently vsyscall_init always overwrites the value zeroed in time_init_gtod().

Bug #73353.
</div>

==== diff-cpt-sigsuspend-lockup ====
<div class="change">
Patch from Alexey:<br/>
[CPT] sigsuspend could hang forever after restore

Do not restart syscalls with TIF_RESTORE_SIGMASK in cpt.

It was a severe bug. First, we do not need to restart such syscalls;
they are restarted by the core on exit from the syscall. Second, it was wrong
to restart the syscall without clearing TIF_RESTORE_SIGMASK and without
restoring the mask. If a signal arrives here, it will be delivered,
but the syscall is restarted and sigsuspend() will not exit, hanging forever.

For the knowledge base: restarting without checking for TIF_RESTORE_SIGMASK
is not allowed.
</div>

==== diff-ms-compat-emt64-stime-20070131 ====
<div class="change">
Patch from Alexandr Andreev:<br/>
32bit compat_sys_stime is missing on x86_64

{{bug|438}}.
</div>

==== diff-ms-compat-stat32-ino-20061229 ====
<div class="change">
Patch from mainstream:<br/>
[PATCH 1/3] make static counters in new_inode and iunique be 32 bits

From: Jeff Layton &lt;jlayton@redhat.com&gt;<br/>

To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error. We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on the fly,
we ought to try and have them fit within a 32 bit field.

This patch takes the first step toward this by making the static counters in
these two functions be 32 bits.

Signed-off-by: Jeff Layton &lt;jlayton@redhat.com&gt;<br/>
Acked-By: Kirill Korotaev &lt;dev@openvz.org&gt;
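
The overflow condition described above can be sketched in user space. The function names are hypothetical; the sketch only models the idea that a 64-bit inode number may not fit a 32-bit <code>st_ino</code> field, while a 32-bit counter always produces values that do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Store ino into a 32-bit field, or report -EOVERFLOW when it
 * does not fit, mirroring what a 32-bit stat caller would see. */
static int fill_ino32(uint64_t ino, uint32_t *out)
{
    assert(out != NULL);
    if (ino > UINT32_MAX)
        return -EOVERFLOW;
    *out = (uint32_t)ino;
    return 0;
}

/* On-the-fly inode numbers from a 32-bit counter always fit.
 * Wrap-around handling is deliberately left out of this sketch. */
static uint32_t next_ino32(uint32_t *counter)
{
    assert(counter != NULL);
    *counter += 1;
    return *counter;
}
```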
</div>

==== diff-cfq-timeslice-20070129 ====
<div class="change">
Patch from Vasily Tarasov (vtaras@), improves CFQ IO scheduler:

"CFQ in 2.6.9 kernel creates a request queue for each process and
performs a round robin procedure over these queues, selecting one request
from each queue. This patch adds a time-slice for each per-process queue:
it means that during a timeslice only requests from a certain queue are
serviced. Such a mechanism is used in CFQ 2.6.18."

Bug #71929.
</div>

==== linux-2.6.9-3w-9xxx-2.26.05.006.patch ====
<div class="change">
Patch from Vasiliy:<br/>
3ware driver update to 2.26.05.006
</div>

==== diff-ve-nfs-execenv-20070221 ====
<div class="change">
Patch from Evgeny:

The svc_recvfrom function (net/sunrpc/svcsock.c) switches
the context to ve0 and never returns to the VE context.
This may cause an oops when the VE private area is placed on an NFS partition.

Bug #76354.
</div>

==== diff-ms-unlock-buffer-barrier ====
<div class="change">
Patch from mainstream:<br/>
[PATCH] buffer: memorder fix

unlock_buffer(), like unlock_page(), must not clear the lock without
ensuring that the critical section is closed.

Mingming later sent the same patch, saying:

We are running SDET benchmark and saw double free issue for ext3 extended
attributes block, which complains the same xattr block already being freed (in
ext3_xattr_release_block()). The problem could also been triggered by
multiple threads loop untar/rm a kernel tree.

The race is caused by missing a memory barrier at unlock_buffer() before the
lock bit being cleared, resulting in possible concurrent h_refcounter update.
That causes a reference counter leak, then later leads to the double free that
we have seen.

Inside unlock_buffer(), there is a memory barrier is placed *after* the lock
bit is being cleared, however, there is no memory barrier *before* the bit is
cleared. On some arch the h_refcount update instruction and the clear bit
instruction could be reordered, thus leave the critical section re-entered.

The race is like this: For example, if the h_refcount is initialized as 1,

<pre>
cpu 0:                                  cpu1
--------------------------------------  -----------------------------------
lock_buffer() /* test_and_set_bit */
clear_buffer_locked(bh);
                                        lock_buffer() /* test_and_set_bit */
h_refcount = h_refcount+1; /* = 2*/     h_refcount = h_refcount + 1; /*= 2 */
                                        clear_buffer_locked(bh);
....                                    ......
</pre>

We lost a h_refcount here. We need a memory barrier before the buffer head lock
bit being cleared to force the order of the two writes. Please apply.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;<br/>
Signed-off-by: Mingming Cao &lt;cmm@us.ibm.com&gt;<br/>
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;<br/>
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

GIT: 72ed3d035855841ad611ee48b20909e9619d4a79<br/>
[http://linux.bkbits.net:8080/linux-2.6/?PAGE=cset&amp;REV=1.5353.22.215 http://linux.bkbits.net:8080/linux-2.6/?PAGE=cset&amp;REV=1.5353.22.215]
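
The ordering requirement can be illustrated with a user-space analogue built on C11 atomics. This is not the kernel patch: <code>struct buf</code> and the <code>buf_*</code> helpers are hypothetical, and release ordering on the clear stands in for the barrier that the fix places before the lock bit is cleared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal stand-in for a buffer head: a lock bit plus protected data. */
struct buf {
    atomic_flag locked;
    int refcount;
};

/* Acquire: spin until the flag was previously clear. */
static void buf_lock(struct buf *b)
{
    while (atomic_flag_test_and_set_explicit(&b->locked, memory_order_acquire))
        ;
}

/* Release: all writes made inside the critical section become
 * visible before the flag is observed as clear by another thread. */
static void buf_unlock(struct buf *b)
{
    atomic_flag_clear_explicit(&b->locked, memory_order_release);
}

/* Increment the protected counter inside the critical section. */
static void buf_get(struct buf *b)
{
    assert(b != NULL);
    buf_lock(b);
    b->refcount++;
    buf_unlock(b);
}
```

If the clear used relaxed ordering instead, the refcount write could be reordered after it on some architectures, which is the lost-update scenario shown in the table above.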
</div>

==== diff-simfs-mntcount-20070208 ====
<div class="change">
Patch from Evgeny Kravtsunov &lt;emkravts@openvz.org&gt;:<br/>
[SIMFS] get lower vfsmount on simfs mount

This prevents lower FS from being umounted while simfs is mounted.

{{bug|451}}.<br/>
[http://git.openvz.org/?p=kernel-028;a=commit;h=63f1ecae912ee9614bcad23cc147ca5557f8b547 http://git.openvz.org/?p=kernel-028;a=commit;h=63f1ecae912ee9614bcad23cc147ca5557f8b547]

</div>
==== diff-ms-ext3-unlink-race ====
<div class="change">
Patch from mainstream:<br/>
[PATCH] return ENOENT from ext3_link when racing with unlink

Return -ENOENT from ext[34]_link if we've raced with unlink and i_nlink is
0. Doing otherwise has the potential to corrupt the orphan inode list,
because we'd wind up with an inode with a non-zero link count on the list,
and it will never get properly cleaned up &amp; removed from the orphan list
before it is freed.

[akpm@osdl.org: build fix]<br/>
Signed-off-by: Eric Sandeen &lt;sandeen@redhat.com&gt;<br/>
Cc: &lt;linux-ext4@vger.kernel.org&gt;<br/>
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;<br/>
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

GIT: 2988a7740dc0dd9a0cb56576e8fe1d777dff0db3<br/>
[http://linux.bkbits.net:8080/linux-2.6/?PAGE=cset&amp;REV=1.5353.22.208 http://linux.bkbits.net:8080/linux-2.6/?PAGE=cset&amp;REV=1.5353.22.208]

Bug #74302.
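
The check amounts to refusing a new link once the link count has dropped to zero. A minimal sketch, with <code>struct inode_sketch</code> and <code>link_sketch</code> as hypothetical stand-ins for the real inode and ext3_link():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Reduced stand-in for an inode: only the link count matters here. */
struct inode_sketch {
    unsigned int i_nlink;
};

/* Refuse to link an inode that a racing unlink has already taken
 * to zero links; otherwise add one link. */
static int link_sketch(struct inode_sketch *inode)
{
    assert(inode != NULL);
    if (inode->i_nlink == 0)
        return -ENOENT;
    inode->i_nlink++;
    return 0;
}
```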
</div>

==== diff-ubc-twcountlimit-20070213 ====
<div class="change">
Patch from Denis:<br/>
This patch changes the default limit on per-UB TW buckets.

{{bug|460}}.
</div>

==== diff-dbg-pb-add-list-ref ====
<div class="change">
Patch from Kirill:<br/>
Print some debug info instead of BUG in pb_add_list_ref().

Actually there is a bug in the copy_page_range logic:
if the page was reserved, then it is never tied to UB with PBC.
Good. However, if the page is unreserved later, then the next copy_page_range
will blindly assume that it should have been tied already(!),
and can be disappointed by the fact that it is not.

The bad thing is that packet_mmap() from net/packet/af_packet.c
maps exactly such pages...

But I don't see the message:<br/>
printk(KERN_DEBUG "packet_mmap: vma is busy: %d\n", atomic_read(&amp;po-&gt;mapped));<br/>
Sigh...
</div>

==== diff-dbg-spinlock ====
<div class="change">
Patch from Kirill:<br/>
Debug for valid_swaphandles() oops from Strato.

Check for correct swp_entry and print spinlock magic when doing BUG()
</div>

==== diff-ve-wait-vpids-20070228 ====
<div class="change">
Patch from Alexey Kuznetsov:<br/>
Forgotten bits of pid virtualization in sys_wait*
</div>

==== diff-rh-panic-on-oops-20070228 ====
<div class="change">
Patch from Kirill:

RH has changed the default behaviour of the kernel:
it now panics on oops. Revert to the previous default (continue after oops).
</div>

==== linux-2.6.9-arcmsr-1.20.0X.13-61107.patch ====
<div class="change">
Patch ported by Kostja (khorenko@):<br/>
Areca driver v1.20.0X.13-61107 added.

Sources from Areca site:
[ftp://ftp.areca.com.tw/RaidCards/AP_Drivers/Linux/DRIVER/SourceCode/ ftp://ftp.areca.com.tw/RaidCards/AP_Drivers/Linux/DRIVER/SourceCode/]

Bug #59933.
</div>
