
Revision as of 18:25, 22 October 2009 by Kir (talk | contribs) (Protected "Download/kernel/2.6.8/022stab078.10/changes": Robot: Protecting a list of files. [edit=autoconfirmed:move=autoconfirmed])



  • UBC, VZDQ and other fixes
  • Performance optimizations
  • A fix for iowait stats
  • Mainstream security fixes
  • Added nosrc.rpm


Same as 022stab077.1, plus



  •  CONFIG_SERIAL_8250_NR_UARTS=16 (was 4)





Patch from Kirill:
This patch is an add-on to diff-vzdq-getstat-20060510 and fixes all other places where an allocation with GFP_FS is possible under qmblk->dq_sem.


Patch from mainstream:
This patch fixes error handling in readv() by correcting a trivial error check.
Bug #61525.


Patch from Kirill:
This patch fixes a self-deadlock in vzquota in quota_ugid_getstat(): copy_to_user() can trigger a page fault and get stuck on qmblk->dq_sem.
Bug #62179.
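
The deadlock shape can be illustrated with a small user-space sketch. It is not the vzquota code: `dq_sem_held`, `fake_copy_to_user()`, `quota_table` and both `getstat_*` functions are hypothetical stand-ins, and "snapshot under the lock, copy out after dropping it" is only one common way to avoid the problem, assumed here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSTAT 3

struct ugid_stat { unsigned int id; unsigned long usage; };

/* Hypothetical stand-in for qmblk->dq_sem: only records whether it is held. */
static int dq_sem_held;

static const struct ugid_stat quota_table[NSTAT] = {
    { 0, 4096 }, { 500, 1024 }, { 501, 2048 },
};

/*
 * Stand-in for copy_to_user(). In the kernel, a page fault here could need
 * dq_sem again; the sketch reports that case as -1 instead of hanging.
 */
static int fake_copy_to_user(void *dst, const void *src, size_t len)
{
    if (dq_sem_held)
        return -1;
    memcpy(dst, src, len);
    return 0;
}

/* Problematic shape: copies to the user buffer while the semaphore is held. */
int getstat_under_lock(struct ugid_stat *ubuf, size_t count)
{
    int rc;

    if (count > NSTAT)
        count = NSTAT;
    dq_sem_held = 1;
    rc = fake_copy_to_user(ubuf, quota_table, count * sizeof(*ubuf));
    dq_sem_held = 0;
    return rc;
}

/* Safer shape: snapshot under the lock, copy out after dropping it. */
int getstat_snapshot(struct ugid_stat *ubuf, size_t count)
{
    struct ugid_stat tmp[NSTAT];

    if (count > NSTAT)
        count = NSTAT;
    dq_sem_held = 1;
    memcpy(tmp, quota_table, count * sizeof(tmp[0]));
    dq_sem_held = 0;
    return fake_copy_to_user(ubuf, tmp, count * sizeof(tmp[0]));
}
```

Calling `getstat_under_lock()` returns -1 in this model, while `getstat_snapshot()` fills the caller's buffer.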


Patch from mainstream:
[PATCH] smbfs chroot issue (CVE-2006-1864)

Mark Moseley reported that a chroot environment on a SMB share can be left via "cd ..\\". Similar to CVE-2006-1863 issue with cifs, this fix is for smbfs.

Steven French <> wrote:

Looks fine to me. This should catch the slash on lookup or equivalent, which will be all obvious paths of interest.

Signed-off-by: Chris Wright <>



Patch from mainstream:
[PATCH] stale POSIX lock handling

I believe that there is a problem with the handling of POSIX locks, which the attached patch should address.

The problem appears to be a race between fcntl(2) and close(2). A multithreaded application could close a file descriptor at the same time as it is trying to acquire a lock using the same file descriptor. I would suggest that that multithreaded application is not providing the proper synchronization for itself, but the OS should still behave correctly.

SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that when a file descriptor is closed, all POSIX locks on the file owned by the process which closed the file descriptor should be released.

The trick here is when those locks are released. The current code releases all locks which exist when close is processing, but any locks in progress are handled when the last reference to the open file is released.

There are three cases to consider.

One is the simple case, a multithreaded (mt) process has a file open and races to close it and acquire a lock on it. In this case, the close will release one reference to the open file and when the fcntl is done, it will release the other reference. For this situation, no locks should exist on the file when both the close and fcntl operations are done. The current system will handle this case because the last reference to the open file is being released.

The second case is when the mt process has dup(2)'d the file descriptor. The close will release one reference to the file and the fcntl, when done, will release another, but there will still be at least one more reference to the open file. One could argue that the existence of a lock on the file after the close has completed is okay, because it was acquired after the close operation and there is still a way for the application to release the lock on the file, using an existing file descriptor.

The third case is when the mt process has forked, after opening the file and either before or after becoming an mt process. In this case, each process would hold a reference to the open file. For each process, this degenerates to the first case above. However, the lock continues to exist until both processes have released their references to the open file. This lock could block other lock requests.

The changes to release the lock when the last reference to the open file is released aren't quite right because they would allow the lock to exist as long as there was a reference to the open file. This is too long.

The new proposed solution is to add support in the fcntl code path to detect a race with close and then to release the lock which was just acquired when such a race is detected. This causes locks to be released in a timely fashion and for the system to conform to the POSIX semantic specification.

This was tested by instrumenting a kernel to detect the handling locks and then running a program which generates case #3 above. A dangling lock could be reliably generated. When the changes to detect the close/fcntl race were added, a dangling lock could no longer be generated.

Cc: Matthew Wilcox <>
Cc: Trond Myklebust <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>

GIT: c293621bbf678a3d85e3ed721c3921c8a670610d
RHEL4u3: linux-2.6.9-locks-after-close.patch
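
The "detect the race, then release the lock just acquired" idea from the commit message can be sketched in user space. This is a simplified model, not the kernel code: `fd_table`, `sim_close()`, `sim_setlk()` and the `close_races` flag are hypothetical, and returning -1 on a detected race is a choice made for the sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FD 8

struct file { int posix_locks; };          /* number of POSIX locks held */

static struct file *fd_table[MAX_FD];      /* toy descriptor table */

static struct file *fd_lookup(int fd)
{
    return (fd >= 0 && fd < MAX_FD) ? fd_table[fd] : NULL;
}

void fd_install(int fd, struct file *filp)
{
    assert(fd >= 0 && fd < MAX_FD);
    fd_table[fd] = filp;
}

/* close(): drops the locks that exist right now and clears the slot. */
void sim_close(int fd)
{
    struct file *filp = fd_lookup(fd);

    if (filp != NULL) {
        filp->posix_locks = 0;
        fd_table[fd] = NULL;
    }
}

/*
 * Lock acquisition with race detection. If close_races is non-zero, a
 * close() on the same descriptor runs after the file was looked up but
 * before the new lock becomes visible, which is the window described above.
 */
int sim_setlk(int fd, int close_races)
{
    struct file *filp = fd_lookup(fd);

    if (filp == NULL)
        return -1;
    if (close_races)
        sim_close(fd);
    filp->posix_locks++;                   /* lock acquired */
    if (fd_lookup(fd) != filp) {           /* descriptor changed: race */
        filp->posix_locks--;               /* release the lock just taken */
        return -1;
    }
    return 0;
}
```

Without the final check, the racing case would leave `posix_locks` at 1 on a file whose descriptor is already closed, which is the dangling lock the patch addresses.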


Patch from mainstream:
[PATCH] fs/locks.c: Fix sys_flock() race

sys_flock() currently has a race which can result in a double free in the multi-thread case.

Thread 1                        Thread 2

sys_flock(file, LOCK_EX)
                                sys_flock(file, LOCK_UN)

If Thread 2 removes the lock from inode->i_lock before Thread 1 tests for list_empty(&lock->fl_link) at the end of sys_flock, then both threads will end up calling locks_free_lock for the same lock.

Fix is to make flock_lock_file() do the same as posix_lock_file(), namely to make a copy of the request, so that the caller can always free the lock.

This also has the side-effect of fixing up a reference problem in the lockd handling of flock.

Signed-off-by: Trond Myklebust <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>

X-Git-Tag: v2.6.17-rc1
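
A minimal user-space sketch of the ownership rule described in this fix: the lock list only ever stores a private copy of the request, so the caller can free its request unconditionally. All names here (`lock_insert_copy()`, `do_flock()`, `unlock_all()`, `allocs_outstanding()`) are hypothetical and only model the idea.

```c
#include <assert.h>
#include <stdlib.h>

struct lock { int type; struct lock *next; };

static struct lock *lock_list;   /* stand-in for the per-inode lock list */
static int live_allocs;          /* counts locks not yet freed */

static struct lock *lock_alloc(int type)
{
    struct lock *l = malloc(sizeof(*l));

    if (l != NULL) {
        l->type = type;
        l->next = NULL;
        live_allocs++;
    }
    return l;
}

static void lock_free(struct lock *l)
{
    free(l);
    live_allocs--;
}

/* Insert a private copy of the request; the request stays caller-owned. */
static int lock_insert_copy(const struct lock *req)
{
    struct lock *copy = lock_alloc(req->type);

    if (copy == NULL)
        return -1;
    copy->next = lock_list;
    lock_list = copy;
    return 0;
}

/* Models LOCK_UN from another thread: removes and frees every listed lock. */
void unlock_all(void)
{
    while (lock_list != NULL) {
        struct lock *l = lock_list;

        lock_list = l->next;
        lock_free(l);
    }
}

/* Models the locking path: the request is always freed by its owner. */
int do_flock(int type)
{
    struct lock *req = lock_alloc(type);
    int rc;

    if (req == NULL)
        return -1;
    rc = lock_insert_copy(req);
    lock_free(req);              /* safe: the list never holds req itself */
    return rc;
}

int allocs_outstanding(void)
{
    return live_allocs;
}
```

Because the list and the caller never share an object, a concurrent `unlock_all()` cannot free the same lock the caller is about to free.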


Patch from Vasiliy Tarasov:
This patch fixes vprintk(). It should print messages in VE0, not in the current context.


Patch from Dmitry:
Fixed ipt_REDIRECT operation inside VEs.

OpenVZ Bug #171.


Patch from Kirill:
This patch fixes wake_up_forked_process():

    • it should not change task->cpu
    • it should lock p's runqueue, not current
    • and current->runqueue can be != p->runqueue


Patch from Dmitry Mishin <>:
Fixed oops in get_tgid_list.
If an external (VE0) process looks up the proc tree of a VE that is on ve_cleanup_list, an oops in get_tgid_list is possible.


Patch from mainstream and Vasiliy Averin:

This patch contains a fix for the 64-bit DMA capability check in the megaraid_{mm,mbox} driver. With the patch, the driver accesses PCI configuration space at a dedicated offset to read a signature. If the signature is read, the controller is capable of handling 64-bit DMA. Before this patch, the driver blindly claimed the capability without checking with the controller. The issue was reported by Vasily Averin <>. Thank you, Vasily, for reporting it.

Fixed a bug in megaraid_init_mbox().

Customer reported "garbage in file on x86_64 platform".

Root Cause: the driver registered controllers as 64-bit DMA capable even when they did not support it.

Fix: Changed the function to add an identification mechanism for 64-bit DMA capable controllers.

Signed-Off By: Seokmann Ju <>


Patch from Pavel:
This patch fixes the poor pb_hash function. The fix greatly reduces hash list length and makes fork()/exit() quicker.
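
The changelog does not show pb_hash itself, so the following is only a generic illustration of how a weak hash lengthens chains. Both hash functions, the bucket count and the assumption of page-aligned keys are hypothetical and are not taken from the OpenVZ source.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 1024
#define SKETCH_PAGE_SHIFT 12     /* assumed key alignment: 4096 bytes */

/* Weak: the low bits of page-aligned keys are always zero. */
unsigned int weak_hash(unsigned long key)
{
    return (unsigned int)(key % NBUCKETS);
}

/* Better for such keys: discard the constant low bits first. */
unsigned int better_hash(unsigned long key)
{
    return (unsigned int)((key >> SKETCH_PAGE_SHIFT) % NBUCKETS);
}

/* Longest chain after hashing nkeys page-aligned keys (0, 4096, 8192, ...). */
unsigned int longest_chain(unsigned int (*hash)(unsigned long),
                           unsigned int nkeys)
{
    unsigned int counts[NBUCKETS] = { 0 };
    unsigned int i, max = 0;

    for (i = 0; i < nkeys; i++) {
        unsigned int b = hash((unsigned long)i << SKETCH_PAGE_SHIFT);

        if (++counts[b] > max)
            max = counts[b];
    }
    return max;
}
```

With 1024 such keys, the weak hash puts every key in bucket 0, while the better one gives each key its own bucket. Any lookup that walks a chain, as fork()/exit() paths may do, scales with that chain length.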


Patch from Dmitry:
Replace add_timer() with mod_timer() in dst_run_gc() to avoid a BUG message.

         CPU1                            CPU2
dst_run_gc() entered            dst_run_gc() entered
spin_lock(&dst_lock)                    .....
del_timer(&dst_gc_timer)        fail to get lock
         ....                   mod_timer() <--- puts timer back
         ....                                    in list
add_timer(&dst_gc_timer) <--- BUG because timer is in list already.

Bug #62581.
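
The interleaving above can be replayed with a toy timer model. The `*_sim` functions are hypothetical stand-ins for the kernel timer calls: `add_timer_sim()` returns -1 where the kernel would hit BUG, and `mod_timer_sim()` returns whether the timer was already pending, mirroring the return convention of the kernel's mod_timer().

```c
#include <assert.h>

struct timer_sim {
    int pending;                 /* 1 while the timer is queued */
    unsigned long expires;
};

/* Models add_timer(): arming an already pending timer is an error. */
int add_timer_sim(struct timer_sim *t, unsigned long expires)
{
    if (t->pending)
        return -1;               /* the kernel would report BUG here */
    t->expires = expires;
    t->pending = 1;
    return 0;
}

/* Models mod_timer(): valid whether or not the timer is pending. */
int mod_timer_sim(struct timer_sim *t, unsigned long expires)
{
    int was_pending = t->pending;

    t->expires = expires;
    t->pending = 1;
    return was_pending;
}

/* Models del_timer(): returns whether the timer was pending. */
int del_timer_sim(struct timer_sim *t)
{
    int was_pending = t->pending;

    t->pending = 0;
    return was_pending;
}
```

Replaying the sequence: CPU1 deletes the timer, CPU2 re-arms it with `mod_timer_sim()`, and CPU1's later `add_timer_sim()` fails, whereas `mod_timer_sim()` at the same point simply updates the expiry.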


Patch from Kirill:

This patch fixes printk() under zone->lock.

It can be unsafe to call printk() under this lock, since the caller can try to allocate or free memory and self-deadlock on this lock. I found memory allocations and frees in both netconsole and the serial console.


Patch from Andrey:

This patch implements safer printk from places like NMI or from under critical locks (runqueue lock and so on). It is a replacement of two dbg-nmi-printk-200508* patches.

Printk from NMI watchdog was left unmodified in this version, since other patches are also fiddling with it. NMI watchdog requires great care and consideration, and is left for future inspection.


Patch from Andrey:

This patch prints a more sensible warning on a bad reference counter in __put_beancounter.


Patch from Andrey:

This patch fixes an apparent accounting bug in ub_sock_tcp_chargepage. Should help with the problems at DefenderHosting.


Patch from Andrey:

Console code passes additional information via global variables. This patch fixes the release_console_sem() call in this respect.


Patch from Dmitry:

Fixed oops in inet_sock_destruct caused by a wrong error path in sk_clone. Found by phycho.


Patch from Dmitry:

This patch fixes iowait_time statistics for both VE0 and VEs. Removes redundant nr_iowait field in VE_CPU_STATS.

Bug noticed by Matt Loschert.


Patch from Dmitry:

Fixed iowait stats for VE0: after schedule(), a task may be activated on another processor.
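
A small sketch of why the CPU has to be remembered across the sleep. The per-CPU counters, `sim_schedule()` and both `io_wait_*` functions are hypothetical, and the actual patch may fix the accounting differently. The point is only that the decrement must hit the same counter as the increment.

```c
#include <assert.h>

#define NCPU 2

static int nr_iowait[NCPU];      /* toy per-CPU iowait counters */
static int current_cpu;          /* CPU the task is running on */

void sim_reset(void)
{
    int i;

    for (i = 0; i < NCPU; i++)
        nr_iowait[i] = 0;
    current_cpu = 0;
}

int sim_nr_iowait(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    return nr_iowait[cpu];
}

/* Stand-in for schedule(): the task may wake up on a different CPU. */
static void sim_schedule(int wake_cpu)
{
    current_cpu = wake_cpu;
}

/* Problematic: uses whatever CPU the task happens to be on afterwards. */
void io_wait_current_cpu(int wake_cpu)
{
    nr_iowait[current_cpu]++;
    sim_schedule(wake_cpu);
    nr_iowait[current_cpu]--;
}

/* Balanced: remembers the CPU that was charged before sleeping. */
void io_wait_saved_cpu(int wake_cpu)
{
    int cpu = current_cpu;

    nr_iowait[cpu]++;
    sim_schedule(wake_cpu);
    nr_iowait[cpu]--;
}
```

When the task sleeps on CPU 0 and wakes on CPU 1, the first variant leaves one counter too high and the other negative. The second leaves both at zero.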


Patch from mainstream:
[PATCH] NETFILTER: Fix small information leak in SO_ORIGINAL_DST (CVE-2006-1343)

It appears that sockaddr_in.sin_zero is not zeroed during getsockopt(...SO_ORIGINAL_DST...) operation. This can lead to an information leak (CVE-2006-1343).

Signed-off-by: Marcel Holtmann <>
Signed-off-by: Chris Wright <>
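
The class of leak described here can be shown in user space with the standard `struct sockaddr_in`. The helper names `fill_original_dst()` and `sin_zero_is_clear()` are hypothetical. The sketch only demonstrates that the padding field keeps stale bytes unless it is cleared explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill every field of the address, including the sin_zero padding. */
void fill_original_dst(struct sockaddr_in *sin, uint32_t addr_be,
                       uint16_t port_be)
{
    sin->sin_family = AF_INET;
    sin->sin_port = port_be;               /* network byte order */
    sin->sin_addr.s_addr = addr_be;        /* network byte order */
    /* Without this line, stale bytes would be copied out to the caller. */
    memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
}

/* Returns 1 if the padding contains only zero bytes, 0 otherwise. */
int sin_zero_is_clear(const struct sockaddr_in *sin)
{
    size_t i;

    for (i = 0; i < sizeof(sin->sin_zero); i++)
        if (sin->sin_zero[i] != 0)
            return 0;
    return 1;
}
```

Zeroing the whole structure with memset() before filling it would also close the gap.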



Patch from Kir <>:
Fixes a compilation issue with gcc4.

The following error occurs when trying to compile 022stab077:

drivers/scsi/qla2xxx/qla_gs.c: In function qla2x00_ga_nxt:
drivers/scsi/qla2xxx/qla_gs.c:97: sorry, unimplemented: inlining failed in call
to qla24xx_prep_ms_iocb: function not considered for inlining
drivers/scsi/qla2xxx/qla_gs.c:61: sorry, unimplemented: called from here

OpenVZ Bug #182.


Patch from Vasily:

Change AS I/O scheduler defaults due to the problem with syncs.

  • read_batch_expire = 10 by default.
  • read_expire = 10 by default. bug #5900

Found by Matt Loschert, ticket #154336.
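
To check which values are in effect on a running system, the tunables can be read back from sysfs. The sketch below assumes the usual `/sys/block/<dev>/queue/iosched/` layout used by the 2.6 I/O schedulers. The helper names and the example device `hda` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of an I/O scheduler tunable; -1 if buf is too small. */
int iosched_tunable_path(char *buf, size_t len, const char *dev,
                         const char *name)
{
    int n = snprintf(buf, len, "/sys/block/%s/queue/iosched/%s", dev, name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read an integer tunable; returns -1 if the file cannot be read or parsed. */
long read_tunable(const char *path)
{
    FILE *f = fopen(path, "r");
    long value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, building the path for device `hda` and tunable `read_expire` and passing it to `read_tunable()` returns the current value, or -1 if the device does not use the AS scheduler or the file is missing.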


Patch from Vasily:

In some places we should compare not with the &root_user pointer (HN root) but with the VPS root. This resulted in su being unable to change user when the ulimit was too tight for root.

Found by Barmaley, ticket #158322.


Patch by Andrey (saw@):

This patch fixes assertions in fairsched to avoid printk deadlocks, and to print more information.


Patch from Alexey:
[PATCH] leakage of vpid_mapping

Probably this fixes bug #62834.

The problem was that when switching to sparse VPID mappings, we could have processes with non-virtual pids that had entered the VE. For example, it could be some stuck process from the VE setup scripts. In this case we created a useless mapping struct, which was never freed because it referred to a non-virtual pid.

I left a printk() in the code, because we definitely need confirmation that this event really happens. It does not in my tests: so far I have run 400000 checkpoint/restores and 20000 migrations on a VE and found no problems, unfortunately.


Patch prepared by Vasily:
Sources were taken from

3w-9xxx driver was updated up to version Fixed a kmap_atomic() problem that might result in data loss under Linux. The new driver version disables local IRQs while the driver is holding KM_IRQ0. This is to prevent an IRQ handler from using those kmap slots while the driver is using them, which can result in memory corruption.

Bug #38702.