- Bugs fixed:
  - numerous IPv6 leaks
  - oopses in dcache
  - hangs in TCP
- Major improvements:
  - VZDQ now works with NFS
  - Improved CFQ single-reader case (and probably performance)
  - Improved migration from 2.6.9 to this kernel
  - Binfmt misc virtualization
- Bells and whistles:
  - va_space randomization per-CT control
  - virtual icmpmsg stats (no CPT yet)
  - CFQ statistics for debugging purposes
- No new issues
Same as in 028stab053.4, plus:
- +CONFIG_IA64_HP_AML_NFW=y (IA64 only)
The updated kernel includes fixes for the following security vulnerabilities, which were fixed in the Red Hat kernels from 2.6.18-53.1.21.el5 through 2.6.18-92.1.1.el5:
- Race condition in the ptrace and utrace support in the Linux kernel allowed local users to cause a denial of service (kernel crash) via a long series of PTRACE_ATTACH ptrace calls to another user's process. (CVE-2008-2365, Important)
- On AMD64 architectures, the possibility of a kernel crash was discovered by testing the Linux kernel process-trace ability. This could allow a local unprivileged user to cause a denial of service (kernel crash). (CVE-2008-1615, Important)
- On 64-bit architectures, the possibility of a timer-expiration value overflow was found in the Linux kernel high-resolution timers functionality, hrtimer. This could allow a local unprivileged user to set a large interval value, forcing the timer expiry value to become negative, causing a denial of service (kernel hang). (CVE-2007-6712, Important)
- The possibility of a kernel crash was found in the Linux kernel IPsec protocol implementation, due to improper handling of fragmented ESP packets. When an attacker controlling an intermediate router fragmented these packets into very small pieces, it would cause a kernel crash on the receiving node during packet reassembly. (CVE-2007-6282, Important)
- A potential denial of service attack was discovered in the Linux kernel PWC USB video driver. A local unprivileged user could use this flaw to bring the kernel USB subsystem into the busy-waiting state, causing a denial of service. (CVE-2007-5093, Low)
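The hrtimer item above (CVE-2007-6712) describes a signed overflow: adding a very large interval to the current time produces a negative expiry value. The following is a minimal sketch of that arithmetic in Python, simulating 64-bit signed integers. It is not the kernel code; the function names and the sample values are hypothetical, and the clamping shown is only one common way to avoid the wraparound, not necessarily how the Red Hat fix works.

```python
INT64_MAX = 2**63 - 1

def wrap_int64(value: int) -> int:
    """Reduce a Python integer to a two's-complement signed 64-bit value."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value > INT64_MAX else value

def naive_expiry(now_ns: int, interval_ns: int) -> int:
    """Unchecked addition, as fixed-width signed arithmetic would compute it."""
    return wrap_int64(now_ns + interval_ns)

def clamped_expiry(now_ns: int, interval_ns: int) -> int:
    """Saturating addition: never exceed the largest representable value."""
    return min(now_ns + interval_ns, INT64_MAX)

now_ns = 1_000_000_000      # hypothetical current time: 1 second after boot
interval_ns = INT64_MAX     # hypothetical huge user-supplied interval

print(naive_expiry(now_ns, interval_ns) < 0)              # True: expiry wrapped negative
print(clamped_expiry(now_ns, interval_ns) == INT64_MAX)   # True: expiry stays positive
```

With ordinary intervals both functions agree; they differ only when the sum no longer fits into 63 bits.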
The updated kernel includes fixes for the following issues:
- [ia64]: Suspending a container could fail due to incorrect handling of the execve() error code on the ia64 architecture.
- [CPT]: An online migration could fail if a process inside a container being migrated used inotify events on a symlink. The online migration of such a container could terminate numerous processes on the source Node (by means of the SIGTERM signal) and fail with the following message:
"CPT ERR: ffff81004a07a000,1024 :rst_inotify: -22".
- [CPT]: Requests for opening a socket could be restored incorrectly during the online migration.
- [CPT]: UDP sockets could be bound to a wrong port after the online migration.
- [CPT]: A kernel crash could happen during an online migration if the container being migrated contained a process that had a big file (>2 GB) open for writing only and that file had already been deleted from the filesystem.
- The CPU time could be distributed unfairly (not according to the CPUUNITS parameters) in case the Hardware Node ran a few containers only.
- Modern ccNUMA AMD servers could run with degraded performance due to architecture-specific latencies.
- The modification time of memory mapped files was not updated in time, which could lead to skipping such files during an incremental backup. This issue concerned particularly the containers running the IBM DB2 software.
- The xinetd service failed to start inside a SLES-based container due to the inability to check the status of a /proc/<PID>/exe entry for a zombie process. The failure was accompanied with the following message:
"Starting INET services. (xinetd)startproc: cannot stat /proc/1432/exe: Permission denied failed".
- An application could fail to allocate memory due to an incorrect heap rlimit calculation in case the randomize_va_space sysctl was enabled.
- [ppc64]: A kernel crash could occur on a container start due to a missed page table entry memory allocation check.
- The /proc/user_beancounters permissions were shown incorrectly as "r--r--r--" for a file that was readable by the root user only.
- The sys.ipv4.conf.default sysctl did not have any effect inside a container.
- A kernel crash could occur if a container was started before the conntrack modules were loaded and 'iptstate' was executed inside the container.
- /proc/stat reported the non-virtualized btime (boot time), which sometimes confused the tools that used that value to calculate process times.
- Chkrootkit produced false alerts about "hidden" processes inside a container.
- The vzlist utility did not work in case the venet module was not loaded.
- /var/log/cron inside a container contained the following audit error messages:
"crond: System error crond: CRON (root) ERROR: failed to open PAM security session: Connection refused crond: CRON (root) ERROR: cannot set security context".
- Writing data in parallel into several memory mapped files located on an NFS partition could result in data corruption.
- An unsuccessful attempt to stop a container could lead to a socket leakage followed by never-ending messages:
"unregister_netdevice: waiting for lo to become free. Usage count = 3".
- I/O priorities did not work well if each container ran only one process that actively used the disk subsystem.
- Processes consuming 100% of the CPU could appear if the "tcpsndbuf" limit was exceeded. Such a process left its busy loop when a signal was sent to it, for example, on an attempt to strace the process.
- The traffic accounting statistics could not be reset without a Hardware Node restart.
- The kernel.vzprivrange and kernel.ve_allow_kthreads sysctls could be invisible in the /proc/sys/kernel/ directory in case someone accessed /proc/sys/kernel before the OpenVZ containers started.
- The I/O statistics available via /proc/bc/CTID/ioacct could report more "read" bytes than were actually read by the container.
- The quota tools inside a 32-bit container based on old templates (e.g. redhat-as3) and running on a 64-bit Hardware Node could report incorrect values.
- [NFS]: A directory listing on an NFS partition took an extremely long time to complete in case there were other processes writing to the same directory.
- A kernel crash could happen in do_uncharge_dcache() while turning on the precise dcache accounting.
- Some applications could crash inside a container based on the RedHat 7.3 template because they were not aware of the kernel address space randomization feature. The kernel.randomize_va_space sysctl has been virtualized to provide the ability to switch off this feature for affected containers.
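The /proc/stat item in the list above mentions tools that use btime to calculate process times. These notes do not say which tools were affected or how; the sketch below only shows the usual calculation, in which a process start time is derived as btime plus the process's start offset in clock ticks (the starttime field of /proc/<PID>/stat). The sample text, the function names, and the tick rate of 100 are assumptions for illustration; on a real system the tick rate comes from sysconf(_SC_CLK_TCK).

```python
def parse_btime(proc_stat_text: str) -> int:
    """Return the boot time (seconds since the epoch) from /proc/stat content."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "btime":
            return int(fields[1])
    raise ValueError("no btime line found")

def process_start_epoch(btime: int, starttime_ticks: int,
                        ticks_per_second: int = 100) -> float:
    """Convert a start offset in clock ticks since boot to an absolute time."""
    return btime + starttime_ticks / ticks_per_second

# Hypothetical /proc/stat excerpt, not taken from a real system.
SAMPLE_PROC_STAT = """cpu  10 0 20 300 0 0 0
btime 1210000000
processes 4242
"""

btime = parse_btime(SAMPLE_PROC_STAT)
print(process_start_epoch(btime, 6000))  # 1210000060.0
```

Because the result is a plain sum, any mismatch between the reported btime and the time base of the start offset shifts every computed process time by the same amount.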
Besides, the new kernel includes the following improvements:
- The kernel has been re-based on the 2.6.18-92.1.1.el5 Red Hat kernel.
- [CPT]: The checkpointing code has been enhanced to support an iterative online migration of shared memory.
- [CPT]: A check for the required iptables modules being loaded on the destination Node has been added to the migration code along with a proper error message. Before this enhancement, the online migration failed for lack of certain iptables modules with the following message:
"CPT ERR: ffff810020153000,250 :iptables-restore exited with 1".
- [CPT]: A check for 'slm_dmprst' being loaded on both the source and destination Nodes has been added to the migration code along with a proper error message. Before this enhancement, the online migration failed for lack of this module with the following message:
"vzctl : Can't undump: Channel number out of range".
- The binfmt_misc capability has been virtualized, which allows installing Sun Java 1.6.0 without the postinstall script failing to configure the binfmt_misc wrapper inside a container.
- The sysfs 'mem' class and some of its devices (urandom) have been virtualized, which allows running 'udevd' inside a container based on the Ubuntu 8.04 template.
- An empty /proc/devices file has been added to a container to avoid /sbin/MAKEDEV's warning: "can't read /proc/devices".
- The NFSv2 support has been disabled in favor of NFSv3.
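The two [CPT] items above add checks that the required kernel modules are loaded before a migration proceeds. How the migration code performs these checks is not described here; the following is only a rough sketch of the idea, comparing a list of required module names against text in /proc/modules format. The function names, the sample text, and the choice of ip_tables and iptable_filter as required modules are assumptions; 'slm_dmprst' is the module named in these notes.

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Collect module names: the first field of each /proc/modules line."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines()
            if line.strip()}

def missing_modules(required, proc_modules_text: str) -> list:
    """Return the required modules that are not loaded, in the given order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]

# Hypothetical /proc/modules content from a destination Node.
SAMPLE_PROC_MODULES = """iptable_filter 36161 1 - Live 0xffffffff88410000
ip_tables 55457 1 iptable_filter, Live 0xffffffff88400000
"""

required = ["ip_tables", "iptable_filter", "slm_dmprst"]
missing = missing_modules(required, SAMPLE_PROC_MODULES)
if missing:
    print("cannot migrate, modules not loaded:", ", ".join(missing))
```

Reporting the missing module names up front gives a clearer diagnosis than a late failure such as "iptables-restore exited with 1".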
The following bugs from the previous release have been fixed in the new kernel:
- #99018: [ia64]: execve() returns positive error codes on ia64 arch.
- #96464: [CPT]: inotify on symlinks should be restored after online migration.
- #95113: [CPT]: open socket requests are not restored correctly after an online migration.
- #99542: [CPT]: temporary files should be created with O_LARGEFILE flag during checkpointing and restore process.
- #93544: CPUUNITS parameter influence is very weak in case only a few containers are on the Hardware Node.
- #98868: Modern ccNUMA AMD servers do not perform as expected.
- #82009: The kernel mistakenly returns -EACCES on accessing a /proc/<pid>/<any> symlink for a zombie process instead of -ENOENT.
- #99599: binfmt_misc capability should be virtualized.
- #114887: /proc/stat reports non-virtualized btime.
- #99897: 'udevd' does not start inside a container based on Ubuntu 8.04.
- #112588: Asynchronous audit netlink message handling produces errors during PAM authorization.
- #114565: Data corruption on mmaped file over NFS filesystem.
- #75822: Raw sockets leak leads to unregister_netdevice() failure.
- #114720: NFSv2 support should be disabled.
- #98276: I/O priorities do not work well for single readers.
- #112103: An endless loop is possible while waiting for TCPSNDBUF memory if timeout is not specified.
- #111468: A memory leak in venet_acct_set_base() leads to inability to reset traffic network statistics.
- #112482: "kernel.vzprivrange" and "kernel.ve_allow_kthreads" are invisible in /proc/sys/kernel/.
- #111808: Value too high for "read" bytes in I/O accounting statistics.
- #95952: [CPT]: diagnostics in case of iptables-restore failure should be enhanced.
- #114312: [CPT]: A check if 'slm_dmprst' module is loaded should be added.
- #115752: Quota v2 (old) structures are not 32bit emulation aware.
- #116274: [NFS]: nfs_getattr() hang during heavy write workloads.
- #116095: A kernel crash in do_uncharge_dcache().
- #114847: /sbin/MAKEDEV: warning: can't read /proc/devices.
- #115336: kernel.randomize_va_space sysctl should be virtualized.
The following OpenVZ bugs have been fixed:
- #784: [CPT]: UDP sockets can be restored incorrectly after online migration.
- #491: Incorrect heap rlimit calculation caused by a bug in exec shield code.
- #680: [ppc64]: The return code from do_pte_alloc() is not checked.
- #782: /proc/user_bean_counters permissions should be reported as "
- #826: Sysctl "sys.ipv4.conf.default" does not work inside a container.
- #788: An oops in netlink conntrack module if conntrack modules were loaded after the container start.
- #828: /proc/stat reports non-virtualized btime.
- #736: getpriority() syscall should not work with 'real' pids if called from inside a container.
- #394: /proc/vz/veinfo should be available even if 'venet' module is not loaded.
The following references have been used in this document: