Tcache

Warning: This article describes Virtuozzo/OpenVZ version 7.

Brief tech explanation

Transcendent file cache (tcache) is a driver for cleancache, which stores reclaimed pages in memory unmodified.
Its purpose is to adopt pages evicted from a memory cgroup on local pressure (inside a Container), so that they can be fetched back later without costly disk accesses.

Detailed user-level explanation

Tcache is intended to increase overall Hardware Node performance, and it helps only on undercommitted Nodes, i.e. Nodes where the sum of the memory limits of all Containers placed on the Node is less than the Hardware Node RAM size.
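A rough way to check whether a Node is undercommitted in this sense is to add up the per-Container memory limits and compare the sum with the Node RAM. The sketch below is only an illustration: it assumes the cgroup v1 file memory.limit_in_bytes exists in every Container cgroup under machine.slice, and the helper names (sum_ct_limits, node_ram_bytes, commit_state) are made up for this example.

```shell
#!/bin/sh
# Sum the memory limits (in bytes) of all Container cgroups under a root dir.
# Assumption: each Container cgroup has a cgroup v1 memory.limit_in_bytes file.
# A Container without a limit reports a very large value there, so the Node
# will then be reported as overcommitted.
sum_ct_limits() {
    cat "$1"/*/memory.limit_in_bytes 2>/dev/null |
        awk '{ s += $1 } END { printf "%.0f\n", s }'
}

# Node RAM in bytes, taken from a meminfo-style file (MemTotal is in kB).
node_ram_bytes() {
    awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 }' "$1"
}

# Print "undercommitted" if the sum of limits is below the RAM size,
# "overcommitted" otherwise.
commit_state() {
    awk -v l="$1" -v r="$2" 'BEGIN {
        s = (l + 0 < r + 0) ? "undercommitted" : "overcommitted"
        print s
    }'
}

# Possible usage on a Node (paths are assumptions, adjust as needed):
#   limits=$(sum_ct_limits /sys/fs/cgroup/memory/machine.slice)
#   ram=$(node_ram_bytes /proc/meminfo)
#   commit_state "$limits" "$ram"
```

The comparison is done in awk rather than in shell arithmetic so that very large "unlimited" values do not overflow.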

Example usecase description

You have a Node with 1 TB of RAM and you run 500 Containers on it, each limited to 1 GB of memory (no swap, for simplicity).
Let's consider the Containers to be more or less identical: similar load, similar activity inside.
=> Normally those Containers use 500 GB of physical RAM at most, and the other 500 GB just stays free on the Node.

You might think it's a simple situation: OK, the Node is underloaded, let's put more Containers there. But that's not always true,
it depends on what the bottleneck on the Node is, which in turn depends on the real workload of the Containers running on the Node.
And most often in real life the disk becomes the bottleneck first, not the RAM and not the CPU.

Let's assume all those Containers run, say, cPanel, which by default collects some stats every, say,
15 minutes. The stat collection process is run via crontab.

Note: randomizing the start times of crontab jobs is a good idea, but who usually does this for Containers?

We did it for the application templates we shipped in Virtuozzo, but a lot of software is simply installed and configured inside Containers by their owners, and there we cannot do this.
And Hosting Providers are often not allowed to touch data in Containers, so most often cron jobs are not randomized.

OK, it does not matter how, but let's assume we get such a workload: every, say, 15 minutes (it's important that the data access is quite rare),
each Container accesses many small files, let it be just 100 small files, to gather stats and save them somewhere.
In 500 Containers. Simultaneously.
In parallel with other regular i/o workload.
On HDDs.

It's a nightmare for the disk subsystem: 500 Containers x 100 files = 50000 random reads, so if an HDD provides 100 IOPS, it will take 50000/100/60 = 8.(3) minutes(!) to handle them.
OK, there could be a RAID. Say it is able to handle 300 IOPS: that results in 2.(7) minutes, and we have not even counted the other regular i/o.
So every 15 minutes the Node becomes almost unresponsive for several minutes, until it handles all that random i/o generated by stats collection.
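The arithmetic above can be reproduced with a small helper. This is just a sketch of the same back-of-the-envelope estimate: it assumes one random read per file and ignores all other i/o, and the function name burst_minutes is made up for this example.

```shell
#!/bin/sh
# Minutes needed to serve one stats-collection burst:
#   containers * files_per_container / IOPS / 60
# Assumption (same as in the text): one random read per small file.
burst_minutes() {
    awk -v c="$1" -v f="$2" -v iops="$3" \
        'BEGIN { printf "%.2f\n", c * f / iops / 60 }'
}

burst_minutes 500 100 100   # single HDD at 100 IOPS, prints 8.33
burst_minutes 500 100 300   # RAID at 300 IOPS, prints 2.78
```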

But why every 15 minutes? The files are read once and their content then resides in the Container pagecache!
That's true, but here the 15-minute period comes into play. The longer the period, the worse.
If a Container is active enough, it just keeps reading more and more files: website data, pictures, video clips, files of a fileserver, etc.
The thing is that in 15 minutes it's quite possible for a Container to read more than its RAM limit (remember, only 1 GB in our case!), and thus all the old pagecache is dropped and replaced with fresh data.
So in 15 minutes it's quite possible that you'll have to read all those 100 files in each Container from disk again.

tcache saves our lives

And here comes tcache to save us: let's not completely drop the pagecache which is reclaimed from a Container (on local reclaim),
but save it in a special cache (tcache) on the Host, as long as there is free RAM on the Host.

And in 15 minutes, when all Containers start to access a lot of small files again, the data of those files gets back into the Container pagecache without reading from the physical disk.
Voilà: tcache saves IOPS, and the Node does not get stuck anymore.

Q/A section

Q: Can a Container be so active (i.e. read so much from disk) that this "useful" pagecache is dropped even from tcache?
A: Yes. But tcache extends the "safe" period during which the data can still be fetched from RAM.

Q: Is tcache in the mainstream kernel? Is it available in LXC/Proxmox?
A: No, it's Virtuozzo/OpenVZ specific.
   "cleancache", the base for tcache, is in mainstream, where it's used for Xen.
   But we (VZ) wrote a driver for it and use it for Containers as well.

Q: I use SSD, not HDD. Does tcache help me?
A: An SSD can provide many more IOPS, so the Node performance gain from tcache is less significant, but reading from RAM (tcache is in RAM) is still faster than reading from an SSD.

Managing tcache

Tcache is enabled for all Containers on a Node by default.

Boot option

To disable tcache at boot time, pass the "tcache.enabled=0" kernel option.
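To verify after a reboot that the option was actually passed to the kernel, the running command line can be inspected. A minimal sketch, assuming the standard /proc/cmdline file; the function name tcache_disabled_at_boot is hypothetical, and the grubby invocation in the comment is an assumption about the bootloader tooling on a RHEL-like Node, not something this article prescribes.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the kernel command line contains the exact
# option tcache.enabled=0. The file defaults to /proc/cmdline and can be
# overridden for testing.
tcache_disabled_at_boot() {
    cmdline_file=${1:-/proc/cmdline}
    # Split the command line into one option per line, then match exactly.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF 'tcache.enabled=0'
}

# Possible usage:
#   tcache_disabled_at_boot && echo "tcache was disabled at boot"
#
# One common way to add the option persistently on RHEL-like systems
# (assumption, check your bootloader setup first):
#   grubby --update-kernel=ALL --args="tcache.enabled=0"
```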

Global on-the-fly switch

tcache can be disabled/enabled on the fly using the following commands:

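# 'N' disables, 'Y' enables. tee is used because bash rejects a redirect
# whose target expands to several files ("ambiguous redirect").
echo 'N' | tee /sys/module/{tcache,tswap}/parameters/active > /dev/null
echo 'Y' | tee /sys/module/{tcache,tswap}/parameters/active > /dev/null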
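If the switch is toggled from scripts, a small wrapper that validates the value and writes to each module separately may be more convenient. This is only a sketch: the function name set_cache_active is made up, and the second argument (the sysfs modules root) exists purely so the helper can be tried against a scratch directory.

```shell
#!/bin/sh
# Write Y or N to the "active" parameter of both tcache and tswap.
# $1: Y (enable) or N (disable)
# $2: modules root, defaults to /sys/module (override for testing)
set_cache_active() {
    value=$1
    root=${2:-/sys/module}
    case $value in
        Y|N) ;;
        *) echo "usage: set_cache_active Y|N [root]" >&2; return 2 ;;
    esac
    for mod in tcache tswap; do
        param="$root/$mod/parameters/active"
        if [ -w "$param" ]; then
            echo "$value" > "$param"
        else
            # Module not loaded or insufficient permissions: skip it.
            echo "skip: $param is not writable" >&2
        fi
    done
}

# Possible usage (as root):
#   set_cache_active N    # disable
#   set_cache_active Y    # enable
```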

Per Container switch

To disable tcache on the fly per Container:

echo 1 > /sys/fs/cgroup/memory/machine.slice/$CTID/memory.disable_cleancache
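To see which Containers currently have tcache disabled, the same file can be read back for every Container cgroup. The sketch below assumes that memory.disable_cleancache is readable and returns the value that was written to it; the function names cleancache_state and disable_cleancache_for are made up for this example.

```shell
#!/bin/sh
# Print "<CTID> <value>" for every Container cgroup under the given root.
# Assumption: reading memory.disable_cleancache returns the written value.
cleancache_state() {
    root=${1:-/sys/fs/cgroup/memory/machine.slice}
    for f in "$root"/*/memory.disable_cleancache; do
        [ -r "$f" ] || continue
        ctid=$(basename "$(dirname "$f")")
        echo "$ctid $(cat "$f")"
    done
}

# Disable tcache for one Container, same as the echo command above.
disable_cleancache_for() {
    ctid=$1
    root=${2:-/sys/fs/cgroup/memory/machine.slice}
    echo 1 > "$root/$ctid/memory.disable_cleancache"
}

# Possible usage (as root; $CTID is the Container ID):
#   disable_cleancache_for "$CTID"
#   cleancache_state
```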
