
For the Love of God, Think Twice Before Dropping CPU Limits on Kubernetes

This article is a response to https://home.robusta.dev/blog/stop-using-cpu-limits. It's a good article that makes a good argument, and you should read it.

Enabling bursting is a good idea #

Let me start off by saying the original article is correct that enabling services to burst above limits can be very powerful, and can reduce the likelihood of production issues.

When a service receives a sudden load spike, it can be a good thing to let it consume other available resources on the machine instead of throttling it.

Kubernetes CPU limits are used for more than you think #

Setting a CPU limit in Kubernetes actually boils down to setting a CPU limit in the Linux cgroup for the container. cgroups are a mechanism Linux exposes to group processes and control the resources they can use; together with namespaces, they are essentially the technology powering containers on Linux.
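To make that concrete, here's a minimal sketch of what a container can read from its own cgroup, assuming cgroup v2 mounted at /sys/fs/cgroup (on cgroup v1 the equivalent files are cpu.cfs_quota_us and cpu.cfs_period_us). A Kubernetes CPU limit of 1 typically shows up as a quota of 100000µs per 100000µs period:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// On cgroup v2, the CPU limit lives in the container's cpu.max file as
	// "<quota> <period>" in microseconds, e.g. "100000 100000" for a 1 CPU limit.
	// A quota of "max" means no CPU limit was set.
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		fmt.Println("could not read cpu.max:", err)
		return
	}
	fields := strings.Fields(strings.TrimSpace(string(data)))
	if len(fields) != 2 {
		fmt.Println("unexpected cpu.max contents:", string(data))
		return
	}
	fmt.Printf("quota=%s period=%s\n", fields[0], fields[1])
}
```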

A lot of language runtimes have started adapting to the new "era of containers" and now look at their cgroup settings to pick sensible defaults.

Two examples:

These runtimes essentially look at the CPU limit to determine how many "worker threads" they should spawn to host language code execution.

If there is no CPU limit, they default to the CPU count of the machine.
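As a rough illustration of that sizing logic (a sketch, not any particular runtime's actual code), here's what it might look like in Go, again assuming cgroup v2: use the CPU quota if one is set, and fall back to the machine's core count otherwise.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// workerCount sketches the sizing logic described above: derive a worker
// thread count from the cgroup v2 CPU quota when one is present, otherwise
// fall back to the machine's core count.
func workerCount() int {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		return runtime.NumCPU() // not in a cgroup v2 container, or no access
	}
	fields := strings.Fields(strings.TrimSpace(string(data)))
	if len(fields) != 2 || fields[0] == "max" {
		return runtime.NumCPU() // "max" means no CPU limit was set
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || period == 0 {
		return runtime.NumCPU()
	}
	return int(math.Ceil(quota / period))
}

func main() {
	fmt.Println("worker threads:", workerCount())
}
```

Libraries like uber-go/automaxprocs apply a similar idea to Go's GOMAXPROCS.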

Why can threads == CPUs be a problem? #

As an example, let's look at a container running with a 1 CPU request (and no limit) on a 48 core machine. Runtimes with the logic described above will look at the core count and spawn 48 threads to do work.

On its face, this might not sound like a problem. We might only use 1 CPU core most of the time, but we also have threads standing by and the ability to burst to all 48 cores if needed.

Unfortunately, extra threads also come with overhead. The Linux scheduler will try to give all 48 threads a fair share of CPU time, and with far more runnable threads than available cores this can lead to a lot of context switching.

Context switching, when it happens often, can have a large overhead. Often this overhead is not easy to spot directly on CPU graphs or profiles. You get a silent "constant background drag" on the performance of your applications.

This is all made even worse in the (common) case that multiple containers run on a single host, all with the same worker pool configuration. You might end up with hundreds of threads fighting for a few cores.

Here is an article with some more information on the overhead of thread vs process context switching: https://medium.com/@gtamilarasan/context-switching-performance-threads-vs-processes-6a1b5d2c9954

Find a sweet spot #

You should probably identify a sweet spot: set the worker thread count at or above the CPU request, but below the total CPU count of the machine. The specifics depend on your workload and application.

For languages that use green threads hosted on worker threads, whether or not you set a CPU limit in Kubernetes usually doesn't matter much, as long as you size the worker thread pool correctly.

If you want to enable bursting, make sure you set the pool size some amount above the CPU request.
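As an example, here's a minimal Go sketch of that sweet spot. It assumes the CPU request is handed to the container in a hypothetical CPU_REQUEST environment variable (which you could populate via the Downward API or plain configuration), and the 2x headroom factor is arbitrary; tune it to your workload.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

func main() {
	// CPU_REQUEST is a hypothetical env var assumed to hold the pod's CPU
	// request in whole cores; default to 1 if it's missing or malformed.
	request, err := strconv.Atoi(os.Getenv("CPU_REQUEST"))
	if err != nil || request < 1 {
		request = 1
	}

	// Sweet spot: some headroom above the request so the service can burst
	// (2x here is arbitrary), but never more worker threads than the machine
	// has cores.
	workers := request * 2
	if cores := runtime.NumCPU(); workers > cores {
		workers = cores
	}

	runtime.GOMAXPROCS(workers)
	fmt.Println("worker threads (GOMAXPROCS):", workers)
}
```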

... or be clever about it #

Another option might be to scale the worker pool size dynamically. Erlang, for example, has options to scale the number of online schedulers up and down on the fly, so the pool could follow the current load.
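As a rough sketch of the same idea in a different language, here's what dynamic resizing could look like in Go, using runtime.GOMAXPROCS. The load signal, the interval, and the bounds are all placeholder assumptions:

```go
package main

import (
	"runtime"
	"time"
)

// currentLoad is a stand-in for whatever signal you trust to measure load
// (run queue depth, CPU usage, request rate); it is assumed here, not real.
func currentLoad() float64 {
	return 0.5
}

func main() {
	minWorkers := 2                // roughly the CPU request
	maxWorkers := runtime.NumCPU() // never more than the machine has
	if maxWorkers < minWorkers {
		maxWorkers = minWorkers
	}

	go func() {
		for range time.Tick(10 * time.Second) {
			// Scale the worker count with the observed load, clamped
			// between the request and the machine's core count.
			target := minWorkers + int(currentLoad()*float64(maxWorkers-minWorkers))
			runtime.GOMAXPROCS(target)
		}
	}()

	select {} // stand-in for the actual service
}
```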

That's all rather complicated though, and it sounds tricky to get right: you end up with a dynamic system that might be difficult to tune correctly.