**Authored by:** Venkatesh Wagh
“CPU is high in production.”
Every engineer has heard this at least once.
And almost every time, it triggers the same reaction:
open top, see 300% or 400% CPU, panic a little, maybe restart the service, and hope the problem goes away.
Sometimes it does. Most times, it comes back — louder, harder to explain, and more uncomfortable to debug.
High CPU is one of those problems that feels like a black box:
- No exceptions
- No stack traces
- No obvious failures
Memory looks stable. GC looks quiet. Requests are still flowing… just slower.
You’ll hear advice like:
- “Take a thread dump.”
- “Analyze hexadecimal thread IDs.”
- “Enable a profiler.”
- “Install an APM agent.”
All of that advice is technically correct — and still not very helpful if you don’t know which signal to trust first.
Before dashboards, flame graphs, or fancy tooling, it’s worth stepping back for a moment. Because the CPU is not a single thing. And not everything that looks expensive actually is.
A Simple Program That Eats an Entire CPU Core
Let’s start with something intentionally dumb.
// Program that consumes one CPU core with a single thread
@GetMapping("/basic-compute")
public void compute() {
    while (true) {
        int a = 2;
        int b = a * 5;
    }
}
No I/O. No allocations. No blocking calls.
Yet the moment this endpoint is hit, CPU spikes to 100%.
Why?
Because a single runnable thread is executing instructions as fast as the CPU allows. That’s it. No mystery.
This example is important because it exposes a common misconception:
High CPU does not require complexity. It only requires runnable threads.
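To make that concrete, here is a small standalone sketch (not part of the original demo; the class name and thread count are illustrative) that keeps several cores busy simply by putting more threads into the RUNNABLE state:

// BurnCores.java: illustrative sketch only.
// Each started thread sits in a tight loop, so top should show roughly
// one fully busy core per thread (about 400% process CPU for 4 threads).
public class BurnCores {
    public static void main(String[] args) {
        int threads = 4; // number of cores to keep busy; adjust as needed
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                long x = 0;
                while (true) {
                    x += 1; // pure computation: the thread stays RUNNABLE forever
                }
            }, "burner-" + i);
            t.start();
        }
    }
}

Run it and watch top: CPU climbs in step with the number of spinning threads, with no I/O, no allocations, and nothing else interesting going on.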
What “300% CPU” Actually Means
CPU numbers are often misread — especially on multi-core machines.
For application developers, CPU = cores available to your process.
If your VM has 8 cores:
- 100% CPU = one core fully busy
- 400% CPU = four cores fully busy
- 12.5% CPU overall can still mean one hot core
So when you see:
“The app is using 300% CPU”
It does not mean the system is melting down. It means three cores are doing work.
(On Linux, run top and press 1 to see per-core utilization. Press Shift+I to toggle Irix mode, which switches the %CPU column between per-core and whole-machine accounting.)
This distinction matters — because capacity problems and bugs look very similar in CPU graphs.
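The translation between the two views is plain arithmetic. A tiny sketch with illustrative numbers:

// CpuMath.java: the arithmetic above, spelled out (values are illustrative).
public class CpuMath {
    public static void main(String[] args) {
        double processCpuPercent = 300.0; // as reported by top in per-core (Irix) accounting
        int cores = 8;                    // cores available to the process

        double coresBusy = processCpuPercent / 100.0;       // 3.0 cores fully busy
        double machinePercent = processCpuPercent / cores;  // 37.5% of the whole machine

        System.out.printf("%.1f cores busy, %.1f%% of the machine%n", coresBusy, machinePercent);
    }
}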
Case Study: Synthetic High CPU Usage
I created a small Spring Boot application with multiple REST APIs that intentionally burn CPU.
Endpoints include:
- /basic-compute → ~100% CPU (single core)
- /heavy-creation → ~200–300% CPU
The /heavy-creation API generates ~25MB of JSON in memory. The CPU spike comes from:
- Object creation
- Serialization
- Parsing
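For reference, an endpoint of this shape might look roughly like the sketch below. This is not the exact code from the repo, just an illustration assuming Jackson is on the classpath:

// HeavyCreationSketch.java: illustrative only; the real endpoint lives in the linked repo.
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HeavyCreationSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        List<Map<String, Object>> payload = new ArrayList<>();
        // Object creation: build a few hundred thousand small maps (tens of MB of JSON)
        for (int i = 0; i < 300_000; i++) {
            payload.add(Map.of("id", i, "name", "item-" + i, "value", Math.random()));
        }
        // Serialization: turn the object graph into a large JSON string
        String json = mapper.writeValueAsString(payload);
        // Parsing: read it straight back, burning more CPU
        Object parsed = mapper.readValue(json, List.class);
        System.out.println("JSON size ~" + json.length() / (1024 * 1024)
                + " MB, items: " + ((List<?>) parsed).size());
    }
}

All of the CPU here is ordinary, single-request work; nothing blocks and nothing leaks.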
You can observe this locally using:
- top / htop
- JConsole
- Basic JVM metrics
Project is available here: https://github.com/venkyintech-afk/cpu-spike-synthetic.git
First Observation: Thread Count Doesn’t Change
JConsole capture of the app after executing the /heavy-creation API
After triggering /heavy-creation, CPU jumps — but:
- Thread count remains constant (e.g., 31 threads)
- No new threads are spawned
This immediately tells us something important:
The CPU spike is not caused by thread creation. Existing threads are simply doing more work.
This is your first trustworthy signal.
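If you want the same signal programmatically rather than through JConsole, ThreadMXBean exposes the counts. A minimal sketch you could run inside the application (for example behind a debug endpoint):

// ThreadCountCheck.java: illustrative sketch; JConsole shows the same numbers graphically.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountCheck {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("Live threads:   " + threads.getThreadCount());
        System.out.println("Peak threads:   " + threads.getPeakThreadCount());
        System.out.println("Daemon threads: " + threads.getDaemonThreadCount());
        // If live and peak counts stay flat while CPU rises, the spike is coming
        // from existing threads doing more work, not from new threads.
    }
}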
Reading top Without Lying to Yourself
Let’s look at what top is actually telling us.
top output for the host; the 201% figure for the process is measured relative to a single core (Irix accounting)
Host-level view
CPU usage shows aggregate utilization across all cores
You might see:
- 36% user CPU
- 54% idle CPU
This already tells you:
The machine is not saturated.
Process-level view
Process-level view on macOS (htop -p <pid>, with all 8 cores shown)
On Linux:
top -p <pid>
On macOS:
top -pid <pid>
If you see 200% CPU for your process on an 8-core machine:
- That’s roughly 2 cores fully utilized
- Across all cores, it’s only ~25%
This is not a crisis. It’s math.
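You can sanity-check that math from inside the JVM as well. A sketch using the com.sun.management extension of OperatingSystemMXBean, which is available on HotSpot-based JVMs:

// ProcessCpuCheck.java: illustrative sketch for cross-checking what top reports.
import java.lang.management.ManagementFactory;

public class ProcessCpuCheck {
    public static void main(String[] args) throws InterruptedException {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        int cores = Runtime.getRuntime().availableProcessors();
        Thread.sleep(1000); // let the bean accumulate a meaningful sample; early calls may return 0
        double processLoad = os.getProcessCpuLoad(); // 0.0..1.0, fraction of the whole machine
        System.out.printf("Cores: %d, process load: %.1f%% of machine (~%.1f cores busy)%n",
                cores, processLoad * 100, processLoad * cores);
    }
}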
Thread-level view (Linux)
On Linux, this is where things get interesting:
top -H -p <pid>
This shows:
- Which threads are consuming CPU
- Whether the work is concentrated or spread out
On macOS, thread-level CPU visibility is limited. You’ll rely more on:
- jstack
- Sampling multiple thread dumps
- JVM-level tools
Different OS, different constraints.
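One portable way to approximate thread-level attribution on either operating system is ThreadMXBean's per-thread CPU time, sampled from inside the application. A minimal sketch, assuming the JVM supports thread CPU time measurement:

// HotThreads.java: illustrative sketch of thread-level CPU attribution from inside the JVM.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class HotThreads {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean.isThreadCpuTimeSupported() && !bean.isThreadCpuTimeEnabled()) {
            bean.setThreadCpuTimeEnabled(true);
        }
        for (long id : bean.getAllThreadIds()) {
            long cpuNanos = bean.getThreadCpuTime(id); // -1 if the thread died or measurement is off
            ThreadInfo info = bean.getThreadInfo(id);
            if (cpuNanos > 0 && info != null) {
                System.out.printf("%-40s %8d ms %s%n",
                        info.getThreadName(), cpuNanos / 1_000_000, info.getThreadState());
            }
        }
        // Sample this twice a few seconds apart and diff the values:
        // the threads whose CPU time grows fastest are your hot threads.
    }
}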
The Most Important Question to Ask First
Is CPU being consumed by more threads — or by hotter threads?
Thread count increasing? Look for:
- Executor leaks
- Unbounded async pipelines
- Thread-per-request patterns
Thread count stable, CPU rising? Look for:
- Tight loops
- Serialization/deserialization
- Metrics collection
- Polling logic
- Inefficient batching
This single question narrows the problem space dramatically.
Use tools to improve your visibility and get more facts.
Tooling: Use It With Intent
Toolkits exist for a reason — but they work best after you understand the shape of the problem.
APM tools (AppDynamics, Dynatrace, New Relic)
Great for:
- Thread-level CPU attribution
- Allocation hot paths
- Time-based correlation
Thread dumps (**jstack**)
Ideal when:
- CPU is high but unexplained
- You want to identify RUNNABLE threads
- You need ground truth
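On Linux, the classic jstack workflow pairs top -H -p <pid> with a dump: take the decimal TID of the hot thread, convert it to hex, and search the dump for the matching nid=0x… entry. A tiny sketch of that conversion (the TID value is hypothetical):

// TidToNid.java: matching a hot native thread from top -H to a Java thread in a jstack dump.
public class TidToNid {
    public static void main(String[] args) {
        long tidFromTop = 12345L; // hypothetical TID copied from the top -H output
        String nid = "0x" + Long.toHexString(tidFromTop);
        System.out.println("Search the jstack output for: nid=" + nid);
    }
}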
JFR / async profilers
Powerful, but only useful once you know what you're hunting.
Tools don’t replace thinking. They amplify it.
Closing Thoughts
High CPU is not mysterious. It’s just often misunderstood.
Most production CPU incidents come down to:
- Runnable threads doing exactly what they were coded to do
- Misinterpreted metrics
- Incorrect mental models about cores and percentages
Once you understand what CPU numbers really represent, debugging becomes far less emotional — and far more mechanical.
Until the next article, Cheers 🥂
Authored by Venkatesh Wagh, Technology Tinkerer and enthusiast.