Image generated with AI

Did you know your platform needs to warm up before handling high-traffic volumes? It does, and not allowing it to warm up can cause more problems than you think.
🤔 **What is Platform Warm-Up Time?**
If you follow me on LinkedIn or listen to my talks, you’ve heard me mention warm-up time in the context of just-in-time software compilation.
Warm-up is the time it takes for your code to be compiled into machine code after startup. While the Just-In-Time compiler is still doing its work, the service is slower than it will be once compilation completes, often significantly slower.
This initial time is often referred to as warm-up time, as if the service is stretching and getting ready for a big workout.
Warming up at a platform level (multiple services, subsystems, etc.) is similar in that it occurs when a platform is slower after being restarted; however, it differs in its underlying cause.
👨‍🏫 **How a Platform Warms Up**
Warm-up at a platform level occurs when you start the platform fresh, during system maintenance, software updates, or any other event that restarts large chunks of the platform.
When you restart a system, not only does any Just-In-Time compiled software need to spend time converting code to machine code, but any in-memory caches, upstream and downstream connections, and other resources are also reset to their initial state.
When we build systems, we often incorporate optimizations for performance at scale, such as the ability to reuse HTTP connections, utilize caches, and establish connection pools.
But these are all designed to build up as the system is used. When a request first lands on the system after a restart, these optimizations must start from scratch.
Let’s walk through a couple of examples where platform warm-up comes into play.
Lazy Loaded Caching🦥
With lazy loading, a cache is populated on demand: when you need data, you fetch it from the source, whether that's an HTTP service or a database, and then store the result in the cache for any subsequent requests for the same data.
It’s highly efficient and straightforward to implement.
However, when the first request is processed, it incurs the performance penalty of fetching data from the source and loading it into the cache.
Every subsequent request takes advantage of the cache, but the first one always pays the lazy-loading penalty.
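As a concrete illustration, here's a minimal sketch of a lazy-loaded cache in Go; the `fetch` function and the in-memory map are hypothetical stand-ins for whatever data source and cache your platform actually uses.

```go
package cache

import "sync"

// LazyCache fills itself on demand: the first request for a key pays the
// cost of fetching from the source, later requests are served from memory.
type LazyCache struct {
	mu    sync.RWMutex
	data  map[string]string
	fetch func(key string) (string, error) // e.g. an HTTP call or DB query
}

func NewLazyCache(fetch func(string) (string, error)) *LazyCache {
	return &LazyCache{data: make(map[string]string), fetch: fetch}
}

func (c *LazyCache) Get(key string) (string, error) {
	// Fast path: serve from the cache if we've seen this key before.
	c.mu.RLock()
	if v, ok := c.data[key]; ok {
		c.mu.RUnlock()
		return v, nil
	}
	c.mu.RUnlock()

	// Slow path: the first request for this key hits the source.
	v, err := c.fetch(key)
	if err != nil {
		return "", err
	}

	c.mu.Lock()
	c.data[key] = v
	c.mu.Unlock()
	return v, nil
}
```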
Reusing Connections🔁
Most systems today rely on another system to process requests; this is often true not only for microservices-based platforms but also for traditionally service-oriented architecture-based monoliths.
When these systems communicate, they often do so over HTTP/gRPC connections, which can be reused as an optimization.
However, these connections are typically not opened until needed, such as when the first request arrives on the system.
Reusing connections is a great way to reduce processing time, as opening each new connection requires a three-way TCP handshake and a TLS handshake to exchange certificates and keys and negotiate algorithms. Depending on the system's needs, that connection setup time can be problematic.
And just like with caching, the first request always takes the hit.
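In Go, for example, connection reuse largely comes down to sharing a single `http.Client` (and its underlying `Transport`) across requests instead of creating one per call; the sketch below shows the idea, with the pool sizes and timeouts as assumptions you'd tune for your own platform.

```go
package main

import (
	"io"
	"net/http"
	"time"
)

// A single shared client keeps a pool of TCP/TLS connections alive between
// requests, so only the first call to a host pays the handshake cost.
var client = &http.Client{
	Timeout: 5 * time.Second,
	Transport: &http.Transport{
		MaxIdleConnsPerHost: 50,
		IdleConnTimeout:     90 * time.Second,
	},
}

func callDependency(url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	// Reading and closing the body lets the connection return to the pool.
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}
```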
🛠️ **Reducing Platform Warm-Up**
There are ways to reduce but not eliminate platform warm-up time.
Smoke Tests💨
Smoke tests are a standard approach to validating a system’s health after a new release.
The basic idea of Smoke Tests is to run test requests against the system to validate that it's functioning correctly.
A side effect of running Smoke Tests is that they often trigger the system to establish connections, build caches, and so on.
So why not use Smoke Tests to warm up the platform without impacting customer traffic?
The idea is to let the test requests take the performance penalty, not real customer traffic.
The key to this approach is running sufficient traffic through these Smoke Tests to trigger the warm-ups across multiple pods or services within the platform instance.
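A minimal warm-up pass might look like the sketch below, which fires a batch of synthetic requests at a few key endpoints before the instance is put into rotation; the endpoint paths, base URL, and request count are assumptions you'd tune for your own platform.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// warmUp sends repeated synthetic requests so JIT-compiled code paths,
// caches, and connection pools are exercised before real traffic arrives.
func warmUp(baseURL string, paths []string, perPath int) {
	var wg sync.WaitGroup
	for _, p := range paths {
		for i := 0; i < perPath; i++ {
			wg.Add(1)
			go func(url string) {
				defer wg.Done()
				resp, err := http.Get(url)
				if err != nil {
					fmt.Println("warm-up request failed:", err)
					return
				}
				resp.Body.Close()
			}(baseURL + p)
		}
	}
	wg.Wait()
}

func main() {
	// Hypothetical endpoints; pick the routes your customers actually hit.
	warmUp("http://localhost:8080", []string{"/health", "/api/orders", "/api/products"}, 20)
}
```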
Health Checks🩺
HTTP/gRPC connections to dependent services are often a source of warm-up time, and setting up health checks is a straightforward way to reduce it.
If your health checks go over the same connections that customer requests will use, those connections are already established by the time real traffic arrives; you just need to wait until the health checks have passed before admitting traffic.
This approach works even when using a service mesh, but you must ensure that health checks and customer requests actually share the same connections.
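One way to get this behavior, sketched below in Go, is to run the dependency health check through the same shared HTTP client the request path uses and only report ready once it passes; the `/ready` route and the dependency URL are hypothetical.

```go
package main

import (
	"net/http"
	"time"
)

// sharedClient is used for both health checks and customer requests, so the
// connections opened during health checking are already warm when traffic lands.
var sharedClient = &http.Client{Timeout: 2 * time.Second}

const dependencyURL = "http://inventory-service/healthz" // hypothetical dependency

func readyHandler(w http.ResponseWriter, r *http.Request) {
	// Checking the dependency through the shared client establishes (and keeps
	// warm) the same connection pool the request path will reuse.
	resp, err := sharedClient.Get(dependencyURL)
	if err != nil {
		http.Error(w, "dependency not reachable", http.StatusServiceUnavailable)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		http.Error(w, "dependency not ready", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/ready", readyHandler)
	http.ListenAndServe(":8080", nil)
}
```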
Pre-Fetch Caches🎣
Rather than lazy loading your cache, where you fetch data from the source as needed and store the results, you can pre-fetch and store the data.
Pre-fetching data doesn’t work for every use case, but when it does, it drastically reduces warm-up time at the cost of a longer start-up time.
When the system boots, it loads data into the cache, meaning the first request will utilize a cached result.
It is excellent for performance, but it adds complexity to the system.
One of those complexities is handling scenarios where you cannot load results into the cache because the cache itself is unavailable. With lazy loading, requests are slower if the cache is down, but they still work.
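A pre-fetching variant might look like the sketch below: the cache is loaded during start-up, before the service begins listening; `loadAllFromSource` is a hypothetical bulk-load function standing in for your database or upstream service.

```go
package main

import (
	"log"
	"net/http"
	"sync"
)

var (
	cacheMu sync.RWMutex
	cache   = map[string]string{}
)

// loadAllFromSource is a hypothetical bulk load from the backing store.
func loadAllFromSource() (map[string]string, error) {
	// ... query the database or upstream service ...
	return map[string]string{"example-key": "example-value"}, nil
}

func main() {
	// Pre-fetch during start-up: the service doesn't accept traffic until the
	// cache is warm, trading a longer boot for a fast first request.
	data, err := loadAllFromSource()
	if err != nil {
		// You must decide how to behave without a warm cache; failing fast is one option.
		log.Fatalf("pre-fetch failed: %v", err)
	}
	cacheMu.Lock()
	cache = data
	cacheMu.Unlock()

	http.HandleFunc("/lookup", func(w http.ResponseWriter, r *http.Request) {
		cacheMu.RLock()
		v := cache[r.URL.Query().Get("key")]
		cacheMu.RUnlock()
		w.Write([]byte(v))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```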
Canary🐤
One of the most effective ways to mitigate the impact of platform warm-up time is to reintroduce only a portion of customer traffic initially.
The idea behind a canary is to route only a small portion of traffic back to the restarted instance after maintenance is performed. That way, only that slice of traffic is impacted if anything doesn’t work right.
With a canary, you can use that small portion of traffic to warm up the system: rather than every request taking a performance hit, only a few do.
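As a rough sketch of the routing idea (in practice your load balancer or service mesh would do this for you), the snippet below sends a configurable percentage of requests to the freshly restarted instance and the rest to the established ones; the backend URLs and the 5% weight are assumptions.

```go
package main

import (
	"math/rand"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical backends: the freshly restarted (cold) instance and the
	// already-warm instances still serving traffic.
	canary, _ := url.Parse("http://cold-instance:8080")
	stable, _ := url.Parse("http://warm-instances:8080")

	canaryProxy := httputil.NewSingleHostReverseProxy(canary)
	stableProxy := httputil.NewSingleHostReverseProxy(stable)

	const canaryPercent = 5 // only a small slice of traffic warms the new instance

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if rand.Intn(100) < canaryPercent {
			canaryProxy.ServeHTTP(w, r)
			return
		}
		stableProxy.ServeHTTP(w, r)
	})
	http.ListenAndServe(":8081", nil)
}
```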
🧐 Final Thoughts
While platform warm-up might seem like a problem that only exists when you perform a complete restart of a system, it shows up more often than you might think.
Autoscaling is an effective way to optimize platform costs while managing demand. The concept of autoscaling is that a new instance is created when needed.
That new instance needs to perform its warm-up routine. Without a strategy to pre-warm up those new instances, portions of your traffic can easily take a performance hit.
That may be acceptable for your use case, but it may not be.