I spent this weekend building a benchmark suite to test Go concurrency patterns for healthcare APIs. I ran three different approaches through realistic load tests and collected real performance data.
The results shocked me.
The “naive” pattern—spawning a new goroutine for every request—was 4-5x faster than the “proper” worker pool pattern. It handled 11,000 requests per second while the worker pool managed only 2,100.
My first thought: “Did I break something?”
My second thought: “Wait... this is actually the problem.”
Here’s what I learned about Go concurrency, production systems, and why the fastest solution is often the worst choice.
🔗 Full code and benchmarks: github.com/Stella-Achar-Oiro/healthcare-api-benchmark
Why This Matters (Especially in Healthcare)
Healthcare APIs have unique constraints that make concurrency patterns critical:
Predictability over performance. When a nurse queries patient medication history during an emergency, the system needs to respond consistently. A response that takes 200ms 99% of the time but occasionally spikes to 5 seconds isn’t acceptable. Lives depend on these milliseconds.
Resource control is compliance. HIPAA requires demonstrable control over patient data access. Uncontrolled goroutine spawning makes it nearly impossible to audit resource usage, track access patterns, or enforce rate limits properly.
Graceful degradation over catastrophic failure. A healthcare system that rejects 20% of requests during a spike is better than one that accepts all requests, runs out of memory, and crashes—taking down every connected system with it.
These requirements forced me to dig deeper into Go’s concurrency patterns. What I found applies to any production system where reliability matters more than raw speed.
The Three Patterns I Tested
Pattern 1: Naive (The Anti-Pattern)
The simplest approach: spawn a goroutine for every incoming request.
func (h *NaiveHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	patientID := extractPatientID(r)
	// Spawn a new goroutine for EVERY request.
	// Nothing waits for it, and nothing bounds how many exist at once.
	go h.processRequest(w, r, patientID)
}

func (h *NaiveHandler) processRequest(w http.ResponseWriter, r *http.Request, patientID string) {
	// Query database
	patient, err := h.db.QueryPatient(r.Context(), patientID)
	if err != nil {
		http.Error(w, "query failed", http.StatusInternalServerError)
		return
	}
	// Send response
	json.NewEncoder(w).Encode(patient)
}
Why developers write this:
- It’s dead simple
- Feels “concurrent” and “Go-like”
- Works perfectly in demos
- Benchmarks show amazing numbers
 
Why it’s dangerous:
- Unbounded concurrency: No limit on goroutine count
- Memory bomb: Each goroutine consumes ~2KB+ of stack space
- Database exhaustion: Can spawn more goroutines than your database connection pool
- Scheduler overhead: The Go scheduler struggles with thousands of competing goroutines
- Cascading failures: One slow database query blocks resources for everyone
 
I’ve seen this pattern in production codebases. Usually written by developers coming from async/await backgrounds who treat goroutines like they’re free. They’re not.
Pattern 2: Worker Pool (The Production Pattern)
Fixed number of workers pulling jobs from a queue.
type WorkerPoolHandler struct {
	jobQueue chan *job
	workers  int
	wg       sync.WaitGroup
	// The ctx and db fields used below are part of the full type;
	// they’re omitted here for brevity.
}

func (h *WorkerPoolHandler) worker(id int) {
	defer h.wg.Done()
	for {
		select {
		case job, ok := <-h.jobQueue:
			if !ok {
				// Queue closed and drained: time to exit.
				return
			}
			// Process job
			patient, err := h.db.QueryPatient(job.ctx, job.patientID)
			if err != nil {
				job.resultChan <- &PatientResponse{Error: err.Error()}
				continue
			}
			job.resultChan <- &PatientResponse{Patient: patient}
		case <-h.ctx.Done():
			return
		}
	}
}

func (h *WorkerPoolHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	job := &job{
		ctx:        r.Context(),
		patientID:  extractPatientID(r),
		resultChan: make(chan *PatientResponse, 1),
	}

	// Try to enqueue
	select {
	case h.jobQueue <- job:
		// Queued successfully
	default:
		// Queue full - reject the request instead of piling up work
		http.Error(w, "service overloaded", http.StatusServiceUnavailable)
		return
	}

	// Wait for result
	response := <-job.resultChan
	json.NewEncoder(w).Encode(response)
}
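The snippet skips how the pool gets started. Here’s a minimal sketch of the wiring, assuming a constructor called NewWorkerPoolHandler (an illustrative name, not necessarily the repo’s API) and the ctx and db fields the worker relies on:

func NewWorkerPoolHandler(workers, queueSize int) *WorkerPoolHandler {
	h := &WorkerPoolHandler{
		jobQueue: make(chan *job, queueSize),
		workers:  workers,
		// ctx, cancel, and db would be set here as well.
	}
	for i := 0; i < workers; i++ {
		h.wg.Add(1) // paired with the defer h.wg.Done() inside worker()
		go h.worker(i)
	}
	return h
}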
Why this is better:
Bounded concurrency. You control exactly how many operations run simultaneously. If you set 20 workers, you get 20 goroutines. No surprises.
Graceful backpressure. When the system is overloaded, requests wait in the queue. If the queue fills up, you can reject new requests cleanly with a 503 status code. The client knows to retry later.
Predictable resource usage. You can calculate exactly how much memory the system needs. 20 workers × 2KB/goroutine + queue size × job size = predictable memory footprint.
Better database behavior. With 20 workers, you’ll never open more than 20 database connections simultaneously. Your database connection pool can be sized appropriately.
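The benchmark simulates its database, but in a real service backed by database/sql the same idea applies: cap the driver’s pool so it can never exceed the worker count. A quick sketch:

import (
	"database/sql"
	"time"
)

func configureDBPool(db *sql.DB, workers int) {
	db.SetMaxOpenConns(workers)            // never more connections than workers to use them
	db.SetMaxIdleConns(workers)            // keep warm connections ready for the next job
	db.SetConnMaxLifetime(5 * time.Minute) // recycle connections periodically
}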
Proper shutdown. You can wait for in-flight work to complete during deployments. Close the job queue, drain remaining jobs, then shutdown cleanly.
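Here’s roughly what that sequence can look like, as a sketch; I’m assuming a Shutdown method with a timeout, which may not match the repo exactly:

// Call after http.Server.Shutdown has stopped accepting new requests.
func (h *WorkerPoolHandler) Shutdown(timeout time.Duration) {
	close(h.jobQueue) // no new jobs; workers exit once the queue is drained

	done := make(chan struct{})
	go func() {
		h.wg.Wait() // wait for in-flight jobs
		close(done)
	}()

	select {
	case <-done:
		// drained cleanly
	case <-time.After(timeout):
		// deadline hit; shut down anyway
	}
}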
Pattern 3: Optimized (Worker Pool + sync.Pool)
Same worker pool but with object pooling to reduce allocations.
type OptimizedHandler struct {
	WorkerPoolHandler
	responsePool sync.Pool // needs a New func that returns *PatientResponse
}

func (h *OptimizedHandler) getResponse() *PatientResponse {
	resp := h.responsePool.Get().(*PatientResponse)
	// CRITICAL: Clear previous data (HIPAA compliance!)
	resp.Patient = nil
	resp.Error = ""
	return resp
}

func (h *OptimizedHandler) putResponse(resp *PatientResponse) {
	// Clear sensitive data before returning to pool
	resp.Patient = nil
	resp.Error = ""
	h.responsePool.Put(resp)
}

func (h *OptimizedHandler) processJob(job *job) {
	// Get pooled response object instead of allocating
	response := h.getResponse()

	patient, err := h.db.QueryPatient(job.ctx, job.patientID)
	if err != nil {
		response.Error = err.Error()
	} else {
		response.Patient = patient
	}

	// The HTTP side should call putResponse only after the response has
	// been written. Putting the object back here with a defer would recycle
	// it while the reader still holds it.
	job.resultChan <- response
}
What sync.Pool does:
sync.Pool maintains a cache of reusable objects. Instead of allocating a new PatientResponse struct for every request, we reuse them. This reduces:
- Heap allocations
- Garbage collection pressure
- Memory allocation latency
- GC pause times
 
Healthcare-specific note: You MUST clear sensitive data before returning objects to the pool. Imagine pooling a response object that still contains patient data from the previous request. That’s a HIPAA violation waiting to happen.
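One wiring detail the snippet leaves out: the type assertion in getResponse only works if the pool has a New function, because Get returns nil when the pool is empty and New is unset. A minimal sketch of that setup (the constructor name is illustrative):

func NewOptimizedHandler() *OptimizedHandler {
	h := &OptimizedHandler{ /* embedded WorkerPoolHandler setup elided */ }
	h.responsePool.New = func() any {
		// Runs only on a pool miss; otherwise Get hands back a recycled object.
		return &PatientResponse{}
	}
	return h
}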
The Benchmark Setup
I built a load testing tool that simulates realistic healthcare API usage:
Database simulation: 50-100ms query latency (typical for indexed patient lookups) with a 5% random error rate (network timeouts, lock conflicts, etc.)
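For context, the simulated query boils down to something like this sketch (simplified; the SimulatedDB type and Patient fields here are assumptions, not the repo’s exact code):

func (db *SimulatedDB) QueryPatient(ctx context.Context, id string) (*Patient, error) {
	// 50-100ms of artificial latency, like an indexed lookup over the network.
	delay := 50*time.Millisecond + time.Duration(rand.Intn(51))*time.Millisecond
	select {
	case <-time.After(delay):
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	// ~5% of queries fail, standing in for timeouts and lock conflicts.
	if rand.Intn(100) < 5 {
		return nil, errors.New("simulated database error")
	}
	return &Patient{ID: id}, nil
}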
Load patterns:
- Low load: 1,000 requests, 100 concurrent clients
- High load: 10,000 requests, 1,000 concurrent clients
- Sustained load: 50,000 requests, 500 concurrent clients
 
Metrics collected:
- Throughput (requests/second)
- Latency (min, mean, median, P95, P99, max)
- Error rates
- Memory allocations
 
Worker pool configuration: Intentionally small (20 workers, 100 job queue) to demonstrate behavior under pressure.
Test 1: Low Load (1,000 Requests)
The first test: 1,000 total requests with 100 concurrent clients.
| Pattern | Throughput | Mean Latency | P99 Latency | Error Rate | 
|---|---|---|---|---|
| Naive | 1,115 req/s | 75.87ms | 100.06ms | 5.80% | 
| Worker Pool | 260 req/s | 364.94ms | 423.70ms | 5.30% | 
| Optimized | 261 req/s | 363.06ms | 417.51ms | 4.70% | 
Naive wins. And it’s not even close—4.3x higher throughput!
At this load, the naive pattern shines. It spawned ~100 goroutines to handle 100 concurrent clients. The system had plenty of resources, so everything just... worked.
The worker pools were slower because only 20 requests could be processed at once; the other 80 waited in the queue. With 100 clients sharing 20 workers and a ~75ms average query, each request effectively waits for about four queries ahead of it plus its own, which predicts roughly 375ms, right in line with the measured ~364ms mean.
But here’s what the numbers don’t show: The naive pattern used 5x more goroutines and had no mechanism to reject requests if the system was overloaded.
Test 2: High Load (10,000 Requests)
Now let’s crank it up: 10,000 requests with 1,000 concurrent clients.
| Pattern | Throughput | Mean Latency | P99 Latency | Error Rate | Goroutines | 
|---|---|---|---|---|---|
| Naive | 11,104 req/s | 75.47ms | 100.04ms | 5.24% | ~1,000 | 
| Worker Pool | 2,124 req/s | 148.47ms | 568.99ms | 88.29% | 20 | 
| Optimized | 2,248 req/s | 144.95ms | 565.86ms | 88.75% | 20 | 
Wait... what?
The naive pattern got faster (10x higher throughput than low load). The worker pools had 88% error rates. Did I break something?
Nope. This is exactly what should happen.
Understanding the “Errors” (Plot Twist: They’re Features)
Those 88% “errors” in the worker pool patterns aren’t failures. They’re rejections.
What happened:
- 1,000 concurrent clients started sending requests
- Worker pool: 20 workers + a 100-slot job queue = 120 requests can be “in flight”
- Request #121 arrives → queue is full → rejected immediately
- Result: roughly 880 of every 1,000 incoming requests get turned away ≈ 88% “error” rate
 
This is not a bug. This is the feature.
The worker pool said: “I can’t handle this load safely. I’m returning 503 Service Unavailable. Please retry later.”
The naive pattern said: “Sure! I’ll spawn 1,000 goroutines to handle all of this!”
In production, which behavior do you want?
The Hidden Costs of “Winning”
The naive pattern handled all 10,000 requests and maintained low latency. But at what cost?
Goroutine count: 1,000 concurrent goroutines
Memory usage: ~2-4 MB just for goroutine stacks
Database connections: Potentially 1,000 simultaneous queries
Context switches: The Go scheduler juggling 1,000 competing goroutines
In my benchmark environment, this worked fine. The system had resources to spare.
But in production:
Scenario 1: Traffic Spike
Black Friday sale. 10,000 concurrent users. Naive pattern spawns 10,000 goroutines. Your 100-connection database pool is exhausted in milliseconds. Connections time out. Requests fail. Users see errors.
Scenario 2: Slow Query
A poorly-optimized patient search takes 30 seconds instead of 100ms. With naive pattern, each slow request holds a goroutine for 30 seconds. Soon you have 10,000 goroutines waiting. Memory exhausted. OOMKilled. Service down.
Scenario 3: Cascading Failure
Your API spawns 5,000 goroutines. Uses all database connections. Database starts rejecting new connections. Other services sharing that database start failing. Now your patient monitoring system is down. Your lab results service is down. Everything crashes because one service didn’t implement backpressure.
Worker pools prevent all of these scenarios.
Test 3: Sustained Load (50,000 Requests)
One more test: 50,000 requests with 500 concurrent clients over ~30 seconds.
| Pattern | Throughput | Mean Latency | Error Rate | Notes | 
|---|---|---|---|---|
| Naive | 6,179 req/s | 75.75ms | 4.92% | 500 goroutines | 
| Worker Pool | 1,593 req/s | 164.50ms | 84.50% | Steady rejection | 
| Optimized | 1,547 req/s | 167.09ms | 83.95% | Steady rejection | 
Same pattern. Naive “wins” on throughput by spawning 500 goroutines. Worker pools maintain bounded concurrency by rejecting excess load.
sync.Pool effectiveness: 51% hit rate. About half the response objects came from the pool instead of fresh allocations. In a sustained production workload, this would significantly reduce GC pressure.
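sync.Pool doesn’t expose a hit rate on its own, so a figure like 51% has to come from instrumentation. One way to get it, and roughly what I’d expect the benchmark to be doing, is to count pool misses inside New and compare against total Gets:

import (
	"sync"
	"sync/atomic"
)

type countingPool struct {
	pool   sync.Pool
	gets   atomic.Int64
	misses atomic.Int64
}

func newCountingPool() *countingPool {
	p := &countingPool{}
	p.pool.New = func() any {
		p.misses.Add(1) // New only runs when there was nothing to reuse
		return &PatientResponse{}
	}
	return p
}

func (p *countingPool) Get() *PatientResponse {
	p.gets.Add(1)
	return p.pool.Get().(*PatientResponse)
}

func (p *countingPool) HitRate() float64 {
	if p.gets.Load() == 0 {
		return 0
	}
	return 1 - float64(p.misses.Load())/float64(p.gets.Load())
}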
The Configuration Question
“But you only used 20 workers! What if you used 100 or 500?”
Exactly.
Worker pools require configuration. You need to choose:
- Number of workers
- Queue size
- Timeout behavior
- Rejection strategy
 
I intentionally undersized the pool to demonstrate the behavior. In production, you’d size it based on:
For I/O-bound work (database queries):
workers = num_cpu_cores × 2 to 4
queue_size = workers × 5
For CPU-bound work:
workers = num_cpu_cores
queue_size = workers × 2
For our healthcare API: With 8 cores, I’d start with 30 workers and a queue of 150. Monitor queue depth in production. Adjust based on observed traffic patterns.
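In Go that starting point is easy to compute at boot; a small sketch using the I/O-bound heuristic above (the multiplier is a starting guess to tune against real traffic, not a rule):

import "runtime"

func defaultPoolSize() (workers, queueSize int) {
	workers = runtime.NumCPU() * 4 // I/O-bound work: 2-4x cores; start high, tune down
	queueSize = workers * 5
	return workers, queueSize
}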
The naive pattern has no knobs to turn. It just spawns goroutines until something breaks.
When to Use Each Pattern
Never Use Naive in Production
I’m saying this explicitly: Don’t use the naive pattern in production systems.
“But it’s so much faster!”
It’s faster until it’s not. And when it fails, it fails catastrophically.
The only acceptable use cases:
- Prototypes and demos
- Scripts that process finite, small datasets
- CLI tools with known upper bounds
 
Use Worker Pools for Most Services
Worker pools should be your default for any production service:
- REST APIs
- Background job processors
- Message queue consumers
- gRPC services
- WebSocket handlers
 
The configuration overhead is worth it for:
- Predictable resource usage
- Graceful degradation
- Proper monitoring
- Clean shutdown behavior
 
⚡ Use Optimized Pattern for High-Throughput Services
Add sync.Pool when:
- You’re processing >1,000 requests/second
- P99 latency matters for SLAs
- You have hot objects allocated repeatedly
- Profiling shows allocation pressure
 
Healthcare-specific guidance:
- Patient-facing APIs → Optimized pattern
- Internal admin tools → Worker pool
- Batch reporting jobs → Worker pool with larger queue
- Real-time monitoring → Optimized pattern
 
Key Takeaways
1. Raw throughput isn’t everything.
The naive pattern had 4-5x higher throughput but spawned 1,000 goroutines. In production, that’s a memory bomb waiting to explode.
2. Errors can be features.
The 88% “error rate” was the system correctly rejecting load it couldn’t handle safely. Better to reject requests than crash the entire service.
3. Configuration is control.
Worker pools require tuning. That’s not a weakness—it’s a feature. You get to decide your system’s limits instead of discovering them during an outage.
4. Test under realistic load.
At low load, the naive pattern looked perfect. At high load, the problems became obvious. Always benchmark at the scale you expect in production.
5. Healthcare systems need predictability.
Fast response times with unpredictable spikes are worse than consistent, controlled behavior. Patient care depends on reliability.
6. sync.Pool isn’t magic.
My 51% hit rate shows the pool was working but not optimal. In sustained production load with proper warmup, you’d see 80-90% hit rates. But it’s not worth the complexity unless you’re handling high throughput.
7. The Go scheduler is amazing... to a point.
Go can handle thousands of goroutines efficiently. But “can handle” doesn’t mean “should use.” Bounded concurrency is still the right pattern for production systems.
Try It Yourself
All the code, benchmarks, and documentation are open source:
Repository: github.com/Stella-Achar-Oiro/healthcare-api-benchmark
Run the benchmarks on your machine:
git clone https://github.com/Stella-Achar-Oiro/healthcare-api-benchmark.git
cd healthcare-api-benchmark
go run cmd/loadtest/main.go -requests=10000 -concurrency=1000
Try different configurations:
# Test with more workers
go run cmd/loadtest/main.go -workers=100 -queue-size=500
# Test with different load patterns
go run cmd/loadtest/main.go -requests=50000 -concurrency=500
Final Thoughts
Building this benchmark taught me that engineering isn’t about finding the “best” solution. It’s about understanding tradeoffs.
The naive pattern isn’t wrong because it’s slow—it’s wrong because it’s unpredictable. The worker pool pattern isn’t better because it’s faster—it’s better because you control what happens under load.
In healthcare technology, and in production systems generally, predictable failure beats unpredictable catastrophe every single time.
Choose your concurrency pattern based on what you need your system to do when things go wrong. Because in production, things always go wrong.
What concurrency patterns do you use? Have you been burned by unlimited goroutine spawning? Drop a comment below—I’d love to hear your war stories.
If you found this useful, follow me for more Go, Kubernetes, and healthcare tech content. I’m building in public and sharing what I learn.
Stella Achar Oiro is a software developer combining clinical healthcare experience with cloud-native development. She builds healthcare technology with Go, Kubernetes, and AWS, and writes about the intersection of medicine and code.