I spent this weekend building a benchmark suite to test Go concurrency patterns for healthcare APIs. I ran three different approaches through realistic load tests and collected real performance data.
The results shocked me.
The “naive” pattern—spawning a new goroutine for every request—was 4-5x faster than the “proper” worker pool pattern. It handled 11,000 requests per second while the worker pool managed only 2,100.
My first thought: “Did I break something?”
My second thought: “Wait... this is actually the problem.”
Here’s what I learned about Go concurrency, production systems, and why the fastest solution is often the worst choice.
🔗 Full code and benchmarks: github.com/Stella-Achar-Oiro/healthcare-api-benchmark
Why This Matters (Especially in Healthcare)
Healthcare APIs have unique constraints that make concurrency patterns critical:
Predictability over performance. When a nurse queries patient medication history during an emergency, the system needs to respond consistently. A response that takes 200ms 99% of the time but occasionally spikes to 5 seconds isn’t acceptable. Lives depend on these milliseconds.
Resource control is compliance. HIPAA requires demonstrable control over patient data access. Uncontrolled goroutine spawning makes it nearly impossible to audit resource usage, track access patterns, or enforce rate limits properly.
Graceful degradation over catastrophic failure. A healthcare system that rejects 20% of requests during a spike is better than one that accepts all requests, runs out of memory, and crashes—taking down every connected system with it.
These requirements forced me to dig deeper into Go’s concurrency patterns. What I found applies to any production system where reliability matters more than raw speed.
The Three Patterns I Tested
Pattern 1: Naive (The Anti-Pattern)
The simplest approach: spawn a goroutine for every incoming request.
func (h *NaiveHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	patientID := extractPatientID(r)
	// Spawn a new goroutine for EVERY request.
	// Nothing waits for it, and nothing bounds how many exist at once.
	go h.processRequest(w, r, patientID)
}

func (h *NaiveHandler) processRequest(w http.ResponseWriter, r *http.Request, patientID string) {
	// Query database
	patient, err := h.db.QueryPatient(r.Context(), patientID)
	if err != nil {
		http.Error(w, "query failed", http.StatusInternalServerError)
		return
	}
	// Send response
	json.NewEncoder(w).Encode(patient)
}
Why developers write this:
- It’s dead simple
- Feels “concurrent” and “Go-like”
- Works perfectly in demos
- Benchmarks show amazing numbers
 
Why it’s dangerous:
- Unbounded concurrency: No limit on goroutine count
- Memory bomb: Each goroutine consumes ~2KB+ of stack space
- Database exhaustion: Can spawn more goroutines than your database connection pool
- Scheduler overhead: The Go scheduler struggles with thousands of competing goroutines
- Cascading failures: One slow database query blocks resources for everyone
 
I’ve seen this pattern in production codebases. Usually written by developers coming from async/await backgrounds who treat goroutines like they’re free. They’re not.
Pattern 2: Worker Pool (The Production Pattern)
Fixed number of workers pulling jobs from a queue.
type WorkerPoolHandler struct {
	jobQueue chan *job
	workers  int
	wg       sync.WaitGroup
	// The ctx and db fields used below are part of the full type;
	// they’re omitted here for brevity.
}

func (h *WorkerPoolHandler) worker(id int) {
	defer h.wg.Done()
	for {
		select {
		case job, ok := <-h.jobQueue:
			if !ok {
				// Queue closed and drained: time to exit.
				return
			}
			// Process job
			patient, err := h.db.QueryPatient(job.ctx, job.patientID)
			if err != nil {
				job.resultChan <- &PatientResponse{Error: err.Error()}
				continue
			}
			job.resultChan <- &PatientResponse{Patient: patient}
		case <-h.ctx.Done():
			return
		}
	}
}

func (h *WorkerPoolHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	job := &job{
		ctx:        r.Context(),
		patientID:  extractPatientID(r),
		resultChan: make(chan *PatientResponse, 1),
	}

	// Try to enqueue
	select {
	case h.jobQueue <- job:
		// Queued successfully
	default:
		// Queue full - reject the request instead of piling up work
		http.Error(w, "service overloaded", http.StatusServiceUnavailable)
		return
	}

	// Wait for result
	response := <-job.resultChan
	json.NewEncoder(w).Encode(response)
}
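The snippet skips how the pool gets started. Here’s a minimal sketch of the wiring, assuming a constructor called NewWorkerPoolHandler (an illustrative name, not necessarily the repo’s API) and the ctx and db fields the worker relies on:

func NewWorkerPoolHandler(workers, queueSize int) *WorkerPoolHandler {
	h := &WorkerPoolHandler{
		jobQueue: make(chan *job, queueSize),
		workers:  workers,
		// ctx, cancel, and db would be set here as well.
	}
	for i := 0; i < workers; i++ {
		h.wg.Add(1) // paired with the defer h.wg.Done() inside worker()
		go h.worker(i)
	}
	return h
}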
Why this is better:
Bounded concurrency. You control exactly how many operations run simultaneously. If you set 20 workers, you get 20 goroutines. No surprises.
Graceful backpressure. When the system is overloaded, requests wait in the queue. If the queue fills up, you can reject new requests cleanly with a 503 status code. The client knows to retry later.
Predictable resource usage. You can calculate exactly how much memory the system needs. 20 workers × 2KB/goroutine + queue size × job size = predictable memory footprint.
Better database behavior. With 20 workers, you’ll never open more than 20 database connections simultaneously. Your database connection pool can be sized appropriately.
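The benchmark simulates its database, but in a real service backed by database/sql the same idea applies: cap the driver’s pool so it can never exceed the worker count. A quick sketch:

import (
	"database/sql"
	"time"
)

func configureDBPool(db *sql.DB, workers int) {
	db.SetMaxOpenConns(workers)            // never more connections than workers to use them
	db.SetMaxIdleConns(workers)            // keep warm connections ready for the next job
	db.SetConnMaxLifetime(5 * time.Minute) // recycle connections periodically
}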
Proper shutdown. You can wait for in-flight work to complete during deployments. Close the job queue, drain remaining jobs, then shutdown cleanly.
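Here’s roughly what that sequence can look like, as a sketch; I’m assuming a Shutdown method with a timeout, which may not match the repo exactly:

// Call after http.Server.Shutdown has stopped accepting new requests.
func (h *WorkerPoolHandler) Shutdown(timeout time.Duration) {
	close(h.jobQueue) // no new jobs; workers exit once the queue is drained

	done := make(chan struct{})
	go func() {
		h.wg.Wait() // wait for in-flight jobs
		close(done)
	}()

	select {
	case <-done:
		// drained cleanly
	case <-time.After(timeout):
		// deadline hit; shut down anyway
	}
}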
Pattern 3: Optimized (Worker Pool + sync.Pool)
Same worker pool but with object pooling to reduce allocations.
type OptimizedHandler struct {
	WorkerPoolHandler
	responsePool sync.Pool // needs a New func that returns *PatientResponse
}

func (h *OptimizedHandler) getResponse() *PatientResponse {
	resp := h.responsePool.Get().(*PatientResponse)
	// CRITICAL: Clear previous data (HIPAA compliance!)
	resp.Patient = nil
	resp.Error = ""
	return resp
}

func (h *OptimizedHandler) putResponse(resp *PatientResponse) {
	// Clear sensitive data before returning to pool
	resp.Patient = nil
	resp.Error = ""
	h.responsePool.Put(resp)
}

func (h *OptimizedHandler) processJob(job *job) {
	// Get pooled response object instead of allocating
	response := h.getResponse()

	patient, err := h.db.QueryPatient(job.ctx, job.patientID)
	if err != nil {
		response.Error = err.Error()
	} else {
		response.Patient = patient
	}

	// The HTTP side should call putResponse only after the response has
	// been written. Putting the object back here with a defer would recycle
	// it while the reader still holds it.
	job.resultChan <- response
}
What sync.Pool does:
sync.Pool maintains a cache of reusable objects. Instead of allocating a new PatientResponse struct for every request, we reuse them. This reduces:
- Heap allocations
- Garbage collection pressure
- Memory allocation latency
- GC pause times
 
Healthcare-specific note: You MUST clear sensitive data before returning objects to the pool. Imagine pooling a response object that still contains patient data from the previous request. That’s a HIPAA violation waiting to happen.
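One wiring detail the snippet leaves out: the type assertion in getResponse only works if the pool has a New function, because Get returns nil when the pool is empty and New is unset. A minimal sketch of that setup (the constructor name is illustrative):

func NewOptimizedHandler() *OptimizedHandler {
	h := &OptimizedHandler{ /* embedded WorkerPoolHandler setup elided */ }
	h.responsePool.New = func() any {
		// Runs only on a pool miss; otherwise Get hands back a recycled object.
		return &PatientResponse{}
	}
	return h
}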
The Benchmark Setup
I built a load testing tool that simulates realistic healthcare API usage:
Database simulation: 50-100ms query latency (typical for indexed patient lookups) with a 5% random error rate (network timeouts, lock conflicts, etc.)
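For context, the simulated query boils down to something like this sketch (simplified; the SimulatedDB type and Patient fields here are assumptions, not the repo’s exact code):

func (db *SimulatedDB) QueryPatient(ctx context.Context, id string) (*Patient, error) {
	// 50-100ms of artificial latency, like an indexed lookup over the network.
	delay := 50*time.Millisecond + time.Duration(rand.Intn(51))*time.Millisecond
	select {
	case <-time.After(delay):
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	// ~5% of queries fail, standing in for timeouts and lock conflicts.
	if rand.Intn(100) < 5 {
		return nil, errors.New("simulated database error")
	}
	return &Patient{ID: id}, nil
}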
Load patterns:
- Low load: 1,000 requests, 100 concurrent clients
- High load: 10,000 requests, 1,000 concurrent clients
- Sustained load: 50,000 requests, 500 concurrent clients
 
Metrics collected:
- Throughput (requests/second)
- Latency (min, mean, median, P95, P99, max)
- Error rates
- Memory allocations
 
Worker pool configuration: Intentionally small (20 workers, 100 job queue) to demonstrate behavior under pressure.
Test 1: Low Load (1,000 Requests)
The first test: 1,000 total requests with 100 concurrent clients.
| Pattern | Throughput | Mean Latency | P99 Latency | Error Rate | 
|---|---|---|---|---|
| Naive | 1,115 req/s | 75.87ms | 100.06ms | 5.80% | 
| Worker Pool | 260 req/s | 364.94ms | 423.70ms | 5.30% | 
| Optimized | 261 req/s | 363.06ms | 417.51ms | 4.70% | 
Naive wins. And it’s not even close—4.3x higher throughput!
At this load, the naive pattern shines. It spawned ~100 goroutines to handle 100 concurrent clients. The system had plenty of resources, so everything just... worked.
The worker pools were slower because only 20 requests could be processed at once; the other 80 waited in the queue. With 100 clients sharing 20 workers and a ~75ms average query, each request effectively waits for about four queries ahead of it plus its own, which predicts roughly 375ms, right in line with the measured ~364ms mean.
But here’s what the numbers don’t show: The naive pattern used 5x more goroutines and had no mechanism to reject requests if the system was overloaded.
Test 2: High Load (10,000 Requests)
Now let’s crank it up: 10,000 requests with 1,000 concurrent clients.
| Pattern | Throughput | Mean Latency | P99 Latency | Error Rate | Goroutines | 
|---|---|---|---|---|---|
| Naive | 11,104 req/s | 75.47ms | 100.04ms | 5.24% | ~1,000 | 
| Worker Pool | 2,124 req/s | 148.47ms | 568.99ms | 88.29% | 20 | 
| Optimized | 2,248 req/s | 144.95ms | 565.86ms | 88.75% | 20 | 
Wait... what?
The naive pattern got faster (10x higher throughput than low load). The worker pools had 88% error rates. Did I break something?
Nope. This is exactly what should happen.
Understanding the “Errors” (Plot Twist: They’re Features)
Those 88% “errors” in the worker pool patterns aren’t failures. They’re rejections.
What happened:
- 1,000 concurrent clients started sending requests
- Worker pool: 20 workers + a 100-slot job queue = 120 requests can be “in flight”
- Request #121 arrives → queue is full → rejected immediately
- Result: roughly 880 of every 1,000 incoming requests get turned away ≈ 88% “error” rate
 
This is not a bug. This is the feature.
The worker pool said: “I can’t handle this load safely. I’m returning 503 Service Unavailable. Please retry later.”
The naive pattern said: “Sure! I’ll spawn 1,000 goroutines to handle all of this!”
In production, which behavior do you want?
The Hidden Costs of “Winning”
The naive pattern handled all 10,000 requests and maintained low latency. But at what cost?
Goroutine count: 1,000 concurrent goroutines
Memory usage: ~2-4 MB just for goroutine stacks
Database connections: Potentially 1,000 simultaneous queries
Context switches: The Go scheduler juggling 1,000 competing goroutines
In my benchmark environment, this worked fine. The system had resources to spare.
But in production:
Scenario 1: Traffic Spike
Black Friday sale. 10,000 concurrent users. Naive pattern spawns 10,000 goroutines. Your 100-connection database pool is exhausted in milliseconds. Connections time out. Requests fail. Users see errors.
Scenario 2: Slow Query
A poorly-optimized patient search takes 30 seconds instead of 100ms. With naive pattern, each slow request holds a goroutine for 30 seconds. Soon you have 10,000 goroutines waiting. Memory exhausted. OOMKilled. Service down.
Scenario 3: Cascading Failure
Your API spawns 5,000 goroutines. Uses all database connections. Database starts rejecting new connections. Other services sharing that database start failing. Now your patient monitoring system is down. Your lab results service is down. Everything crashes because one service didn’t implement backpressure.
Worker pools prevent all of these scenarios.
Test 3: Sustained Load (50,000 Requests)
One more test: 50,000 requests with 500 concurrent clients over ~30 seconds.
| Pattern | Throughput | Mean Latency | Error Rate | Notes | 
|---|---|---|---|---|
| Naive | 6,179 req/s | 75.75ms | 4.92% | 500 goroutines | 
| Worker Pool | 1,593 req/s | 164.50ms | 84.50% | Steady rejection | 
| Optimized | 1,547 req/s | 167.09ms | 83.95% | Steady rejection | 
Same pattern. Naive “wins” on throughput by spawning 500 goroutines. Worker pools maintain bounded concurrency by rejecting excess load.
sync.Pool effectiveness: 51% hit rate. About half the response objects came from the pool instead of fresh allocations. In a sustained production workload, this would significantly reduce GC pressure.
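sync.Pool doesn’t expose a hit rate on its own, so a figure like 51% has to come from instrumentation. One way to get it, and roughly what I’d expect the benchmark to be doing, is to count pool misses inside New and compare against total Gets:

import (
	"sync"
	"sync/atomic"
)

type countingPool struct {
	pool   sync.Pool
	gets   atomic.Int64
	misses atomic.Int64
}

func newCountingPool() *countingPool {
	p := &countingPool{}
	p.pool.New = func() any {
		p.misses.Add(1) // New only runs when there was nothing to reuse
		return &PatientResponse{}
	}
	return p
}

func (p *countingPool) Get() *PatientResponse {
	p.gets.Add(1)
	return p.pool.Get().(*PatientResponse)
}

func (p *countingPool) HitRate() float64 {
	if p.gets.Load() == 0 {
		return 0
	}
	return 1 - float64(p.misses.Load())/float64(p.gets.Load())
}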
The Configuration Question
“But you only used 20 workers! What if you used 100 or 500?”
Exactly.
Worker pools require configuration. You need to choose:
- Number of workers
- Queue size
- Timeout behavior
- Rejection strategy
 
I intentionally undersized the pool to demonstrate the behavior. In production, you’d size it based on:
For I/O-bound work (database queries):
workers = num_cpu_cores × 2 to 4
queue_size = workers × 5
For CPU-bound work:
workers = num_cpu_cores
queue_size = workers × 2
For our healthcare API: With 8 cores, I’d start with 30 workers and a queue of 150. Monitor queue depth in production. Adjust based on observed traffic patterns.
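In Go that starting point is easy to compute at boot; a small sketch using the I/O-bound heuristic above (the multiplier is a starting guess to tune against real traffic, not a rule):

import "runtime"

func defaultPoolSize() (workers, queueSize int) {
	workers = runtime.NumCPU() * 4 // I/O-bound work: 2-4x cores; start high, tune down
	queueSize = workers * 5
	return workers, queueSize
}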
The naive pattern has no knobs to turn. It just spawns goroutines until something breaks.
When to Use Each Pattern
Never Use Naive in Production
I’m saying this explicitly: Don’t use the naive pattern in production systems.
“But it’s so much faster!”
It’s faster until it’s not. And when it fails, it fails catastrophically.
The only acceptable use cases:
- Prototypes and demos
- Scripts that process finite, small datasets
- CLI tools with known upper bounds
 
Use Worker Pools for Most Services
Worker pools should be your default for any production service:
- REST APIs
- Background job processors
- Message queue consumers
- gRPC services
- WebSocket handlers
 
The configuration overhead is worth it for:
- Predictable resource usage
- Graceful degradation
- Proper monitoring
- Clean shutdown behavior
 
⚡ Use Optimized Pattern for High-Throughput Services
Add sync.Pool when:
- You’re processing >1,000 requests/second
- P99 latency matters for SLAs
- You have hot objects allocated repeatedly
- Profiling shows allocation pressure
 
Healthcare-specific guidance:
- Patient-facing APIs → Optimized pattern
- Internal admin tools → Worker pool
- Batch reporting jobs → Worker pool with larger queue
- Real-time monitoring → Optimized pattern
 
Key Takeaways
1. Raw throughput isn’t everything.
The naive pattern had 4-5x higher throughput but spawned 1,000 goroutines. In production, that’s a memory bomb waiting to explode.
2. Errors can be features.
The 88% “error rate” was the system correctly rejecting load it couldn’t handle safely. Better to reject requests than crash the entire service.
3. Configuration is control.
Worker pools require tuning. That’s not a weakness—it’s a feature. You get to decide your system’s limits instead of discovering them during an outage.
4. Test under realistic load.
At low load, the naive pattern looked perfect. At high load, the problems became obvious. Always benchmark at the scale you expect in production.
5. Healthcare systems need predictability.
Fast response times with unpredictable spikes are worse than consistent, controlled behavior. Patient care depends on reliability.
6. sync.Pool isn’t magic.
My 51% hit rate shows the pool was working but not optimal. In sustained production load with proper warmup, you’d see 80-90% hit rates. But it’s not worth the complexity unless you’re handling high throughput.
7. The Go scheduler is amazing... to a point.
Go can handle thousands of goroutines efficiently. But “can handle” doesn’t mean “should use.” Bounded concurrency is still the right pattern for production systems.
Try It Yourself
All the code, benchmarks, and documentation are open source:
Repository: github.com/Stella-Achar-Oiro/healthcare-api-benchmark
Run the benchmarks on your machine:
git clone https://github.com/Stella-Achar-Oiro/healthcare-api-benchmark.git
cd healthcare-api-benchmark
go run cmd/loadtest/main.go -requests=10000 -concurrency=1000
Try different configurations:
# Test with more workers
go run cmd/loadtest/main.go -workers=100 -queue-size=500
# Test with different load patterns
go run cmd/loadtest/main.go -requests=50000 -concurrency=500
Final Thoughts
Building this benchmark taught me that engineering isn’t about finding the “best” solution. It’s about understanding tradeoffs.
The naive pattern isn’t wrong because it’s slow—it’s wrong because it’s unpredictable. The worker pool pattern isn’t better because it’s faster—it’s better because you control what happens under load.
In healthcare technology, and in production systems generally, predictable failure beats unpredictable catastrophe every single time.
Choose your concurrency pattern based on what you need your system to do when things go wrong. Because in production, things always go wrong.
What concurrency patterns do you use? Have you been burned by unlimited goroutine spawning? Drop a comment below—I’d love to hear your war stories.
If you found this useful, follow me for more Go, Kubernetes, and healthcare tech content. I’m building in public and sharing what I learn.
Stella Achar Oiro is a software developer combining clinical healthcare experience with cloud-native development. She builds healthcare technology with Go, Kubernetes, and AWS, and writes about the intersection of medicine and code.