A step-by-step guide to identifying and fixing connection pool bottlenecks in Node.js applications
How I reduced API response time from 1,221ms to 280ms using systematic load testing and pool optimization
What You’ll Learn
In this guide, you’ll discover how to:
- Identify connection pool bottlenecks in your Node.js application
- Calculate the optimal pool size using load testing with k6
- Monitor pool health with simple metrics
- Fix performance issues that cause 1000ms+ response times
Let’s start with a real production issue I encountered and how I solved it systematically.
The Problem: Slow Response Times with Low CPU Usage
During a routine load test of my Node.js API with 150 concurrent users, I observed these concerning metrics:
Node.js CPU: 45%
RAM: 60%
PostgreSQL CPU: 15%
Average Response Time: 1,221ms
P95 Latency: 2,890ms
The symptom was clear: Database CPU at 15% while response times exceeded 1 second indicated a bottleneck between the application and database layers, not a database performance issue.
Step 1: Add Connection Pool Monitoring
First, I added monitoring to understand what was happening inside the connection pool. This one small snippet revealed everything:
// Monitor pool health every 5 seconds
setInterval(() => {
console.log(
`[POOL] Total: ${pool.totalCount} | Idle: ${pool.idleCount} | Waiting: ${pool.waitingCount}`
);
}, 5000);
Key Metrics Explained:
- Total: Number of connections in the pool
- Idle: Available connections ready to use
- Waiting: Requests queued because all connections are busy
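If you want event-level visibility on top of these periodic snapshots, node-postgres also emits pool lifecycle events you can hook into. A minimal sketch (the event names come from the pg documentation; the log format is my own):

```javascript
// Log pool lifecycle events as they happen (complements the 5-second snapshot)
pool.on("connect", () => console.log("[POOL] New connection established"));
pool.on("acquire", () => console.log("[POOL] Connection checked out"));
pool.on("remove", () => console.log("[POOL] Connection closed and removed"));
pool.on("error", (err) => console.error("[POOL] Idle client error:", err.message));
```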
Basic Server Setup with Monitoring
import dotenv from "dotenv";
import express from "express";
import { Pool } from "pg";
dotenv.config();
const app = express();
const PORT = process.env.PORT || 3000;
const poolConfig = {
host: process.env.DB_HOST,
port: process.env.DB_PORT,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
// These are what we're testing
min: parseInt(process.env.POOL_MIN, 10) || 2,
max: parseInt(process.env.POOL_MAX, 10) || 10,
// Important timeouts
idleTimeoutMillis: parseInt(process.env.POOL_IDLE_TIMEOUT, 10) || 30000,
connectionTimeoutMillis: parseInt(process.env.POOL_CONNECTION_TIMEOUT, 10) || 2000,
};
const pool = new Pool(poolConfig);
const WORKER_ID = process.pid;
console.log("=================================");
console.log(`[Worker-${WORKER_ID}]`);
console.log(` Host: ${poolConfig.host}`);
console.log(` Database: ${poolConfig.database}`);
console.log(` Pool Min: ${poolConfig.min}`);
console.log(` Pool Max: ${poolConfig.max}`);
console.log("=================================\n");
// Test database connection
pool.query("SELECT NOW()", (err, res) => {
if (err) {
console.error(`[Worker-${WORKER_ID}] Database connection failed:`, err.message);
process.exit(1);
}
console.log(`[Worker-${WORKER_ID}] Database connected at:`, res.rows[0].now);
});
// API endpoint for testing
app.get("/users", async (req, res) => {
const startTime = Date.now();
try {
const result = await pool.query("SELECT * FROM users LIMIT 100");
const duration = Date.now() - startTime;
res.json({
success: true,
count: result.rows.length,
data: result.rows,
query_time_ms: duration,
});
} catch (err) {
const duration = Date.now() - startTime;
res.status(500).json({
success: false,
error: err.message,
query_time_ms: duration,
});
}
});
// The critical monitoring - logs every 5 seconds
setInterval(() => {
console.log(
`[Worker-${WORKER_ID}] [POOL] Total: ${pool.totalCount} | Idle: ${pool.idleCount} | Waiting: ${pool.waitingCount}`
);
}, 5000);
// Graceful shutdown
process.on("SIGTERM", async () => {
console.log("\nSIGTERM received, closing pool...");
await pool.end();
process.exit(0);
});
process.on("SIGINT", async () => {
console.log("\nSIGINT received, closing pool...");
await pool.end();
process.exit(0);
});
app.listen(PORT, () => {
console.log(`\n[Worker-${WORKER_ID}] Server running on http://localhost:${PORT}`);
console.log(`[Worker-${WORKER_ID}] Users endpoint: http://localhost:${PORT}/users\n`);
});
Step 2: Create k6 Load Test
Create a realistic load test that simulates production traffic:
// load-test.js
import http from "k6/http";
import { check, sleep } from "k6";
export const options = {
stages: [
{ duration: "10s", target: 10 }, // Warm up
{ duration: "20s", target: 30 }, // Ramp up
{ duration: "30s", target: 60 }, // Increase load
{ duration: "10s", target: 100 }, // Push harder
{ duration: "15s", target: 150 }, // Peak traffic
{ duration: "10s", target: 0 }, // Cool down
],
thresholds: {
http_req_duration: ["p(95)<2000"], // 95% of requests under 2s
http_req_failed: ["rate<0.1"], // Less than 10% errors
},
};
export default function () {
const res = http.get("http://localhost:3000/users");
check(res, {
"status is 200": (r) => r.status === 200,
"response time < 500ms": (r) => r.timings.duration < 500,
});
sleep(0.1); // Each user waits 100ms between requests
}
export function handleSummary(data) {
const metrics = data.metrics;
console.log("\n========================================");
console.log("LOAD TEST RESULTS");
console.log("========================================");
console.log(`Total Requests: ${metrics.http_reqs.values.count}`);
console.log(`Average Response: ${metrics.http_req_duration.values.avg.toFixed(2)}ms`);
console.log(`P95 Latency: ${metrics.http_req_duration.values["p(95)"].toFixed(2)}ms`);
console.log(`Failed Requests: ${(metrics.http_req_failed.values.rate * 100).toFixed(2)}%`);
console.log("========================================\n");
return { stdout: "" }; // returning empty stdout suppresses k6's default summary
}
Step 3: Test Different Pool Sizes Systematically
I tested each pool size by editing the .env file and restarting the server:
.env Configuration
# Database configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=testdb
DB_USER=postgres
DB_PASSWORD=password
# Pool configuration - Edit POOL_MAX for each test
POOL_MIN=1
POOL_MAX=10 # Change to: 10, 15, 20, 40
POOL_IDLE_TIMEOUT=30000
POOL_CONNECTION_TIMEOUT=10000
Testing Workflow
For each pool size test:
- Edit `.env` file - change the `POOL_MAX` value
- Start the server - `npm run dev`
- Run the load test - `k6 run load-tests/test.js` (in another terminal)
- Collect results - watch the console logs and the k6 output
- Stop the server - Ctrl+C and repeat with the next pool size
# Step 1: Set POOL_MAX=10 in .env file
# Step 2: Start server
npm run dev
# Step 3: In another terminal, run k6
k6 run load-tests/test.js
# Step 4: Note the results
# Step 5: Stop server (Ctrl+C)
# Repeat: Edit .env to POOL_MAX=15
npm run dev
k6 run load-tests/test.js
# Repeat: Edit .env to POOL_MAX=20
npm run dev
k6 run load-tests/test.js
# Repeat: Edit .env to POOL_MAX=40
npm run dev
k6 run load-tests/test.js
Step 4: Analyse Test Results
Test #1: Pool Size 10 (Baseline)
Server Logs During Load:
[Worker-24064] [POOL] Total: 1 | Idle: 0 | Waiting: 0
[Worker-24064] [POOL] Total: 6 | Idle: 1 | Waiting: 0
[Worker-24064] [POOL] Total: 10 | Idle: 3 | Waiting: 0
[Worker-24064] [POOL] Total: 10 | Idle: 1 | Waiting: 0
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 11
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 14
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 22
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 24
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 36
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 37
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 58
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 53
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 69
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 72
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 105
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 123
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 130 PEAK QUEUE
[Worker-24064] [POOL] Total: 10 | Idle: 0 | Waiting: 54
[Worker-24064] [POOL] Total: 10 | Idle: 6 | Waiting: 0
[Worker-24064] [POOL] Total: 10 | Idle: 10 | Waiting: 0
[Worker-24064] [POOL] Total: 0 | Idle: 0 | Waiting: 0
k6 Results: 4,040 total requests, 1,221ms average response, 2,890ms P95 latency.
Test #2: Pool Size 15
Server Logs:
[Worker-29532] [POOL] Total: 1 | Idle: 1 | Waiting: 0
[Worker-29532] [POOL] Total: 5 | Idle: 1 | Waiting: 0
[Worker-29532] [POOL] Total: 8 | Idle: 3 | Waiting: 0
[Worker-29532] [POOL] Total: 11 | Idle: 1 | Waiting: 0
[Worker-29532] [POOL] Total: 15 | Idle: 4 | Waiting: 0
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 0
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 3
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 11
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 16
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 20
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 24
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 30
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 34
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 55
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 72
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 89
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 104
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 120 PEAK QUEUE
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 81
[Worker-29532] [POOL] Total: 15 | Idle: 0 | Waiting: 44
[Worker-29532] [POOL] Total: 15 | Idle: 15 | Waiting: 0
[Worker-29532] [POOL] Total: 14 | Idle: 14 | Waiting: 0
[Worker-29532] [POOL] Total: 0 | Idle: 0 | Waiting: 0
k6 Results: 6,969 total requests, 654ms average response, 1,545ms P95 latency.
Test #3: Pool Size 20
Server Logs:
[Worker-31608] [POOL] Total: 1 | Idle: 1 | Waiting: 0
[Worker-31608] [POOL] Total: 4 | Idle: 2 | Waiting: 0
[Worker-31608] [POOL] Total: 8 | Idle: 4 | Waiting: 0
[Worker-31608] [POOL] Total: 14 | Idle: 2 | Waiting: 0
[Worker-31608] [POOL] Total: 18 | Idle: 1 | Waiting: 0
[Worker-31608] [POOL] Total: 20 | Idle: 8 | Waiting: 0
[Worker-31608] [POOL] Total: 20 | Idle: 3 | Waiting: 0
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 2
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 10
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 18
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 24
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 36
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 48
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 66
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 86
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 99
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 114 PEAK
[Worker-31608] [POOL] Total: 20 | Idle: 0 | Waiting: 60
[Worker-31608] [POOL] Total: 20 | Idle: 8 | Waiting: 0
[Worker-31608] [POOL] Total: 20 | Idle: 20 | Waiting: 0 RECOVERED
[Worker-31608] [POOL] Total: 8 | Idle: 8 | Waiting: 0
[Worker-31608] [POOL] Total: 0 | Idle: 0 | Waiting: 0
k6 Results: 8,605 total requests, 508ms average response, 1,171ms P95 latency.
Test #4: Pool Size 40
Server Logs:
[Worker-31644] [POOL] Total: 1 | Idle: 1 | Waiting: 0
[Worker-31644] [POOL] Total: 4 | Idle: 0 | Waiting: 0
[Worker-31644] [POOL] Total: 9 | Idle: 0 | Waiting: 0
[Worker-31644] [POOL] Total: 13 | Idle: 2 | Waiting: 0
[Worker-31644] [POOL] Total: 21 | Idle: 4 | Waiting: 0
[Worker-31644] [POOL] Total: 27 | Idle: 8 | Waiting: 0
[Worker-31644] [POOL] Total: 32 | Idle: 12 | Waiting: 0
[Worker-31644] [POOL] Total: 38 | Idle: 1 | Waiting: 0
[Worker-31644] [POOL] Total: 40 | Idle: 4 | Waiting: 0
[Worker-31644] [POOL] Total: 40 | Idle: 6 | Waiting: 0
[Worker-31644] [POOL] Total: 40 | Idle: 13 | Waiting: 0
[Worker-31644] [POOL] Total: 40 | Idle: 0 | Waiting: 0
[Worker-31644] [POOL] Total: 40 | Idle: 0 | Waiting: 12
[Worker-31644] [POOL] Total: 40 | Idle: 0 | Waiting: 36
[Worker-31644] [POOL] Total: 40 | Idle: 0 | Waiting: 51
[Worker-31644] [POOL] Total: 40 | Idle: 0 | Waiting: 65
[Worker-31644] [POOL] Total: 40 | Idle: 0 | Waiting: 88 PEAK (still queuing!)
[Worker-31644] [POOL] Total: 40 | Idle: 0 | Waiting: 19
[Worker-31644] [POOL] Total: 40 | Idle: 36 | Waiting: 0
[Worker-31644] [POOL] Total: 40 | Idle: 40 | Waiting: 0 RECOVERED
[Worker-31644] [POOL] Total: 7 | Idle: 7 | Waiting: 0
[Worker-31644] [POOL] Total: 0 | Idle: 0 | Waiting: 0
k6 Results: 13,680 total requests, 280ms average response, 529ms P95 latency.
Analysis:
- Best response times: 280ms average (77% improvement from baseline)
- Highest throughput: 13,680 requests (239% increase)
- Still experienced queuing: Peak of 88 waiting despite 40 connections
- Diminishing returns: Only 59% improvement over Pool 20
Step 5: Performance Analysis
Results Summary
| Pool Size | Total Requests | Avg Response | P95 Latency | Max Queue | Improvement |
|---|---|---|---|---|---|
| 10 | 4,040 | 1,221ms | 2,890ms | 130 | Baseline |
| 15 | 6,969 | 654ms | 1,545ms | 120 | +72.5% |
| 20 | 8,605 | 508ms | 1,171ms | 114 | +113% (Optimal) |
| 40 | 13,680 | 280ms | 529ms | 88 | +239% |
Key Insights
- 10 → 15: Significant improvement (72.5% throughput increase, 46% latency reduction)
- 15 → 20: Strong continued gains (23.5% more throughput, 22% latency reduction)
- 20 → 40: Diminishing returns (59% more throughput but 2x the connections)
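For transparency, the improvement percentages above come straight from the request counts in the results table:

```javascript
// Derive the "Improvement" column from raw throughput (pool 10 = baseline)
const baseline = 4040;
const results = { 15: 6969, 20: 8605, 40: 13680 };
for (const [size, requests] of Object.entries(results)) {
  const gain = ((requests / baseline - 1) * 100).toFixed(1);
  console.log(`Pool ${size}: +${gain}%`); // +72.5%, +113.0%, +238.6% (rounded in the table)
}
```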
VERDICT: Pool Size 20 is Optimal
While Pool Size 40 delivers the best raw performance, Pool Size 20 provides the best balance:
Pool 20 advantages:
- 113% throughput improvement with just 20 connections
- Clears queue completely (recovers to 0)
- Uses 50% fewer database resources than Pool 40
- Already achieves sub-600ms response times
Pool 40 considerations:
- Only 59% better than Pool 20 despite 100% more connections
- Still experiences queuing (88 peak) during high load
- Higher database resource consumption
- Better suited if you expect >200 concurrent users
Step 6: Calculate Your Optimal Pool Size
Based on the testing, here’s a formula to calculate your starting point:
function calculateOptimalPoolSize(concurrentUsers, avgQueryTimeMs) {
// Base calculation
const baseSize = (concurrentUsers * avgQueryTimeMs) / 1000;
// Add 20% safety buffer
const withBuffer = baseSize * 1.2;
return Math.ceil(withBuffer);
}
// Example for our case
const poolSize = calculateOptimalPoolSize(150, 150);
console.log(poolSize); // Returns 27
// Testing showed 20 was sufficient because actual query time was faster
Important Constraints
// Check database limits
const dbMaxConnections = 100; // from SHOW max_connections;
const reserveForAdmin = 10;
const availableForApp = dbMaxConnections - reserveForAdmin;
// If using multiple workers
const workers = 4; // PM2 cluster mode
const poolPerWorker = Math.floor(availableForApp / workers);
console.log(`Each worker gets ${poolPerWorker} connections`);
Step 7: Production Implementation
Enhanced Monitoring Setup
const ALERT_THRESHOLDS = {
waitingQueue: 20,
idlePercentage: 10,
};
setInterval(() => {
const total = pool.totalCount;
const idle = pool.idleCount;
const waiting = pool.waitingCount;
const idlePercentage = total > 0 ? (idle / total) * 100 : 0;
// Log metrics
console.log(`[POOL] Total: ${total} | Idle: ${idle} | Waiting: ${waiting}`);
// Alert conditions
if (waiting > ALERT_THRESHOLDS.waitingQueue) {
console.error(` [ALERT] High queue: ${waiting} requests waiting`);
// Send to monitoring service
}
if (idlePercentage < ALERT_THRESHOLDS.idlePercentage && total > 0) {
console.warn(` [ALERT] Low idle connections: ${idlePercentage.toFixed(1)}%`);
}
}, 5000);
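Instead of (or alongside) console logs, you can expose the same numbers over HTTP for an external monitor to scrape. A minimal sketch, assuming the `app` and `pool` from the setup above (the `/health/pool` path is just an example):

```javascript
// Expose live pool metrics for dashboards or uptime checks
app.get("/health/pool", (req, res) => {
  const waiting = pool.waitingCount;
  // Report 503 when the queue exceeds the alert threshold, so monitors can page
  res.status(waiting > ALERT_THRESHOLDS.waitingQueue ? 503 : 200).json({
    total: pool.totalCount,
    idle: pool.idleCount,
    waiting,
    max: poolConfig.max,
  });
});
```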
Database Monitoring Queries
-- Check current connections
SELECT count(*) as total,
count(*) FILTER (WHERE state = 'active') as active,
count(*) FILTER (WHERE state = 'idle') as idle
FROM pg_stat_activity
WHERE datname = current_database();
-- Find connection hogs
SELECT pid, usename, application_name, state,
NOW() - query_start as duration,
substring(query, 1, 50) as query_preview
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < NOW() - INTERVAL '1 minute'
ORDER BY query_start;
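You can also run the first query from application code to cross-check the pool's counters against what PostgreSQL itself reports. A sketch reusing the `pool` from the setup above (the `::int` casts are there because pg returns counts as strings):

```javascript
// Compare pool-side counters with server-side connection counts
async function logDbConnections() {
  const { rows } = await pool.query(
    `SELECT count(*)::int AS total,
            count(*) FILTER (WHERE state = 'active')::int AS active,
            count(*) FILTER (WHERE state = 'idle')::int AS idle
       FROM pg_stat_activity
      WHERE datname = current_database()`
  );
  const db = rows[0];
  console.log(`[DB] Total: ${db.total} | Active: ${db.active} | Idle: ${db.idle}`);
}
```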
Common Pitfalls to Avoid
1. Not Releasing Connections
// BAD: Connection leak
const client = await pool.connect();
const result = await client.query("SELECT * FROM users");
res.json(result.rows);
// Forgot to release!
// GOOD: Always release
const client = await pool.connect();
try {
const result = await client.query("SELECT * FROM users");
res.json(result.rows);
} finally {
client.release();
}
// BETTER: Use pool.query() for simple queries
const result = await pool.query("SELECT * FROM users");
res.json(result.rows);
2. Holding Connections During External Calls
// BAD: Holds a pooled connection for 2+ seconds
const client = await pool.connect();
const user = await client.query("SELECT * FROM users WHERE id = $1", [id]);
const response = await fetch("https://slow-api.com/data"); // 2 seconds!
const apiData = await response.json();
await client.query("UPDATE users SET data = $1 WHERE id = $2", [apiData, id]);
client.release();
// GOOD: Don't hold a connection across the external call
const user = await pool.query("SELECT * FROM users WHERE id = $1", [id]);
const response = await fetch("https://slow-api.com/data");
const apiData = await response.json();
await pool.query("UPDATE users SET data = $1 WHERE id = $2", [apiData, id]);
Quick Reference
Healthy Pool Indicators
- Waiting < 10 consistently
- Idle > 0 (at least 10% of total)
- Response time P95 < 1000ms
- No connection timeouts
When to Increase Pool Size
- Waiting > 30 consistently
- Idle always 0
- Response times > 1000ms
- Database CPU < 50% under load
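These rules of thumb are easy to encode. A minimal sketch (the thresholds mirror the two lists above; `assessPoolHealth` is a hypothetical helper, not part of pg, and a single snapshot can't capture "consistently" - track it over time):

```javascript
// Turn the quick-reference rules into a programmatic check
function assessPoolHealth({ totalCount, idleCount, waitingCount }) {
  if (waitingCount > 30 || (totalCount > 0 && idleCount === 0)) {
    return "consider increasing pool size"; // deep queue or no idle headroom
  }
  if (waitingCount < 10 && idleCount >= totalCount * 0.1) {
    return "healthy"; // short queue and at least 10% idle
  }
  return "watch closely";
}

console.log(assessPoolHealth(pool)); // pg's Pool exposes these counters directly
```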
Useful Commands
# Start server (uses .env file)
npm run dev
# Run load test (in another terminal)
k6 run load-tests/test.js
# PostgreSQL connections check
psql -c "SELECT count(*) FROM pg_stat_activity"
Conclusion
Connection pool optimization reduced our response time from 1,221ms to 280ms - a 77% improvement - through systematic testing with k6 load tests.
The Complete Performance Picture:
| Metric | Pool 10 | Pool 20 (Recommended) | Pool 40 |
|---|---|---|---|
| Avg Response | 1,221ms | 508ms (-58%) | 280ms (-77%) |
| P95 Latency | 2,890ms | 1,171ms (-59%) | 529ms (-82%) |
| Throughput | 4,040 req | 8,605 req (+113%) | 13,680 req (+239%) |
| Max Queue | 130 | 114 | 88 |
| Efficiency | Poor | Excellent | Good |
Final Verdict: Choose Pool Size 20
Why Pool 20 over Pool 40?
- Resource efficiency: 113% performance gain with just 20 connections
- Diminishing returns: Pool 40 only adds 59% more throughput for 100% more connections
- Cost-benefit: Pool 20 achieves the “good enough” threshold (<600ms response)
- Database overhead: Fewer connections = less memory and CPU on PostgreSQL
Key Takeaways:
- Default pool (10) is severely undersized - caused 1.2s response times
- Pool 20 hits the sweet spot - doubles throughput, halves latency
- Pool 40 shows diminishing returns - consider only for >200 concurrent users
- Queue monitoring is critical - even Pool 40 had 88 requests waiting at peak
Remember: The default pool size of 10 resulted in 130 requests queuing. Simply changing to Pool 20 delivered a 2x throughput improvement with sub-second response times.
Questions? Drop them in the comments. What pool size works for your application?
💡 Found this helpful?
Drop a ❤️ and 🦄 if this helped you. Follow for more practical guides and real-world dev experiences.