Time-series data is everywhere in modern applications—from monitoring CPU usage and API latencies to tracking business metrics and IoT sensor readings. But handling this data at scale requires careful engineering. In this post, I'll walk you through building a high-performance time-series data ingestion pipeline that can handle over 60,000 requests per second with sub-millisecond average latency.
The Challenge
Modern applications generate massive amounts of time-series data. Whether you're monitoring microservices, tracking user behavior, or collecting IoT sensor data, you need a system that can:
- Accept thousands of metrics per second
- Maintain low latency under high load
- Efficiently batch writes to reduce database pressure
- Handle memory management without creating garbage collection pressure
- Gracefully handle errors and shutdowns
Architecture Overview
Our solution is a Go-based HTTP server that sits between metric producers and InfluxDB. Here's how it works:
Client Apps → HTTP API → Worker Pool → Batch Processing → InfluxDB
The pipeline accepts JSON payloads containing time-series metrics and processes them asynchronously using a worker pool pattern. Each metric contains:
- Measurement: The metric name (e.g., "cpu_usage", "api_latency")
- Tags: Indexed metadata for filtering and grouping
- Fields: The actual numeric values
- Timestamp: When the metric was recorded
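For concreteness, that payload maps naturally onto a small Go struct. This is a minimal sketch assuming the JSON field names follow the list above (the benchmark payload later in this post uses "measurement"); the exact struct in the repository may differ.

```go
type TimeSeriesMetric struct {
	Measurement string                 `json:"measurement"`
	Tags        map[string]string      `json:"tags"`
	Fields      map[string]interface{} `json:"fields"`
	Timestamp   time.Time              `json:"timestamp"`
}
```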
Key Performance Optimizations
1. Object Pooling for Memory Management
Instead of creating new objects for every request, we use sync.Pool to reuse metric and response objects:
```go
var metricPool = sync.Pool{
	New: func() any {
		return &TimeSeriesMetric{
			Tags:   make(map[string]string),
			Fields: make(map[string]interface{}),
		}
	},
}
```
This dramatically reduces garbage collection pressure, which is crucial for maintaining consistent low latency.
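The borrow/return cycle around the pool isn't shown above, so here is a hedged sketch of it. getMetric and putMetric are hypothetical helper names, not necessarily those in the repository; the important detail is that the maps are cleared rather than reallocated before the object goes back into the pool (clear on maps requires Go 1.21+).

```go
func getMetric() *TimeSeriesMetric {
	return metricPool.Get().(*TimeSeriesMetric)
}

func putMetric(m *TimeSeriesMetric) {
	clear(m.Tags)   // keep the allocated maps, drop their contents (Go 1.21+)
	clear(m.Fields)
	m.Measurement = ""
	m.Timestamp = time.Time{}
	metricPool.Put(m)
}
```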
2. Worker Pool with Batch Processing
Rather than writing each metric individually to InfluxDB, we use a worker pool that batches metrics:
- Multiple workers process metrics concurrently
- Configurable batch size (1000 metrics per batch)
- Time-based flushing (5ms timeout) ensures data freshness
- Buffered channels prevent blocking on metric ingestion
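A single worker might look roughly like the sketch below, assuming batchSize = 1000, flushInterval = 5*time.Millisecond, and the hypothetical metricsChan, writeBatch, and putMetric helpers from the other sketches in this post; the repository's implementation may differ in detail.

```go
func worker(ctx context.Context, metricsChan <-chan *TimeSeriesMetric) {
	batch := make([]*TimeSeriesMetric, 0, batchSize)
	ticker := time.NewTicker(flushInterval)
	defer ticker.Stop()

	flush := func() {
		if len(batch) == 0 {
			return
		}
		if err := writeBatch(batch); err != nil { // one InfluxDB write per batch
			log.Printf("batch write failed: %v", err)
		}
		for _, m := range batch {
			putMetric(m) // return objects to the pool for reuse
		}
		batch = batch[:0]
	}

	for {
		select {
		case m, ok := <-metricsChan:
			if !ok {
				flush() // channel closed: drain what's left and exit
				return
			}
			batch = append(batch, m)
			if len(batch) >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush() // time-based flush keeps data fresh under low load
		case <-ctx.Done():
			flush()
			return
		}
	}
}
```

The ticker-driven flush is what preserves the 5ms freshness guarantee even when traffic drops below the batch size.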
3. Optimized HTTP Server Configuration
The HTTP server is tuned for high throughput:
```go
server := &http.Server{
	ReadTimeout:       5 * time.Second,
	WriteTimeout:      10 * time.Second,
	IdleTimeout:       60 * time.Second,
	ReadHeaderTimeout: 2 * time.Second,
	// Disable HTTP/2 for maximum performance
	TLSNextProto: make(map[string]func(*http.Server, *tls.Conn, http.Handler)),
}
```
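Wiring it up is then straightforward; this sketch assumes a handleMetrics handler (see the handler sketch later in this post) and the :8080 address used in the benchmark below.

```go
// Sketch of starting the tuned server; handleMetrics is an assumed name.
mux := http.NewServeMux()
mux.HandleFunc("/metrics", handleMetrics)

server.Addr = ":8080"
server.Handler = mux

if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
	log.Fatal(err)
}
```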
4. Non-blocking Channel Operations
The pipeline uses a large buffered channel (100,000 capacity) and non-blocking sends to prevent request handlers from waiting:
```go
select {
case metricsChan <- m:
	// Success
default:
	// Channel full - return error immediately
	http.Error(w, "server busy", http.StatusServiceUnavailable)
	return
}
```
Benchmark Results
Using the hey load testing tool with 50 concurrent connections over 10 seconds:
```bash
hey -z 10s -c 50 -m POST -T "application/json" \
  -d '{"measurement": "api_requests", ...}' \
  http://localhost:8080/metrics
```
Results:
- Throughput: 60,776 requests/second
- Average Latency: 0.8ms
- 99th Percentile: 2.2ms
- Total Requests: 607,807 in 10 seconds
- Zero Errors: All requests returned HTTP 200
What Makes This Fast?
Memory Efficiency
Object pooling eliminates allocation overhead, reducing GC pressure that could cause latency spikes.
Async Processing
The HTTP handler immediately queues metrics and returns, while workers process them in the background.
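Putting the earlier pieces together, the handler path might look roughly like this; handleMetrics, getMetric, putMetric, and metricsChan are the assumed names from the sketches above, not necessarily those in the repository.

```go
func handleMetrics(w http.ResponseWriter, r *http.Request) {
	m := getMetric() // reuse a pooled object instead of allocating
	if err := json.NewDecoder(r.Body).Decode(m); err != nil {
		putMetric(m)
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}

	select {
	case metricsChan <- m: // hand off to the workers and return immediately
		w.WriteHeader(http.StatusOK)
	default: // queue full: fail fast rather than block the request
		putMetric(m)
		http.Error(w, "server busy", http.StatusServiceUnavailable)
	}
}
```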
Batch Writes
Writing 1000 metrics at once is far more efficient than 1000 individual writes to InfluxDB.
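Assuming the official influxdb-client-go/v2 client and its blocking write API, the batch write could look roughly like this; the variable names are illustrative, not taken from the repository.

```go
import (
	"context"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
	"github.com/influxdata/influxdb-client-go/v2/api"
	"github.com/influxdata/influxdb-client-go/v2/api/write"
)

// writeAPI would be created once at startup, e.g.
// influxdb2.NewClient(url, token).WriteAPIBlocking(org, bucket).
var writeAPI api.WriteAPIBlocking

func writeBatch(batch []*TimeSeriesMetric) error {
	points := make([]*write.Point, 0, len(batch))
	for _, m := range batch {
		points = append(points, influxdb2.NewPoint(m.Measurement, m.Tags, m.Fields, m.Timestamp))
	}
	// A single variadic call sends the whole batch in one request.
	return writeAPI.WritePoint(context.Background(), points...)
}
```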
Optimized Serialization
Direct JSON encoding/decoding without intermediate string allocation.
Smart Timeouts
Short timeouts prevent resource exhaustion while ensuring responsive error handling.
Real-World Considerations
Monitoring and Observability
In production, you'd want to add:
- Metrics on queue depth and processing latency
- Error rate monitoring
- Resource utilization tracking
- Custom dashboards for operational visibility
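Queue depth in particular is cheap to expose with nothing but the standard library; here is a minimal sketch using expvar, assuming the metricsChan from the earlier sketches.

```go
// Publish the current ingestion queue depth; expvar serves all published
// values as JSON on /debug/vars (registered on the default mux).
expvar.Publish("metrics_queue_depth", expvar.Func(func() any {
	return len(metricsChan)
}))
```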
Horizontal Scaling
This design scales well horizontally:
- Multiple instances can run behind a load balancer
- Each instance maintains its own connection to InfluxDB
- No shared state between instances
Data Durability
For mission-critical data, consider:
- Persistent queuing (Redis, Kafka) for durability
- Multiple InfluxDB replicas
- Backup and recovery procedures
Lessons Learned
Memory management matters: Object pooling reduced our GC overhead by 80%, directly improving tail latency.
Batching is essential: Individual writes would have maxed out at ~1,000 RPS. Batching increased this 60x.
Channel sizing is critical: Too small causes blocking; too large uses excessive memory. 100k was our sweet spot.
HTTP/2 can hurt: For high-throughput APIs, HTTP/1.1 often performs better due to lower overhead.
Conclusion
Building high-performance data ingestion requires careful attention to memory management, concurrency patterns, and I/O optimization. This pipeline demonstrates that with the right architecture, Go can easily handle enterprise-scale time-series workloads.
The complete implementation handles graceful shutdowns, health checks, and configurable parameters while maintaining excellent performance characteristics. Whether you're building monitoring infrastructure, IoT data collection, or real-time analytics, these patterns provide a solid foundation for high-throughput data ingestion.
Find the code at: https://github.com/tanmaysharma2001/time-series-ingestion
The complete source code for this pipeline is production-ready and includes comprehensive error handling, logging, and configuration management. Consider this architecture when you need to process thousands of time-series data points per second with consistent low latency.