Background
This project started as an internal tool to understand unusual volume and intraday activity across thousands of US stocks.
I recently built a real-time stock scanner that ingests thousands of symbols, processes live updates, and serves them to users with minimal delay. This post is a short breakdown of the architecture decisions that worked (and the ones that didn’t).
The Core Problem
Market data is:
- high-frequency
- bursty
- time-sensitive
Writing every update directly to a relational database quickly becomes expensive and unnecessary, especially when most of the data is only relevant for seconds or minutes.
Architecture Overview
The system is split into three layers:
- Ingestion — pulling snapshot and streaming data from external APIs
- Processing — normalizing, filtering, and calculating derived metrics
- Distribution — pushing live updates to the frontend
At a high level:
- Redis handles hot, ephemeral data
- Postgres stores validated, historical data
- WebSockets deliver updates to clients in real time
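To make the three layers concrete, here is a stripped-down sketch of the pipeline as plain functions. All names, thresholds, and sample numbers are illustrative, not the production code:

```python
# Hypothetical sketch of the ingestion -> processing -> distribution flow.

def ingest():
    """Ingestion: stand-in for a snapshot pulled from an external API."""
    return [
        {"symbol": "AAPL", "price": 187.5, "volume": 1_200_000, "avg_volume": 900_000},
        {"symbol": "XYZ", "price": 3.2, "volume": 50_000, "avg_volume": 400_000},
    ]

def process(updates):
    """Processing: normalize and compute a derived metric (relative volume)."""
    out = []
    for u in updates:
        rvol = u["volume"] / u["avg_volume"] if u["avg_volume"] else 0.0
        out.append({**u, "rvol": round(rvol, 2)})
    return out

def distribute(processed, threshold=1.0):
    """Distribution: only unusually active symbols get pushed to clients."""
    return [u for u in processed if u["rvol"] >= threshold]

hot = distribute(process(ingest()))
```

The point of the split is that each layer can be scaled or swapped independently: ingestion is I/O-bound, processing is CPU-bound, and distribution is connection-bound.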
Why Redis for Live Data
Redis turned out to be a much better fit than Postgres for intraday updates:
- In-memory speed
- Natural TTL support for expiring symbols
- Pub/Sub for fan-out
- No write-ahead log or disk I/O pressure
Postgres is still excellent — just not for data that changes every few seconds.
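The TTL behavior is what Redis gives you natively (`SET key value EX seconds`; with redis-py, roughly `r.set(symbol, payload, ex=60)`). To illustrate just the expiry pattern without needing a live Redis, here is an in-process sketch; the class and its lazy-expiry strategy are my illustration, not Redis internals:

```python
import time

# Hypothetical in-process stand-in for Redis-style key expiry.
class TTLCache:
    def __init__(self):
        self._store = {}  # symbol -> (payload, expiry timestamp)

    def set(self, symbol, payload, ttl_seconds):
        self._store[symbol] = (payload, time.monotonic() + ttl_seconds)

    def get(self, symbol):
        entry = self._store.get(symbol)
        if entry is None:
            return None
        payload, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expire lazily on access; stale intraday data simply disappears.
            del self._store[symbol]
            return None
        return payload

cache = TTLCache()
cache.set("AAPL", {"price": 187.5}, ttl_seconds=60)
```

When the TTL lapses, the symbol is gone with zero cleanup code on your side, which is exactly the property you want for data that is only relevant for seconds or minutes.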
WebSockets for Distribution
Instead of polling:
- active symbols are tracked dynamically
- only “in-play” tickers receive updates
- clients subscribe to exactly what they need
This keeps both bandwidth and server load under control.
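The subscription tracking behind this can be sketched as a small registry mapping symbols to connected clients, so an update fans out only to clients that asked for it. Names here are illustrative:

```python
from collections import defaultdict

# Hypothetical per-symbol subscription registry for WebSocket fan-out.
class SubscriptionHub:
    def __init__(self):
        self._subs = defaultdict(set)  # symbol -> set of client ids

    def subscribe(self, client_id, symbol):
        self._subs[symbol].add(client_id)

    def unsubscribe(self, client_id, symbol):
        self._subs[symbol].discard(client_id)
        if not self._subs[symbol]:
            del self._subs[symbol]  # symbol is no longer "in play"

    def recipients(self, symbol):
        """Which clients should receive an update for this symbol."""
        return set(self._subs.get(symbol, ()))

hub = SubscriptionHub()
hub.subscribe("client-1", "AAPL")
hub.subscribe("client-2", "AAPL")
hub.subscribe("client-2", "TSLA")
```

An update for a symbol with no subscribers is dropped before it ever touches a socket, which is where most of the bandwidth savings over polling comes from.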
What I’d Do Differently Next Time
- Separate “hot” and “cold” data earlier
- Treat intraday data as disposable by default
- Avoid over-modeling early — schemas can come later
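The first lesson, separating hot and cold data, ultimately reduces to a tiny routing rule applied at write time. The `is_validated` flag and the destination names below are hypothetical, just to show the shape of the decision:

```python
# Hypothetical router: hot intraday ticks go to Redis, validated
# records worth keeping go to Postgres.
def route(update):
    if update.get("is_validated"):  # e.g. an end-of-day bar
        return "postgres"
    return "redis"                  # everything intraday is disposable

assert route({"symbol": "AAPL", "price": 187.5}) == "redis"
assert route({"symbol": "AAPL", "close": 187.9, "is_validated": True}) == "postgres"
```

Making that decision explicit and early is what keeps the relational database out of the hot path.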
Live Demo
It’s still evolving, but the core real-time pipeline is already doing the heavy lifting.
Closing Thoughts
Real-time systems are less about frameworks and more about respecting the nature of your data.
If your data expires quickly, your architecture should too. Happy to answer questions or discuss real-time data tradeoffs.