My ML group has a few GPU servers. I wanted to check utilization without SSHing into each machine. The standard answer is Grafana + Prometheus + exporters, but that felt like overkill for checking if GPUs are busy.
I built GPU Hot as a simpler alternative. This post is about why that made sense.
The Grafana Problem
Grafana is excellent for production monitoring at scale. But for a small team with a few GPU boxes, you’re looking at:
- Installing Prometheus
- Installing node exporters on each server
- Installing GPU exporters
- Writing Prometheus configs
- Setting up Grafana dashboards
- Maintaining all of this
For this use case (checking GPU utilization while walking to get coffee), this was too much infrastructure.
What I Actually Needed
A web page that shows which GPUs are in use, temperature, memory usage, and what processes are running. It updates in real time so I can see when a training job finishes.
That’s it. No alerting, no long-term storage, no complex queries.
The Setup
One Docker command per server:
docker run -d --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest
Open http://localhost:1312 and you see your GPUs updating every 0.5 seconds.
For multiple servers, run the container on each GPU box, then start a hub:
# On each GPU server
docker run -d --gpus all -p 1312:1312 \
-e NODE_NAME=$(hostname) \
ghcr.io/psalias2006/gpu-hot:latest
# On your laptop (no GPU needed)
docker run -d -p 1312:1312 \
-e GPU_HOT_MODE=hub \
-e NODE_URLS=http://server1:1312,http://server2:1312 \
ghcr.io/psalias2006/gpu-hot:latest
Open http://localhost:1312 and you see all GPUs from all servers in one dashboard. Total setup time: under 5 minutes.
How It Works
The core is straightforward:
NVML for metrics: Python’s NVML bindings give direct access to GPU data. It’s faster than parsing nvidia-smi output and returns structured data.
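As a rough illustration, a minimal polling loop with the pynvml bindings looks something like this (the fields and formatting are illustrative, not GPU Hot’s actual collection code):

import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu and .memory, in percent
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # .used and .total, in bytes
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        print(f"GPU {i}: {util.gpu}% util, "
              f"{mem.used // 2**20}/{mem.total // 2**20} MiB, "
              f"{temp}C, {len(procs)} processes")
finally:
    pynvml.nvmlShutdown()

Everything comes back as typed structs, so there’s no text parsing and no subprocess overhead.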
FastAPI + WebSockets: Async WebSockets push metrics to the browser. No polling, sub-second updates. The server collects metrics and broadcasts them to all connected clients.
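The push model has roughly this shape. This is a sketch, not GPU Hot’s actual API: the /ws endpoint and the collect_gpu_metrics() helper (a stand-in for an NVML poll like the one above) are names I made up for illustration.

import asyncio, json
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: set[WebSocket] = set()

def collect_gpu_metrics():
    # stand-in for an NVML poll like the sketch above
    return {"gpus": []}

async def broadcast_loop():
    # sample metrics and push them to every connected client twice a second
    while True:
        payload = json.dumps(collect_gpu_metrics())
        for ws in list(clients):
            try:
                await ws.send_text(payload)
            except Exception:
                clients.discard(ws)
        await asyncio.sleep(0.5)

@app.on_event("startup")
async def start_broadcasting():
    asyncio.create_task(broadcast_loop())

@app.websocket("/ws")
async def ws_endpoint(ws: WebSocket):
    await ws.accept()
    clients.add(ws)
    try:
        while True:
            await ws.receive_text()  # keep the connection open; the browser never polls
    except WebSocketDisconnect:
        clients.discard(ws)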
Hub mode: Each node runs the same container and exposes metrics via WebSocket. The hub connects to all nodes, aggregates their data, and serves it through a single dashboard.
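On the hub side, the idea is just one client per node: connect to each node’s WebSocket, remember the latest payload, and render the merged view. Here’s a sketch using the websockets client library; the URLs, paths, and payload shape are placeholders, not GPU Hot’s actual wire format.

import asyncio, json
import websockets

NODE_URLS = ["ws://server1:1312/ws", "ws://server2:1312/ws"]
latest = {}  # node URL -> most recent metrics payload

async def follow_node(url):
    # keep a persistent connection to one node, reconnecting if it drops
    while True:
        try:
            async with websockets.connect(url) as ws:
                async for message in ws:
                    latest[url] = json.loads(message)
        except Exception:
            await asyncio.sleep(2)  # node unreachable; retry

async def main():
    await asyncio.gather(*(follow_node(u) for u in NODE_URLS))

asyncio.run(main())

Because the hub in this sketch only keeps the latest snapshot per node, an unreachable node simply stops updating instead of breaking the dashboard.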
Frontend: Vanilla JavaScript with Chart.js. No build step, no framework, just HTML/CSS/JS.
Docker: Packages everything. Users don’t need to install Python, NVML bindings, or manage dependencies. The NVIDIA Container Toolkit handles GPU access.
When This Approach Works
This pattern works well when:
- You have a small number of machines (1-20)
- You need real-time visibility, not historical analysis
- Your team is small enough that everyone can check one dashboard
- You don’t need alerting or complex queries
It doesn’t replace proper monitoring for production services. But for development infrastructure in a small team, it’s sufficient and much simpler to maintain.
Trade-offs
What you lose compared to Grafana:
- No persistent storage (metrics are only kept in memory for the current session)
- No alerting
- No complex queries or correlations
- No authentication (we run this on an internal network)
What you gain:
- Zero configuration
- Sub-second updates
- No maintenance
- One command deployment
For this use case, the trade-off made sense. This isn’t for monitoring production services. It’s for checking if GPUs are free before starting a training run.
Takeaway
Not every monitoring problem needs the full observability stack. For small teams with straightforward needs, a purpose-built tool can be simpler to deploy and maintain than configuring enterprise solutions.
Try the interactive demo to see it in action.