I Stopped Paying for Idle GPUs - Scale-to-Zero AI Inference on OKE with KEDA

Covers pmady/keda-gpu-scaler: KEDA External gRPC Scaler for GPU workloads, with native NVML metrics via DaemonSet and no Prometheus required. Discussed on DEV.

A single A10 GPU on OCI costs $1.52/hr. Running 24/7, that's $1,094/month. For a production inference service with steady traffic, that's fine. But I had a staging environment and a couple of internal tools that got maybe 20 requests per day. I was paying over $2,000/month for GPUs that sat idle 95% of the time. The obvious solution: scale to zero when there's no traffic, spin up when a request comes in. KEDA does this on Kubernetes, but getting it to work properly with GPU pods took some fig...
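The cost figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the author's calculation: it assumes a 30-day month and two always-on A10 GPUs (the excerpt only says "over $2,000/month", which is consistent with two GPUs at the quoted rate). The function names are hypothetical.

```python
# Back-of-the-envelope GPU cost model for the numbers quoted in the text.
# Assumptions (not from the source): a 30-day month, two always-on GPUs.

HOURS_PER_MONTH = 24 * 30  # assumed 30-day month


def monthly_cost(hourly_rate: float, gpus: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Cost of keeping `gpus` GPUs running for `hours` hours."""
    return hourly_rate * gpus * hours


def idle_spend(hourly_rate: float, gpus: int, idle_fraction: float) -> float:
    """Portion of the always-on bill spent while the GPUs do no work."""
    return monthly_cost(hourly_rate, gpus) * idle_fraction


if __name__ == "__main__":
    a10_rate = 1.52   # $/hr for one A10 on OCI, as quoted in the text
    gpus = 2          # assumed GPU count
    idle = 0.95       # idle fraction, as quoted in the text

    print(f"One A10, always on:  ${monthly_cost(a10_rate):,.2f}/month")
    print(f"{gpus} A10s, always on: ${monthly_cost(a10_rate, gpus):,.2f}/month")
    print(f"Spent while idle:    ${idle_spend(a10_rate, gpus, idle):,.2f}/month")
```

Under these assumptions, nearly the whole bill pays for idle time, which is the case scale-to-zero targets: with 20 requests per day, the GPU only needs to exist for the minutes around each request, plus whatever cold-start time the node and model take.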
