How to Rank Local LLMs by Cost per Correct Answer (Measured GPU Energy, 8 Ollama Models)

Discussed on DEV

TL;DR: I priced 8 local Ollama models in € per 1,000 correct answers (cost of the metered GPU energy ÷ correct answers, scaled to 1,000) on one RTX 3090. gemma4:26b won with 96.9% accuracy at €0.013/1k-correct. The most expensive model (qwen3:8b-fp16) cost €0.239/1k-correct and scored worse (66.7%). Reasoning tokens and full precision both cost a lot and bought no accuracy here. Every cost comes from real metered kWh via the open-source HomeLab Monitor. This is the short, copy-pasteable version. The narrative writeup is on Medium. T...
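
To make the metric concrete, here is a minimal sketch of the arithmetic, not the actual benchmark code. The function names, the electricity price, and the sample runs are all hypothetical placeholders; the excerpt does not state the € per kWh rate or the per-model kWh readings, so substitute your own metered values.

```python
import math


def cost_per_1k_correct(energy_kwh, price_eur_per_kwh, correct):
    """EUR per 1,000 correct answers, given metered energy for a full run."""
    if correct <= 0:
        # No correct answers: the cost per correct answer is unbounded.
        return math.inf
    return energy_kwh * price_eur_per_kwh / correct * 1000


def rank_models(runs, price_eur_per_kwh):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = [
        (name, cost_per_1k_correct(kwh, price_eur_per_kwh, correct))
        for name, kwh, correct in runs
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical inputs (model, metered kWh, correct answers).
    # These are NOT the article's measurements.
    runs = [
        ("model-a", 0.020, 90),
        ("model-b", 0.010, 60),
        ("model-c", 0.015, 0),
    ]
    # Assumed electricity price; replace with your own tariff.
    for name, cost in rank_models(runs, price_eur_per_kwh=0.30):
        print(f"{name}: EUR {cost:.3f} per 1k correct")
```

Dividing by correct answers rather than by total answers or tokens is what lets a slower but more accurate model beat a cheap, inaccurate one in the ranking.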
