How Much Does It Actually Cost to Run a Local LLM? (€ per Million Tokens, Measured)

Discussed on DEV

"It runs on my own GPU, so it's basically free." I believed that until I put a meter on it. So I ran a controlled benchmark on one box — an openSUSE machine with a single RTX 3090 — driving three local models through ollama under an identical fixed workload (256-token generations in a loop for ~4 minutes each), while my open-source dashboard priced every run by the real GPU energy it burned: power sampled from nvidia-smi every 10 s, integrated over each run's exact window, multiplied by my ac...
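The pricing method described above (sample GPU power, integrate it over the run's window, multiply by a price) can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the function names, the trapezoidal integration rule, the price per kWh, and the sample numbers are all assumptions made for the example.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, GPU power in watts)


def energy_kwh(samples: Sequence[Sample], start: float, end: float) -> float:
    """Integrate power over [start, end] with the trapezoidal rule.

    Assumption: samples are sorted by timestamp, e.g. one reading every 10 s.
    """
    window = [(t, w) for t, w in samples if start <= t <= end]
    joules = 0.0
    for (t0, w0), (t1, w1) in zip(window, window[1:]):
        joules += 0.5 * (w0 + w1) * (t1 - t0)  # watts * seconds = joules
    return joules / 3_600_000.0  # 1 kWh = 3.6e6 J


def eur_per_million_tokens(
    samples: Sequence[Sample],
    start: float,
    end: float,
    tokens_generated: int,
    eur_per_kwh: float,
) -> float:
    """Energy cost of a run, normalized to one million generated tokens."""
    if tokens_generated <= 0:
        raise ValueError("tokens_generated must be positive")
    cost_eur = energy_kwh(samples, start, end) * eur_per_kwh
    return cost_eur / tokens_generated * 1_000_000


if __name__ == "__main__":
    # Hypothetical run: a steady 300 W for 240 s, sampled every 10 s.
    demo = [(float(t), 300.0) for t in range(0, 241, 10)]
    # Hypothetical token count and electricity price.
    print(eur_per_million_tokens(demo, 0.0, 240.0, 12_000, 0.30))
```

Only GPU energy is counted here, matching the description; idle draw from the rest of the machine and hardware depreciation would need separate terms.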
