Fixing a stuck Ollama runner and building a GPU watchdog
If you self-host LLMs, at some point you'll likely run into a stuck GPU that draws significant power for long periods. This is my summary of discovering the problem and then implementing automation that corrects it. I want automation to detect and correct these issues; I don't want to be the first line of …