I got tired of Ollama hogging my VRAM when I needed to multitask, so I wrote a lightweight VRAM Guard.

🛡️ Neral VRAM Guard

Neral VRAM Guard is a lightweight, standalone Python utility designed to monitor NVIDIA GPU memory usage and automatically manage Ollama models.

When your VRAM usage exceeds a safe threshold, the script forces Ollama to unload models immediately—preventing system freezes, OOM (Out of Memory) crashes, and performance drops during heavy AI workflows.
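
As an illustration of the core loop, here is a minimal sketch (not the project's actual code): poll `nvidia-smi` for memory usage and fire a cleanup once a threshold is crossed. The 90% limit, 5-second interval, and function names are assumptions for this example, not the project's defaults.

```python
import subprocess
import time

THRESHOLD_PERCENT = 90.0   # assumed safe limit; the project's default may differ
POLL_SECONDS = 5           # assumed polling interval

def vram_usage_percent() -> float:
    """Read used/total VRAM for GPU 0 from nvidia-smi's CSV output."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    used, total = (int(x) for x in out.strip().splitlines()[0].split(","))
    return 100.0 * used / total

if __name__ == "__main__":
    while True:
        usage = vram_usage_percent()
        if usage > THRESHOLD_PERCENT:
            print(f"VRAM at {usage:.0f}%, would trigger model unload here")
        time.sleep(POLL_SECONDS)
```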

🚀 Features

  • Real-Time Monitoring – Polls VRAM usage via nvidia-smi at regular intervals.
  • Auto-Unload – Automatically detects loaded Ollama models and unloads them when memory runs high (see the sketch after this list).
  • Panic Button – Use --clear-now to instantly free VRAM.
  • Dry Run Mode – Simulate cleanup logic without actually unloading models.
  • Lightweight – Minimal dependencies (only requires `aio…
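
For context, one way to implement the auto-unload step is through Ollama's HTTP API: `/api/ps` lists the models currently held in memory, and a `/api/generate` request with `keep_alive` set to 0 asks Ollama to evict a model immediately (both are documented Ollama behaviors). The sketch below uses only the standard library and a hypothetical helper name; the project itself presumably relies on the async HTTP dependency truncated above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def unload_ollama_models() -> None:
    """Ask Ollama to evict every currently loaded model."""
    # /api/ps returns the models currently loaded into memory
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps") as resp:
        loaded = json.load(resp).get("models", [])
    for model in loaded:
        # A generate request with keep_alive=0 (and no prompt) tells
        # Ollama to unload the model as soon as the request completes
        body = json.dumps({"model": model["name"], "keep_alive": 0}).encode()
        req = urllib.request.Request(
            f"{OLLAMA_URL}/api/generate",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req).read()

if __name__ == "__main__":
    unload_ollama_models()  # roughly what a --clear-now style flag would do
```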
