QLoRA: Fine-Tuning a 7B Model on a 16GB GPU (It Shrank to 5.4GB in Front of Me)
Series: Fine-Tuning, Smallest to Largest. This part: QLoRA (7B).

Earlier in this series, LoRA let me fine-tune a 1.5B model by freezing it and training tiny adapters. But the frozen base still sat in memory in 16-bit (~3GB). Now I wanted to go to Qwen2.5-7B, and hit a wall that LoRA alone doesn't solve.

The problem

A 7B model is ~15GB in 16-bit precision. A free-tier T4 GPU has 16GB. The model would barely load, with no room left to actually train.

The QLoRA insight

QLoRA asks the question that naturally follows ...
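The memory numbers in the problem statement come from simple arithmetic: parameter count times bytes per parameter. Here is a minimal sketch of that estimate. The parameter counts (roughly 7.6 billion for the 7B model, 1.5 billion for the smaller one) are my approximations, the helper name `weight_memory_gb` is made up for illustration, and the 4-bit row assumes the 4-bit weight storage that QLoRA is known for. It counts weights only, so it ignores activations, adapter gradients, optimizer state, and quantization metadata.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Estimate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumptions for illustration: approximate parameter counts, not exact figures.
PARAMS_1_5B = 1.5e9
PARAMS_7B = 7.6e9
T4_MEMORY_GB = 16.0  # nominal memory of a T4, as stated in the article

for label, params in [("1.5B", PARAMS_1_5B), ("7B", PARAMS_7B)]:
    for bits in (16, 4):
        needed = weight_memory_gb(params, bits)
        headroom = T4_MEMORY_GB - needed
        print(f"{label} @ {bits}-bit: {needed:.1f} GB weights, "
              f"{headroom:.1f} GB left on a 16 GB card")
```

Under these assumptions the 7B weights need about 15.2 GB in 16-bit and about 3.8 GB in 4-bit. A measured footprint will be higher than the 4-bit estimate, since some layers typically stay in higher precision and the runtime adds its own overhead.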