Fine-Tuning Llama 3.2 3B on Medical QA: Week 4 - When Lower Loss Meant a Worse Model
What Happened This Week

Week 3 produced a working fine-tuned model: one epoch, one dataset, and a clear improvement over the base model. Week 4 was supposed to make it better with more data (a second dataset), two epochs, and a cleaner setup. The eval loss dropped from 2.495 to 2.275. By that number alone, Week 4 looked like a success. The model was worse. This is the story of how a better loss number hid a serious regression, how I diagnosed it, and what it took to actually fix it. It i...
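To see why that loss drop looked so convincing, it helps to convert it to perplexity. The sketch below assumes the reported eval loss is the mean per-token cross-entropy in nats (what, for example, the Hugging Face `Trainer` reports as `eval_loss`); the post does not state which framework produced the numbers. It also assumes both losses were measured on comparable eval sets, which may not hold once a second dataset is added.

```python
import math


def perplexity(mean_token_loss: float) -> float:
    """Convert mean per-token cross-entropy (in nats) to perplexity.

    Assumption: the eval loss is a natural-log cross-entropy averaged
    over tokens, as most causal-LM training loops report it.
    """
    return math.exp(mean_token_loss)


# Eval losses quoted in the post.
week3_loss = 2.495
week4_loss = 2.275

week3_ppl = perplexity(week3_loss)
week4_ppl = perplexity(week4_loss)

# Fractional reduction in the loss itself.
relative_loss_drop = (week3_loss - week4_loss) / week3_loss

print(f"Week 3: loss={week3_loss:.3f}, perplexity={week3_ppl:.2f}")
print(f"Week 4: loss={week4_loss:.3f}, perplexity={week4_ppl:.2f}")
print(f"Relative loss drop: {relative_loss_drop:.1%}")
```

Under these assumptions, perplexity falls from roughly 12.1 to roughly 9.7, about a 20% reduction. That looks like a solid gain, but it only measures how much probability the model assigns to the reference tokens on average. It does not directly measure whether the generated answers are any good.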