Writing an LLM from scratch, part 30 -- digging into the LLM-as-a-judge results

Discussed on Hacker News

I was unhappy with the LLM-as-a-judge instruction fine-tuning results I got when comparing my various base models. Could I make them any better?