How the community trained Gemma to "Think" with Tunix and TPUs

Covers On-Policy [LLM] Distillation (2025). Discussed on Hacker News.

The Google Tunix Hackathon on Kaggle challenged developers to transform small, non-reasoning base models into general reasoning engines using Kaggle TPUs and a limited compute budget. The winning teams achieved this by implementing multi-stage post-training pipelines that combined Supervised Fine-Tuning (SFT) with advanced alignment techniques like GRPO and SimPO. Ultimately, the competition democratized AI development by proving that highly capable, structured reasoning models can be success...
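The summary names GRPO and SimPO without saying what they compute. The sketch below is a rough, framework-free illustration of the core quantity in each. It is not the winning teams' code and not Tunix's API. The function names, the `eps` value, and the `beta`/`gamma` defaults are illustrative assumptions. GRPO scores several sampled completions for the same prompt relative to their group's mean and standard deviation, so no learned critic is needed. SimPO is an offline preference loss that uses length-normalized policy log-probabilities and a target margin, with no reference model.

```python
import math
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages (GRPO-style) for completions of ONE prompt.

    Each reward is normalized against the group's mean and standard deviation,
    which replaces a learned value function. Population std is used here;
    real implementations differ on this detail (assumption).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO loss for a single preference pair.

    logp_* are summed token log-probabilities under the current policy and
    len_* are token counts. beta and gamma are illustrative defaults, not
    values taken from the hackathon entries.
    """
    # Length-normalized implicit rewards, minus the target margin gamma.
    margin = (beta * logp_chosen / len_chosen
              - beta * logp_rejected / len_rejected
              - gamma)
    # -log(sigmoid(margin)) == softplus(-margin), computed without overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Four sampled answers to one prompt, scored 1 (correct) or 0 (wrong).
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
    # Chosen response has the higher per-token log-probability.
    print(simpo_loss(-10.0, 10, -30.0, 10))
```

The two functions belong to different kinds of training. GRPO advantages weight a policy-gradient update on freshly sampled completions. The SimPO loss is minimized directly on a fixed dataset of chosen/rejected pairs. This difference is one reason a multi-stage pipeline might run them as separate stages after SFT.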
