VibeThinker: A 3B-Parameter Model Just Beat Opus 4.5 on Reasoning — Here is How

Discussed on DEV

A team of researchers has quietly dropped one of the most surprising AI papers of the month. VibeThinker, a model with only 3 billion parameters, reportedly outperforms Anthropic's Opus 4.5 on key reasoning benchmarks, and the secret sauce is a novel training recipe combining Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). For years, the dominant narrative has been that bigger is bette...
