Sakana Trained One AI to Command GPT-5.5,
Two days ago a Tokyo lab shipped a model that scored 73.7 on SWE-Bench Pro. Opus 4.8 gets 69.2 on the same test. GPT-5.5 gets 58.6. Gemini…