Introducing BenchBench

Covered by tldr.tech. Discussed on Hacker News.

TL;DR: presenting the ultimate benchmark, in which models create benchmarks for each other. GPT 5.2 is the current (and only) winner.

