ErdosBench Leaderboard

Research-math reasoning, judged beyond raw solved counts. ErdosBench evaluates how models behave on research-level, Erdős-inspired mathematical problems: finding decisive obstructions, using known theorems correctly, producing scoped partial progress, and avoiding unsafe solved claims. The leaderboard below is based on the full 226-problem run with external judge grades and model-specific proof audits. It is not a formal theorem-certification board; it is a correctness-first signal of which s...
