An LLM verifier rated math proofs near-perfect; an expert found 17% correct

Covers MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling. Discussed on Hacker News.

Two posts ago I quoted a warning: an AI will find it easier to convince you it has a proof than to write one. A middling new paper finally put a number on that gap — 0.99 against 0.55.
