AI-generated image.
In 2024, an AI entered the fray of the International Mathematical Olympiad (IMO). Google DeepMind’s AlphaProof comes from the same “Alpha” family of systems that also produced AlphaFold and AlphaGo. It solved problems that required a level of creativity and abstract reasoning previously thought to be uniquely human.
Now, in a landmark paper in Nature, researchers have detailed the technology behind this achievement.
How to Teach a Machine to Think
In the past two decades, computers have gradually toppled human champions in games of perfect information. They mastered checkers, then chess (Deep Blue), and finally the profoundly complex game of Go (AlphaGo). But mathematics is a different beast entirely. It’s not a single game where you have all the information; it’s more like an infinite universe of games where you can do anything, as long as you respect some rules.
Solving a complex math problem rarely involves brute force or intense calculation. It’s more about constructing a rigorous and logical argument. You often need to find a clever trick (or several) to even approach the problem. AlphaProof was trained to use the same specialized tools that modern mathematicians use: “proof assistants.” Think of a proof assistant (in this case, one called Lean) as a combination of a word processor, a logical rulebook, and an infallible referee.
AlphaProof’s remarkable achievement on the 2024 IMO problems. Image credits: DeepMind.
As Talia Ringer, a computer scientist at the University of Illinois at Urbana-Champaign, explains in a related News & Views article, using Lean is “a lot like playing a game.” A mathematician states a theorem (the goal) and then tries to prove it step-by-step. At each step, the mathematician applies a “tactic,” which is like a high-level strategy or a logical move. The Lean assistant checks the move. If it’s valid, the proof state updates, maybe creating new, simpler sub-goals. “Winning” the game means reaching a state where no goals are left — the theorem is proven, verified by the computer’s kernel with perfect, logical soundness.
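To make the “game” concrete, here is a toy Lean 4 proof (our illustration, not an example from the paper). The `induction` tactic splits the goal into two simpler sub-goals, and each subsequent tactic is a move that Lean’s kernel checks:

```lean
-- Toy "game": prove that 0 + n = n for every natural number n.
-- Each tactic is a move; Lean verifies it and updates the proof state.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                          -- sub-goal 1: 0 + 0 = 0
  | succ k ih => rw [Nat.add_succ, ih]   -- sub-goal 2: rewrite with the induction hypothesis
```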
The training happened in three stages:
- Pre-training: First, the model was fed a gigantic diet of code and mathematical text (300 billion tokens) to learn the basic language of logic, programming, and math.
- Fine-tuning: Next, it was given 300,000 examples of human-written proofs from Lean’s massive ‘Mathlib’ library. This taught it the specific rules and common tactics of the Lean “game.”
- The Main Loop (RL): This is where it learns to win on its own. Like AlphaZero playing Go against itself, AlphaProof was set loose to solve problems through reinforcement learning (RL), getting a reward for each successful proof.
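In very rough pseudocode, that self-play loop looks something like this (our own sketch with stubbed-in helpers, not DeepMind’s implementation):

```python
import random

# A minimal, hypothetical sketch of the AlphaZero-style loop described
# above -- not DeepMind's code. The search and the Lean check are
# stubbed out; only the shape of the loop is the point.

def search_for_proof(policy, theorem):
    """Stand-in for model-guided tactic search."""
    return policy.get(theorem, "no proof found yet")

def lean_verifies(theorem, proof):
    """Stand-in for the Lean kernel's pass/fail verdict."""
    return random.random() < 0.1  # in reality: run the proof through Lean

def reinforcement_loop(policy, problems, num_steps):
    for _ in range(num_steps):
        theorem = random.choice(problems)          # pick a formal "board"
        proof = search_for_proof(policy, theorem)  # play the proof game
        if lean_verifies(theorem, proof):          # reward only on success
            policy[theorem] = proof                # reinforce the winning line
```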
The last part was the most challenging. In Go or chess, the board is always the same. As Ringer notes, “in the game of mathematical proof, the board is defined by the theorem that is being proved.”
So, the team built an AI to help its AI.
Training AlphaProof. Image credits: DeepMind.
AI Together Strong
They developed an “auto-formalization” system using a Gemini-based language model. They fed this system 1 million math problems written in natural language and had it translate them into 80 million formal, solvable “boards” for the Lean environment. This vast, synthetic curriculum was the training ground AlphaProof needed to become a general problem-solver.
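For a flavor of what that translation produces, a natural-language problem like “show that the sum of two even integers is even” might become a formal Lean statement along these lines (our illustration, not an item from the actual dataset). The statement is the “board”; the proof below it is what AlphaProof then has to find:

```lean
import Mathlib

-- Hypothetical auto-formalization of: "Show that the sum of two even
-- integers is even."
theorem sum_of_evens (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨m, hm⟩ := ha    -- a = m + m
  obtain ⟨n, hn⟩ := hb    -- b = n + n
  exact ⟨m + n, by omega⟩ -- a + b = (m + n) + (m + n)
```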
This proved to be a successful approach.
On standard high-school and university-level math benchmarks, it blew previous state-of-the-art results out of the water. But the IMO’s hardest problems are another matter. They often require a unique insight that a generalist AI, no matter how well trained, might never stumble upon.
For these, AlphaProof had another secret weapon: Test-Time Reinforcement Learning (TTRL).
TTRL is an ingenious, if computationally terrifying, solution. When AlphaProof gets truly stuck on a target problem, it doesn’t just “think harder” by searching more. Instead, it effectively goes back to school to earn a PhD in that one single problem.
First, it uses its language model to generate a new, bespoke curriculum of millions of “problem variants” centered around the hard problem. These are simplifications, generalizations, or explorations of related concepts. Then, it trains a new, specialist AlphaProof agent from scratch, using its full reinforcement learning algorithm, just on that narrow curriculum. This allows the AI to dedicate massive resources to deeply adapt to the specific “local” structure of a single, difficult problem.
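Schematically, and reusing the hypothetical loop sketched earlier, TTRL might look like this (again our own illustration, not the paper’s implementation):

```python
# Hypothetical sketch of test-time RL (TTRL), reusing the stubbed
# reinforcement_loop from the earlier sketch. `generate_variants`
# stands in for the Gemini-based model that writes the bespoke curriculum.

def test_time_rl(hard_theorem, generate_variants):
    # 1. Build a curriculum of simplifications, generalizations, and
    #    related explorations of the one hard problem.
    variants = generate_variants(hard_theorem)

    # 2. Train a fresh specialist agent, from scratch, on that curriculum.
    specialist = {}
    reinforcement_loop(specialist, variants, num_steps=1_000_000)

    # 3. Attack the original problem with the specialist.
    return search_for_proof(specialist, hard_theorem)
```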
This approach works, but it’s wildly computationally intensive. It’s not how humans reason at all. Solving each of the IMO problems this way required two to three days of TTRL, using hundreds of TPU-days of compute per problem.
Can AI Actually Make Math Discoveries?
Solving IMO problems is very impressive, but it isn’t useful in and of itself. Can the system help with actual research?
Talia Ringer and her colleagues were given access to AlphaProof to find out. Their verdict, detailed in the News & Views article, is that the tool can definitely be useful in science.
“My experiences ultimately convinced me that it is useful beyond competition mathematics. Two PhD students affiliated with my laboratory each sent me one lemma — a small theorem that is used in the proof of a larger one — that they found tricky. AlphaProof proved one of the lemmas in under a minute, even though the student and their collaborator had been stuck on it for some time. It then disproved the other one, exposing a bug in a definition that the student had written, which they then fixed.”
Even more surprisingly, Ringer found that its most powerful feature wasn’t finding proofs but finding flaws, as the swift disproof of that second lemma showed.
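As a toy illustration of that kind of refutation (ours, not the student’s actual lemma), here is a plausible-looking claim about natural-number subtraction that Lean can formally disprove, because `Nat` subtraction truncates at zero:

```lean
-- False "lemma": subtraction on naturals can always be undone by addition.
-- It fails because Nat subtraction truncates: 0 - 1 = 0.
example : ¬ ∀ a b : Nat, a - b + b = a := by
  intro h
  have h01 := h 0 1   -- h01 : 0 - 1 + 1 = 0, i.e. 1 = 0
  omega               -- linear arithmetic spots the contradiction
```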
Mathematician Alex Kontorovich, who also tested the system, had a similar experience. He found it essential for refining his own ideas. “This back-and-forth between proposing statements and checking whether they can be established (or refuted) was essential to getting the formalization right,” he reported.
An Assistant, Not a Replacement
This doesn’t mean the end of mathematics, or that mathematicians should concede their research positions to machines.
Kevin Buzzard, a mathematician at Imperial College London, is famously working to formalize the proof of Fermat’s Last Theorem in Lean. For him, the tool was useless, because the definitions his project requires are not yet formalized in Mathlib, the library used to train the AI.
It also can’t invent new fields of math (at least not yet), and it’s not about to replace the creative spark of human intuition. But it is poised to become an indispensable collaborator — a tireless, logical partner that can check our work, find our mistakes, and handle the million-step logical leaps we can’t.
No doubt, this will have an impact; just how big remains to be seen.
“It is clear to me that something is changing in this field; perhaps AlphaProof is a harbinger of what is to come,” Ringer concludes.
The study was published in Nature.