An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run
Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks.