A Benchmark for AI Agents Driving Scientific and Engineering Progress
An arena for evaluating AI agents on performance engineering tasks. It benchmarks 7+ frontier models across 23 tasks in system optimization and LLM development.