An arena for evaluating AI agents on performance engineering tasks. 7+ frontier models benchmarked across 23 tasks in system optimization and LLM development.
Press ? anytime to show this help