⚡️ Scaling Coding-Agent RL to 32x H100s: Achieving a 160% Improvement on Stanford’s TerminalBench

🎯 TL;DR

  • I trained a 14B orchestrator model to better coordinate explorer & coder subagents.
  • I scaled training to 32x Nvidia H100 GPUs and 416x Intel Xeon Platinum 8470 CPU cores.
  • After training, Qwen3-14B achieved a 160.71% relative increase on Stanford’s TerminalBench (see the sketch after this list for the arithmetic).
  • Full training code, model weights, datasets, and documentation are released below.
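
A quick note on the headline number: a “relative increase” compares scores as a ratio, not in percentage points. The sketch below shows the arithmetic; the baseline and post-training scores are illustrative placeholders (not taken from this post), chosen only so that they reproduce the reported 160.71%.

```python
def relative_increase(baseline: float, trained: float) -> float:
    """Relative improvement of `trained` over `baseline`, as a percentage."""
    return (trained - baseline) / baseline * 100.0

# Illustrative TerminalBench scores (hypothetical, not from this post):
baseline_score = 11.2  # e.g. untrained Qwen3-14B
trained_score = 29.2   # e.g. after RL training

print(f"{relative_increase(baseline_score, trained_score):.2f}% relative increase")
# -> 160.71% relative increase
```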

This project builds upon the great prime-rl framework developed by Prime Intellect and depends heavily on the multi-agent architecture developed in multi-agent-coder. Please note that this code and the resulting model are m…
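
To make the orchestrator/subagent split concrete, here is a minimal sketch of the coordination loop. All names here (`Orchestrator`, `Subagent`, `run`, `solve`) are hypothetical stand-ins, not the actual multi-agent-coder API; in the real system each agent is an LLM driving a terminal, whereas these subagents simply echo their instructions.

```python
from dataclasses import dataclass


@dataclass
class Subagent:
    """Hypothetical stand-in for an LLM-backed agent (explorer or coder)."""
    name: str

    def run(self, instruction: str) -> str:
        # The real system would query the subagent's model and execute its
        # actions in a terminal sandbox; this sketch just echoes.
        return f"[{self.name}] result for: {instruction}"


class Orchestrator:
    """Sketch of the trained orchestrator policy: it decomposes a task and
    decides what to delegate to the explorer and coder subagents."""

    def __init__(self) -> None:
        self.explorer = Subagent("explorer")
        self.coder = Subagent("coder")

    def solve(self, task: str) -> str:
        # 1. Ask the explorer to gather context about the environment.
        context = self.explorer.run(f"explore the codebase relevant to: {task}")
        # 2. Hand the task plus the gathered context to the coder.
        #    (RL training would shape *which* instructions get issued here.)
        return self.coder.run(f"implement: {task}\ncontext: {context}")


if __name__ == "__main__":
    print(Orchestrator().solve("fix the failing unit test"))
```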
