⚡️ Scaling Coding-Agent RL to 32x H100s: Achieving a 160% Improvement on Stanford’s TerminalBench

🎯 TL;DR

  • I trained a 14B orchestrator model to better coordinate explorer and coder subagents.
  • I scaled training to 32x Nvidia H100 GPUs and 416x Intel Xeon Platinum 8470 CPU cores.
  • Qwen3-14B achieved a 160.71% relative increase on Stanford’s TerminalBench after training (a worked example of the metric follows this list).
  • Full training code, model weights, datasets, and documentation are released below.
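
For clarity, the 160.71% figure is a relative improvement over the baseline score, not an absolute gain in percentage points. A minimal sketch of the arithmetic, using hypothetical scores chosen only so the numbers land on the reported figure:

```python
# Hypothetical TerminalBench scores, for illustration only; not the actual results.
baseline_score = 7.0   # untrained Qwen3-14B (%)
trained_score = 18.25  # after RL training (%)

# Relative increase: improvement measured against the baseline score.
relative_increase = (trained_score - baseline_score) / baseline_score * 100
print(f"{relative_increase:.2f}% relative increase")  # 160.71% relative increase
```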

This project builds on the excellent prime-rl framework developed by Prime Intellect and depends heavily on the multi-agent architecture developed in multi-agent-coder. Please note that this code and the resulting model are m…
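To make the division of labor concrete, here is a minimal sketch of an orchestrator delegating to explorer and coder subagents. This is an assumption-laden illustration, not the multi-agent-coder implementation; every class and method name below is hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Subagent:
    """One specialist worker; a real subagent would drive an LLM plus terminal tools."""
    name: str

    def run(self, instruction: str) -> str:
        # Placeholder for the subagent's actual rollout in a sandboxed terminal.
        return f"[{self.name}] result for: {instruction}"


@dataclass
class Orchestrator:
    """Decides which subagent handles the next step of a task."""
    explorer: Subagent = field(default_factory=lambda: Subagent("explorer"))
    coder: Subagent = field(default_factory=lambda: Subagent("coder"))

    def solve(self, task: str, max_turns: int = 8) -> list[str]:
        transcript: list[str] = []
        for turn in range(max_turns):
            # In the trained system, the 14B orchestrator model emits this
            # routing decision; alternating here is just a stand-in policy.
            agent = self.explorer if turn % 2 == 0 else self.coder
            transcript.append(agent.run(f"(turn {turn}) {task}"))
        return transcript


if __name__ == "__main__":
    for line in Orchestrator().solve("fix the failing test suite"):
        print(line)
```

In the actual project, RL training targets the orchestrator's coordination behavior, i.e. the routing and instruction-writing shown as a stand-in policy above.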
