RL Environments and the Hierarchy of Agentic Capabilities
surgehq.ai·3h·

2025 has been the year of agents, with AI moving out of the chat box and into the real world. But are we really close to having generally intelligent agents, or are they still a decade away? The trillion-dollar question: how much economically useful work can these agents actually do?

To answer that question, training and evaluation of models have shifted from rating individual responses to assessing multi-step tasks with tool use. For those involved in testing and post-training, 2025 is the year of RL environments: virtual worlds where models can act, experiment, and learn through realistic multi-step tasks.
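To make "multi-step tasks with tool use" concrete, here is a minimal sketch of what such an environment loop could look like. Everything in it is an assumption for illustration, not our actual environment: the `ToyEnvironment` class, the `lookup` and `resolve` tools, the ticket `T-1`, and the `scripted_agent` function (a stand-in for a real model) are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyEnvironment:
    """Hypothetical environment: look a ticket up, then mark it resolved."""
    tickets: Dict[str, str] = field(default_factory=lambda: {"T-1": "open"})
    looked_up: bool = False
    steps: int = 0
    max_steps: int = 5

    def step(self, tool: str, arg: str) -> Tuple[str, bool]:
        """Apply one tool call and return (observation, done)."""
        self.steps += 1
        if tool == "lookup":
            self.looked_up = arg in self.tickets
            obs = self.tickets.get(arg, "not found")
        elif tool == "resolve" and arg in self.tickets:
            self.tickets[arg] = "resolved"
            obs = "ok"
        else:
            obs = "error: unknown tool or ticket"
        # The episode ends on a successful resolve or when the step budget runs out.
        done = obs == "ok" or self.steps >= self.max_steps
        return obs, done

    def score(self) -> float:
        """Grade the end state of the whole trajectory, not a single response."""
        ok = self.looked_up and self.tickets["T-1"] == "resolved"
        return 1.0 if ok else 0.0


def scripted_agent(history: List[str]) -> Tuple[str, str]:
    """Stand-in for a model: look the ticket up first, then resolve it."""
    return ("lookup", "T-1") if not history else ("resolve", "T-1")


def run_episode(env: ToyEnvironment, agent) -> float:
    """Let the agent act until the environment says the episode is over."""
    history: List[str] = []
    done = False
    while not done:
        tool, arg = agent(history)
        obs, done = env.step(tool, arg)
        history.append(f"{tool}({arg}) -> {obs}")
    return env.score()
```

Calling `run_episode(ToyEnvironment(), scripted_agent)` runs one episode end to end. The point of the sketch is the shape of the evaluation: the score depends on the sequence of tool calls and the final state, so an agent that skips a step or calls the wrong tool fails the task even if each individual message looks plausible.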

We “hired” nine AI models to perform 150 tasks in one of our RL environments. These were the results:

![](https://cdn.prod.website-files.com/68dcd2ceb173c46fa02993…
