Models keep improving on long-horizon tasks, but splitting work across many agents doesn’t suit every problem. We walk through the setup for a single agent working sequentially on a task where mistakes compound: modeling the early universe. Read more: https://www.anthropic.com/research/long-running-Claude