CEO-Bench: Can AI run a simulated startup for 500 days?
CEO-Bench evaluates whether AI agents can steer a simulated AI startup for 500 days, testing long-term planning, adaptation, and coordination under uncertainty.