Launch: AA-Briefcase
A new proprietary benchmark for long-horizon knowledge work
A new frontier agentic evaluation measuring how well AI models perform realistic, long-horizon knowledge work across multi-week scenarios.