AsgardBench: A benchmark for visually grounded interactive planning
AsgardBench evaluates whether embodied agents can revise their plans based on visual observations as tasks unfold. By focusing on perception-driven planning, it exposes key limitations of these agents and points to ways of improving their reliability.