Beyond English: Uncovering the Multilingual Gap in Vision-Language-Action Models (opens in new tab)

Vision-Language-Action models have recently demonstrated promising capabilities in learning generalist robot policies from large-scale multimodal data. However, most existing VLA systems are trained and evaluated primarily with English instructions, leaving their ability to understand and execute instructions in other languages largely unexplored. While the underlying large language models often possess multilingual capabilities, it remains uncl...

Read the original article