GeoRC: A Benchmark for Geolocation Reasoning Chains

arXiv:2601.21278v1 Announce Type: new Abstract: Vision Language Models (VLMs) are good at recognizing the global location of a photograph – their geolocation prediction accuracy rivals the best human experts. But many VLMs are startlingly bad at explaining which image evidence led to their prediction, even when their location prediction is correct. The reasoning chains produced by VLMs frequently hallucinate scene attributes to support their location prediction (e.g. phantom writing, imagined infrastructure, misidentified flora). In this paper, we introduce the first benchmark for geolocation reasoning chains. We focus on the global location prediction task in the popular GeoGuessr game which draws from Google Street View spanning more than 100 countries. We collaborate with expert GeoGue…

Similar Posts