V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
arxiv.org

Abstract: Many vision-language models (VLMs) are developed to answer well-defined, straightforward questions with highly specified targets, as most benchmarks require, yet in practice they often struggle with complex open-ended tasks that usually require multiple rounds of exploration and reasoning in the visual space. Such visual thinking paths not only provide step-by-step exploration and verification, much like an AI detective, but also produce better interpretations of the final answers. However, these paths are challenging to evaluate because the space of possible intermediate steps is large. To bridge the gap, we develop an evaluation suite, "Visual Reasoning with multi-step EXplo…