HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering
arxiv.org·18h

Abstract: Video Large Language Models (Video-LLMs) are rapidly improving, yet current Video Question Answering (VideoQA) benchmarks often allow questions to be answered from a single salient cue, under-testing reasoning that must aggregate multiple, temporally separated pieces of visual evidence. We present HERBench, a VideoQA benchmark purpose-built to assess multi-evidence integration across time. Each question requires aggregating at least three non-overlapping evidential cues across distinct video segments, so neither language priors nor a single snapshot can suffice. HERBench comprises 26K five-way multiple-choice questions organized into twelve compositional tasks that probe ident…
