Point to Span: Zero-Shot Moment Retrieval for Navigating Unseen Hour-Long Videos
arxiv.org
Abstract: Zero-shot Long Video Moment Retrieval (ZLVMR) is the task of identifying the temporal segments of an hour-long video that match a natural language query, without task-specific training. The core technical challenge of Long Video Moment Retrieval (LVMR) is that processing an entire lengthy video in a single pass is computationally infeasible. This limitation has made the ‘Search-then-Refine’ approach, in which candidate segments are rapidly narrowed down and only those portions are analyzed in detail, the dominant paradigm for LVMR. However, existing approaches within this paradigm face severe limitations. Conventional supervised learning scales poorly and generalizes poorly, despite consuming substantial resources. Existing zero-shot methods also fail,…
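The ‘Search-then-Refine’ paradigm mentioned above can be illustrated with a minimal Python sketch. It is not the paper's method. It assumes the video has already been cut into fixed-length clips whose embeddings live in the same space as the query embedding (for example, from a CLIP-style encoder), and every name in it (`search`, `refine`, `search_then_refine`, `clip_seconds`, `top_k`, `ratio`) is hypothetical. The refine stage here is a cheap, similarity-based span-growing heuristic standing in for the expensive per-candidate analysis a real system would run.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(clip_embs: List[Vector], query_emb: Vector, top_k: int) -> List[Tuple[int, float]]:
    """Search stage: score every clip cheaply and keep the top_k candidate clips."""
    scores = [(i, cosine(e, query_emb)) for i, e in enumerate(clip_embs)]
    scores.sort(key=lambda p: p[1], reverse=True)
    return scores[:top_k]


def refine(clip_embs: List[Vector], query_emb: Vector, seed: int,
           seed_score: float, ratio: float) -> Tuple[int, int]:
    """Refine stage (hypothetical heuristic): grow a span around the seed clip
    while neighbouring clips score at least ratio * seed_score."""
    threshold = ratio * seed_score
    start = end = seed
    while start > 0 and cosine(clip_embs[start - 1], query_emb) >= threshold:
        start -= 1
    while end < len(clip_embs) - 1 and cosine(clip_embs[end + 1], query_emb) >= threshold:
        end += 1
    return start, end


def search_then_refine(clip_embs: List[Vector], query_emb: Vector,
                       clip_seconds: float = 2.0, top_k: int = 3,
                       ratio: float = 0.8) -> List[Tuple[float, float, float]]:
    """Return (start_sec, end_sec, seed_score) spans, best-scoring seeds first."""
    spans = []
    seen = set()
    for seed, score in search(clip_embs, query_emb, top_k):
        if score <= 0:
            continue  # a non-positive seed makes the ratio threshold meaningless
        start, end = refine(clip_embs, query_emb, seed, score, ratio)
        if (start, end) in seen:
            continue  # several seeds grew into the same span; keep it once
        seen.add((start, end))
        spans.append((start * clip_seconds, (end + 1) * clip_seconds, score))
    return spans


if __name__ == "__main__":
    # Toy 2-D "embeddings": clips 2-4 resemble the query direction [1, 0].
    clips = [[0, 1], [0, 1], [1, 0], [1, 0.1], [0.9, 0.1], [0, 1], [0, 1]]
    print(search_then_refine(clips, [1, 0]))
```

The design point the sketch tries to capture is cost asymmetry: the search stage is a single linear pass of inexpensive similarity scores, so whatever heavy model a real system uses for refinement only ever sees the `top_k` candidates rather than the whole hour-long video.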
