ChronusOmni: Improving Time Awareness of Omni Large Language Models
arxiv.org·1d



Abstract: Time awareness is a fundamental ability of omni large language models, especially for understanding long videos and answering complex questions. Previous approaches mainly target vision-language scenarios and focus on explicit temporal grounding questions, such as identifying when a visual event occurs or determining what event happens at a specific time. However, they often make insufficient use of the audio modality and overlook implicit temporal grounding across modalities, for example, identifying what is visually present when a character speaks, or determining what is said when a visual…
