CSS, HTML
TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding
arxiv.org·15h
CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning
arxiv.org·15h
BigTokDetect: A Clinically-Informed Vision-Language Model Framework for Detecting Pro-Bigorexia Videos on TikTok
arxiv.org·15h