Shape Recognition, Document Layout, Spatial Indexing, Visual Similarity
Dynamic Pattern Alignment Learning for Pretraining Lightweight Human-Centric Vision Models
arxiv.org · 1d
CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs
arxiv.org · 1d
Learning More by Seeing Less: Line Drawing Pretraining for Efficient, Transferable, and Human-Aligned Vision
arxiv.org · 1d
BigTokDetect: A Clinically-Informed Vision-Language Model Framework for Detecting Pro-Bigorexia Videos on TikTok
arxiv.org · 1d