Shape Recognition, Document Layout, Spatial Indexing, Visual Similarity
Dynamic Pattern Alignment Learning for Pretraining Lightweight Human-Centric Vision Models
arxiv.orgยท20h
Bridging Semantic Logic Gaps: A Cognition-Inspired Multimodal Boundary-Preserving Network for Image Manipulation Localization
arxiv.orgยท20h
CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs
arxiv.orgยท20h
A Two-Tier Approach to Buy It Again Recommendations Using Category and Item Models
hackernoon.comยท1d
Learning More by Seeing Less: Line Drawing Pretraining for Efficient, Transferable, and Human-Aligned Vision
arxiv.orgยท20h
BigTokDetect: A Clinically-Informed Vision-Language Model Framework for Detecting Pro-Bigorexia Videos on TikTok
arxiv.orgยท20h
Information Bottleneck-based Causal Attention for Multi-label Medical Image Recognition
arxiv.orgยท20h
Remote Sensing Image Intelligent Interpretation with the Language-Centered Perspective: Principles, Methods and Challenges
arxiv.orgยท20h
Loading...Loading more...