Neural Recognition, Document AI, Layout Analysis, Multi-modal Processing
EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks
arxiv.org · 1d
Identifying Signatures of Image Phenotypes to Track Treatment Response in Liver Disease
arxiv.org · 1h
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models
arxiv.org · 1d
MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing
arxiv.org · 3d
Spatial ModernBERT: Spatial-Aware Transformer for Table and Key-Value Extraction in Financial Documents at Scale
arxiv.org · 2d
Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities
arxiv.org · 1d