Dual-Stream Cross-Modal Representation Learning via Residual Semantic Decorrelation
arxiv.org·22h

Abstract: Cross-modal learning has become a fundamental paradigm for integrating heterogeneous information sources such as images, text, and structured attributes. However, multimodal representations often suffer from modality dominance, redundant information coupling, and spurious cross-modal correlations, leading to suboptimal generalization and limited interpretability. In particular, high-variance modalities tend to overshadow weaker but semantically important signals, while naïve fusion strategies entangle modality-shared and modality-specific factors in an uncontrolled manner. This makes it difficult to understand which modality actually drives a prediction and to maintain r…
