Mitigating Modal Imbalance in Multimodal Reasoning
arxiv.org·9h

Abstract: Foundation models (FMs) deployed in real-world tasks such as computer-use agents must integrate diverse modalities. How good are FMs at performing joint reasoning, simultaneously reasoning over multiple modalities, especially when the modalities interact and relate to each other to form cross-modal context? To better understand this problem, we study FMs on cross-modal conflicts: scenarios where conflicting evidence is presented across modalities. This allows us to examine whether FMs prioritize one modality over another or reason jointly to reconcile the conflict. Our experiments reveal that FMs can recogniz…
