Collaborative Text-to-Image Generation via Multi-Agent Reinforcement Learning and Semantic Fusion
arxiv.org·11h
SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation
arxiv.org·11h
When Images Speak Louder: Mitigating Language Bias-induced Hallucinations in VLMs through Cross-Modal Guidance
arxiv.org·11h