GTA-Net: Cooperative Game Theory for Vision-Language Alignment in Chest X-Ray Report Generation (opens in new tab)
Automated chest X-ray report generation requires precise cross-modal grounding to ensure clinically reliable descriptions. However, existing vision-language models rely on implicit attention mechanisms that fail to enforce explicit region-word correspondence and disease-level consistency. We propose Game-Theoretic Alignment Network (GTA-Net), a vision-language framework that formulates report generation as a cooperative game-theoretic alignment ...
Read the original article