Abstract:The inherent multimodality and heterogeneous temporal structures of medical data pose significant challenges for modeling. We propose MedM2T, a time-aware multimodal framework designed to address these complexities. MedM2T integrates: (i) Sparse Time Series Encoder to flexibly handle irregular and sparse time series, (ii) Hierarchical Time-Aware Fusion to capture both micro- and macro-temporal patterns from multiple dense time series, such as ECGs, and (iii) Bi-Modal Attention to extract cross-modal interactions, which can be extended to any number of modalities. To mitigate granularity gaps between modalities, MedM2T uses modality-specific pre-trained encoders and aligns resulting features within a shared encoder. We evalu…
Abstract:The inherent multimodality and heterogeneous temporal structures of medical data pose significant challenges for modeling. We propose MedM2T, a time-aware multimodal framework designed to address these complexities. MedM2T integrates: (i) Sparse Time Series Encoder to flexibly handle irregular and sparse time series, (ii) Hierarchical Time-Aware Fusion to capture both micro- and macro-temporal patterns from multiple dense time series, such as ECGs, and (iii) Bi-Modal Attention to extract cross-modal interactions, which can be extended to any number of modalities. To mitigate granularity gaps between modalities, MedM2T uses modality-specific pre-trained encoders and aligns resulting features within a shared encoder. We evaluated MedM2T on MIMIC-IV and MIMIC-IV-ECG datasets for three tasks that encompass chronic and acute disease dynamics: 90-day cardiovascular disease (CVD) prediction, in-hospital mortality prediction, and ICU length-of-stay (LOS) regression. MedM2T outperformed state-of-the-art multimodal learning frameworks and existing time series models, achieving an AUROC of 0.947 and an AUPRC of 0.706 for CVD prediction; an AUROC of 0.901 and an AUPRC of 0.558 for mortality prediction; and Mean Absolute Error (MAE) of 2.31 for LOS regression. These results highlight the robustness and broad applicability of MedM2T, positioning it as a promising tool in clinical prediction. We provide the implementation of MedM2T at this https URL.
| Comments: | This preprint version of the manuscript has been submitted to the IEEE Journal of Biomedical and Health Informatics (JBHI) for review. The implementation of MedM2T is available at this https URL | 
| Subjects: | Machine Learning (cs.LG) | 
| Cite as: | arXiv:2510.27321 [cs.LG] | 
| (or arXiv:2510.27321v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2510.27321 arXiv-issued DOI via DataCite (pending registration) | 
Submission history
From: YuChen Kuo [view email] [v1] Fri, 31 Oct 2025 09:47:58 UTC (1,089 KB)