Multimodal Learning for Scalable Representation of High-Dimensional Medical Data

Abstract: Integrating artificial intelligence (AI) with healthcare data is rapidly transforming medical diagnostics and driving progress toward precision medicine. However, effectively leveraging multimodal data, particularly digital pathology whole slide images (WSIs) and genomic sequencing, remains a significant challenge due to the intrinsic heterogeneity of these modalities and the need for scalable and interpretable frameworks. Existing diagnostic models typically operate on unimodal data, overlooking critical cross-modal interactions that can yield richer clinical insights. We introduce MarbliX (Multimodal Association and Retrieval with Binary Latent Indexed matriX), a self-supervised framework that learns to embed WSIs and immunogenomic profiles into compact, scalable binary codes, termed "monogram." By optimizing a triplet contrastive objective across modalities, MarbliX captures high-resolution patient similarity in a unified latent space, enabling efficient retrieval of clinically relevant cases and facilitating case-based reasoning. In lung cancer, MarbliX achieves 85-89% across all evaluation metrics, outperforming histopathology (69-71%) and immunogenomics (73-76%). In kidney cancer, real-valued monograms yield the strongest performance (F1: 80-83%, Accuracy: 87-90%), with binary monograms slightly lower (F1: 78-82%).
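
The abstract names three ingredients: a triplet contrastive objective, binary codes, and retrieval of similar cases. The sketch below illustrates each in generic form and is not the MarbliX implementation. The function names, the squared-L2 distance, the margin value, the zero threshold used for binarization, and the Hamming-distance ranking are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss on real-valued embeddings.

    Uses squared L2 distance; the margin of 1.0 is an arbitrary assumption.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def binarize(embeddings):
    """Turn real-valued embeddings into 0/1 codes by thresholding at zero.

    The threshold is an assumption, not the paper's binarization rule.
    """
    return (np.asarray(embeddings) > 0).astype(np.uint8)

def hamming_search(query_code, database_codes, k=3):
    """Return indices and Hamming distances of the k closest stored codes."""
    dists = np.count_nonzero(database_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]

# Toy usage: three stored 4-bit codes and one query code.
database = np.array([[0, 0, 0, 0],
                     [1, 1, 1, 1],
                     [1, 0, 0, 0]], dtype=np.uint8)
query = binarize([0.7, 0.2, 0.9, -0.4])
indices, distances = hamming_search(query, database, k=2)
```

Comparing binary codes only requires counting differing bits, which is one reason compact binary representations are attractive for searching large case archives.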


Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2409.13115 [eess.IV]
	(or arXiv:2409.13115v2 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2409.13115 arXiv-issued DOI via DataCite

Submission history

From: Hamid Tizhoosh
[v1] Thu, 19 Sep 2024 22:49:27 UTC (25,331 KB)
[v2] Fri, 12 Dec 2025 11:38:27 UTC (10,161 KB)
