Abstract
Vaccines are the most effective tool for preventing and managing infectious diseases. A critical challenge in vaccine development is selecting suitable target antigens from the thousands of proteins produced by pathogens, and artificial intelligence is expected to play a significant role in addressing it. In this study, we develop PLGDL, a framework for protective antigen prediction that employs Protein Language and Geometric Deep Learning models. The framework leverages both primary sequence features and three-dimensional structural features of protein antigens, thereby reducing the biases associated with manually curated features. Our integrated model is robust across both constructed and public datasets and is applicable to viruses, bacteria, and eukaryotic pathogens. Notably, when applied to the ongoing Mpox outbreak, our model not only quickly identifies multiple known antigens but also discovers a protective antigen, G10R. By combining the capabilities of protein language and geometric deep learning models, our study provides a high-performance screening tool for protective vaccine antigen prediction and offers substantive insights and methodological advances for rapid vaccine development.
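To make the idea of combining sequence and structural features concrete, the following is a minimal sketch rather than the PLGDL implementation. It assumes that per-residue sequence embeddings are mean-pooled and concatenated with a structural embedding; this fusion strategy is an assumption and is not stated in the abstract. A small logistic regression stands in for the final classifier so that the example runs with the Python standard library only. All function names and the toy vectors are hypothetical.

```python
import math

def mean_pool(per_residue):
    """Average per-residue vectors into one fixed-length protein vector."""
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(vec[i] for vec in per_residue) / n for i in range(dim)]

def fuse(seq_embedding, struct_embedding):
    """Concatenate sequence and structure vectors (assumed fusion strategy)."""
    return list(seq_embedding) + list(struct_embedding)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, epochs=200, lr=0.5):
    """Stochastic gradient descent logistic regression, a stand-in classifier."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict_proba(model, x):
    """Return the predicted probability that a fused vector is protective."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical toy data: (per-residue sequence embeddings, structural embedding, label).
toy = [
    ([[0.9, 0.1], [0.7, 0.3]], [0.8, 0.2], 1),
    ([[0.8, 0.0], [1.0, 0.2]], [0.9, 0.1], 1),
    ([[0.1, 0.9], [0.3, 0.7]], [0.2, 0.8], 0),
    ([[0.0, 0.8], [0.2, 1.0]], [0.1, 0.9], 0),
]

X = [fuse(mean_pool(seq), struct) for seq, struct, _ in toy]
y = [label for _, _, label in toy]
model = train_logistic(X, y)
scores = [predict_proba(model, x) for x in X]

for score, label in zip(scores, y):
    print(f"label={label} predicted_probability={score:.3f}")
```

In practice the two embeddings would come from a pretrained protein language model and a geometric network over the 3D structure, and the stand-in classifier would be replaced by a stronger learner. The sketch only illustrates how the two feature sources can feed a single predictor.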
Data availability
All data supporting the findings of this study, including raw datasets, processed results, and source data, are available within the main text, Supplementary Information, or Supplementary Data files or from the corresponding author on request. Source data are provided with this paper.
Code availability
The code and data required to reproduce the results in this study are openly available via the repository: https://github.com/yunxiangz/PLGDL. This repository includes the scripts for the final XGBoost classifier and the complete, documented source code for the NEGCN model used to generate the protein structural embeddings.
Acknowledgements
We thank Prof. Lihua Hou for proofreading and editing the manuscript. This work was partially funded by the National Natural Science Foundation of China (32571110, 32070025, 62306333) and the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001).
Author information
Author notes
These authors contributed equally: Xiaodong Zai, Yunxiang Zhao.
Authors and Affiliations
Laboratory of Advanced Biotechnology, Beijing Institute of Biotechnology, Beijing, China
Xiaodong Zai, Yunxiang Zhao, Xiaolin Wang, Yilong Yang, Xiaofan Zhao, Ruihua Li, Yaohui Li, Yue Zhang, Jun Zhang, Hongguang Ren, Junjie Xu & Wei Chen
College of Computer, National University of Defence Technology, Changsha, China
Mingyue Leng, Menglong Lu & Dongsheng Li
Contributions
W.C., J.X., and H.R. conceptualized the project. X.D.Z. and Y.X.Z. designed the PLGDL framework. X.D.Z. and Y.X.Z. prepared the benchmarks. Y.X.Z., D.L., M.Y.L., and M.L.L. evaluated the performance of methods over the benchmarks. X.W., Y.Y., X.F.Z., R.L., Y.L., Y.Z., and J.Z. conducted the immune evaluation experiment. X.D.Z. and Y.X.Z. wrote the original draft of the manuscript. All authors reviewed and edited the manuscript. J.X., H.R., and X.D.Z. supervised the project.
Corresponding authors
Correspondence to Hongguang Ren, Junjie Xu or Wei Chen.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zai, X., Zhao, Y., Wang, X. et al. Integrating protein language and geometric deep learning models for enhanced vaccine antigen prediction. Nat Commun (2025). https://doi.org/10.1038/s41467-025-67778-2
Received: 27 September 2024
Accepted: 09 December 2025
Published: 21 December 2025
DOI: https://doi.org/10.1038/s41467-025-67778-2