References
Sackett, D., Strauss, D., Richardson, W., Rosenberg, W. & Haynes, R. Evidence-Based Medicine: How to Practice and Teach EBM. (Churchill Livingstone, 2nd Ed., Edinburgh, 2000).
Ni, Y. et al. Increasing the efficiency of trial-patient matching: automated clinical trial eligibility pre-screening for pediatric oncology patients. BMC Med Inform Decis 15, 1–10, https://doi.org/10.1186/s12911-015-0149-3 (2015).
Wong, C. H., Siah, K. W. & Lo, A. W. Estimation of clinical trial success rates and related parameters. Biostatistics 20, 273–286, https://doi.org/10.1093/biostatistics/kxx069 (2019).
Weng, C. Optimizing clinical research participant selection with informatics. Trends Pharmacol Sci 36, 706–709, https://doi.org/10.1016/j.tips.2015.08.007 (2015).
López-Úbeda, P. et al. Automatic medical protocol classification using machine learning approaches. Comput Methods Prog Biomed 200, 105939, https://doi.org/10.1016/j.cmpb.2021.105939 (2021).
Chondrogiannis, E. et al. A novel semantic representation for eligibility criteria in clinical trials. J Biomed Inform 69, 10–23, https://doi.org/10.1016/j.jbi.2017.03.013 (2017).
French, E. & McInnes, B. T. An overview of biomedical entity linking throughout the years. J Biomed Inform 137, 104252, https://doi.org/10.1016/j.jbi.2022.104252 (2023).
Newbury, A., Liu, H., Idnay, B. & Weng, C. The suitability of UMLS and SNOMED-CT for encoding outcome concepts. JAMIA 30(12), 1895–1903, https://doi.org/10.1093/jamia/ocad161 (2023).
Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 32, D267–D270, https://doi.org/10.1093/nar/gkh061 (2004).
Bada, M. et al. Concept annotation in the CRAFT corpus. BMC Bioinform 13, 161, https://doi.org/10.1186/1471-2105-13-161 (2012).
Doğan, R. I., Leaman, R. & Lu, Z. NCBI disease corpus: a resource for disease name recognition and concept normalization. J Biomed Inform 47, 1–10, https://doi.org/10.1016/j.jbi.2013.12.006 (2014).
Elhadad, N. et al. SemEval-2015 task 14: Analysis of clinical text. In Proc. of SemEval 2015, 303–310, https://aclanthology.org/S15-2051.pdf (Association for Computational Linguistics, 2015).
Luo, Y.-F., Sun, W. & Rumshisky, A. MCN: a comprehensive corpus for medical concept normalization. J Biomed Inform 92, 103132, https://doi.org/10.1016/j.jbi.2019.103132 (2019).
Magnini, B. et al. European Clinical Case Corpus. In European Language Grid: A Language Technology Platform for Multilingual Europe, 283–288, https://doi.org/10.1007/978-3-031-17258-8_17 (Springer International Publishing Cham, 2022).
Friedman, C., Hripcsak, G., DuMouchel, W., Johnson, S. B. & Clayton, P. D. Natural language processing in an operational clinical information system. Natural Language Engineering 1, 83–108, https://doi.org/10.1017/S1351324900000061 (1995).
Aronson, A. R. & Lang, F.-M. An overview of MetaMap: historical perspective and recent advances. JAMIA 17, 229–236, https://doi.org/10.1136/jamia.2009.002733 (2010).
Savova, G. K. et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. JAMIA 17, 507–513, https://doi.org/10.1136/jamia.2009.001560 (2010).
Névéol, A., Grouin, C., Leixa, J., Rosset, S. & Zweigenbaum, P. The Quaero French Medical Corpus: A Ressource for Medical Entity Recognition and Normalization. In Proc. of BioTxtM 2014, 24–30, https://perso.lisn.upsaclay.fr/pz/FTPapiers/Neveol_BIOTEXTM2014.pdf (2014).
Kate, R. J. Normalizing clinical terms using learned edit distance patterns. JAMIA 23, 380–386, https://doi.org/10.1093/jamia/ocv108 (2015).
Kors, J. A., Clematide, S., Akhondi, S. A., van Mulligen, E. M. & Rebholz-Schuhmann, D. A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC. JAMIA 22, 948–56, https://doi.org/10.1093/jamia/ocv037 (2015).
Soysal, E. et al. CLAMP–A toolkit for efficiently building customized clinical natural language processing pipelines. JAMIA 25, 331–336, https://doi.org/10.1093/jamia/ocx132 (2018).
Leaman, R., Islamaj Doğan, R. & Lu, Z. DNorm: disease name normalization with pairwise learning to rank. Bioinform 29, 2909–2917, https://doi.org/10.1093/bioinformatics/btt474 (2013).
Angell, R., Monath, N., Mohan, S., Yadav, N. & McCallum, A. Clustering-based inference for biomedical entity linking. In Proc. of the 2021 Conference of the NAACL, 2598–2608, https://doi.org/10.18653/v1/2021.naacl-main.205 (Online, 2021).
Ferré, A., Ba, M. & Bossy, R. Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data. Genomics & informatics 17, https://doi.org/10.5808/GI.2019.17.2.e20 (2019).
Limsopatham, N. & Collier, N. Normalising medical concepts in social media texts by learning semantic representation. In Proc. of the 54th Annual Meeting of the ACL, vol. 1, 1014–1023, https://aclanthology.org/P16-1096.pdf (2016).
García-Pablos, A., Perez, N. & Cuadros, M. Vicomtech at CANTEMIST 2020. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), CEUR Workshop Proceedings, vol. 17, 25, https://ceur-ws.org/Vol-2664/cantemist_paper17.pdf (2020).
Liu, F., Shareghi, E., Meng, Z., Basaldella, M. & Collier, N. Self-Alignment Pretraining for Biomedical Entity Representations. In Proc. of the 2021 Conference of the NAACL, 4228–4238, https://doi.org/10.18653/v1/2021.naacl-main.334 (2021).
López-García, G., Jerez, J. M., Ribelles, N., Alba, E. & Veredas, F. J. Explainable clinical coding with in-domain adapted transformers. Journal of Biomedical Informatics 139, 104323, https://doi.org/10.1016/j.jbi.2023.104323 (2023).
Xu, D., Zhang, Z. & Bethard, S. A generate-and-rank framework with semantic type regularization for biomedical concept normalization. In Proc. of the 58th Annual Meeting of the ACL, 8452–8464, https://aclanthology.org/2020.acl-main.748.pdf (2020).
Ji, Z., Wei, Q. & Xu, H. BERT-based ranking for biomedical entity normalization. AMIA Jt Summits Transl Sci Proc 2020, 269 (2020).
Miftahutdinov, Z., Kadurin, A., Kudrin, R. & Tutubalina, E. Medical concept normalization in clinical trials with drug and disease representation learning. Bioinform 37, 3856–3864, https://doi.org/10.1093/bioinformatics/btab474 (2021).
Wajsbürt, P., Sarfati, A. & Tannier, X. Medical concept normalization in French using multilingual terminologies and contextual embeddings. J Biomed Inform 114, 103684, https://doi.org/10.1016/j.jbi.2021.103684 (2021).
Alekseev, A. et al. Medical crossing: a cross-lingual evaluation of clinical entity linking. In Proceedings of the thirteenth language resources and evaluation conference, 4212–4220, https://aclanthology.org/2022.lrec-1.447.pdf (2022).
Kury, F. et al. Chia, a large annotated corpus of clinical trial eligibility criteria. Sci Data 7, 1–11, https://doi.org/10.1038/s41597-020-00620-0 (2020).
Whitton, J. & Hunter, A. Automated tabulation of clinical trial results: A joint entity and relation extraction approach with transformer-based language representations. Artif Intell Med 144, 102661, https://doi.org/10.1016/j.artmed.2023.102661 (2023).
Li, J. et al. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database 2016, https://doi.org/10.1093/database/baw068 (2016).
Tourille, J., Ferret, O., Neveol, A. & Tannier, X. Neural architecture for temporal relation extraction: A Bi-LSTM approach for detecting narrative containers. In Proc. 55th Annual Meeting of the ACL, 224–230, https://doi.org/10.18653/v1/P17-2035 (2017).
Magge, A., Scotch, M. & Gonzalez-Hernandez, G. Clinical NER and relation extraction using bi-char-LSTMs and random forest classifiers. In International workshop on medication and adverse drug event detection, 25–30, http://proceedings.mlr.press/v90/magge18a/magge18a.pdf (PMLR, 2018).
Verga, P., Strubell, E. & McCallum, A. Simultaneously self-attending to all mentions for full-abstract biological relation extraction. Proc. of NAACL 872–884, https://doi.org/10.18653/v1/N18-1080 (2018).
Henry, S., Buchan, K., Filannino, M., Stubbs, A. & Uzuner, O. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. JAMIA 27, 3–12, https://doi.org/10.1093/jamia/ocz166 (2020).
Sun, Z., Xing, L., Zhang, L., Cai, H. & Guo, M. Joint Biomedical Entity and Relation Extraction Based on Feature Filter Table Labeling. IEEE Access 11, 127422–127430, https://doi.org/10.1109/ACCESS.2023.3331504 (2023).
Uzuner, Ö., South, B. R., Shen, S. & DuVall, S. L. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. JAMIA 18, 552–556, https://doi.org/10.1136/amiajnl-2011-000203 (2011).
Gurulingappa, H. et al. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. J Biomed Inform 45, 885–892, https://doi.org/10.1016/j.jbi.2012.04.008 (2012).
Herrero-Zazo, M., Segura-Bedmar, I., Martínez, P. & Declerck, T. The DDI corpus: An annotated corpus with pharmacological substances and drug–drug interactions. J Biomed Inform 46, 914–20, https://doi.org/10.1016/j.jbi.2013.07.011 (2013).
Luo, L., Lai, P.-T., Wei, C.-H., Arighi, C. N. & Lu, Z. BioRED: a rich biomedical relation extraction dataset. Brief Bioinform 23, bbac282, https://doi.org/10.1093/bib/bbac282 (2022).
Bokharaeian, B., Dehghani, M. & Diaz, A. Automatic extraction of ranked SNP-phenotype associations from text using a BERT-LSTM-based method. BMC Bioinform 24, 144, https://doi.org/10.1186/s12859-023-05236-w (2023).
Silva, D., Rosa, W., Mello, B., Vieira, R. & Rigo, S. Exploring named entity recognition and relation extraction for ontology and medical records integration. Inform Med Unlocked 43, 101381, https://doi.org/10.1016/j.imu.2023.101381 (2023).
Chen, M., Du, F., Lan, G. & Lobanov, V. S. Using Pre-trained Transformer Deep Learning Models to Identify Named Entities and Syntactic Relations for Clinical Protocol Analysis. In Proc. AAAI Spring Symposium, 1–8, https://ceur-ws.org/Vol-2600/paper4.pdf (2020).
Tseo, Y., Salkola, M., Mohamed, A., Kumar, A. & Abnousi, F. Information extraction of clinical trial eligibility criteria. In Proc. of the KDD Workshop on Applied Data Science for Healthcare 2020, 1–4, https://arxiv.org/abs/2006.07296 (2020).
Dobbins, N. J., Mullen, T., Uzuner, Ö. & Yetisgen, M. The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria. Sci Data 9, 490, https://doi.org/10.1038/s41597-022-01521-0 (2022).
Mayer, T., Marro, S., Cabrio, E. & Villata, S. Enhancing evidence-based medicine with natural language argumentative analysis of clinical trials. Artif Intell Med 118, 102098, https://doi.org/10.1016/j.artmed.2021.102098 (2021).
Nye, B. E. et al. Understanding clinical trial reports: Extracting medical entities and their relations. In AMIA Jt Summits Transl Sci Proc, vol. 2021, 485, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8378650/pdf/3477939.pdf (American Medical Informatics Association, 2021).
Yuan, C. et al. Criteria2Query: a natural language interface to clinical databases for cohort definition. JAMIA 26, 294–305, https://doi.org/10.1093/jamia/ocy178 (2019).
Dobbins, N. J. et al. Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research. JAMIA 27, 109–118, https://doi.org/10.1093/jamia/ocz165 (2020).
Tian, S. et al. Opportunities and challenges for ChatGPT and large language models in biomedicine and health. Briefings in Bioinformatics 25(1), https://doi.org/10.1093/bib/bbad493 (2024).
Chen, H. et al. A Comprehensive Survey on Medical Concept Normalization: Datasets, Techniques, Applications, and Future Directions. Article preprint available at SSRN. 1–30, https://doi.org/10.2139/ssrn.5743824 (2025).
El janah, H., Nachid-Idrissi, Y., Sarrouti, M. & Najah, S. Exploring transformer models: Fine-tuning VS inference on relation extraction from biomedical texts. Computational and Structural Biotechnology Journal, https://doi.org/10.1016/j.csbj.2025.12.004 (2025).
Nerella, S. et al. Transformers and large language models in healthcare: A review. Artificial intelligence in medicine 154, 102900, https://doi.org/10.1016/j.artmed.2024.102900 (2025).
Lin, A. et al. Large language models in clinical trials: applications, technical advances, and future directions. BMC medicine 23(1), 563–581, https://doi.org/10.1186/s12916-025-04348-9 (2025).
Vedula, K. et al. Distilling Large Language Models for Efficient Clinical Information Extraction. arXiv preprint arXiv:2501.00031. 1–19, https://arxiv.org/abs/2501.00031 (2024).
Campillos-Llanos, L., Valverde-Mateos, A., Capllonch-Carrión, A., Zakhir-Puig, S., González-Quevedo, D., López-Urbán, M. R. & Hernando-Tundidor, S. CT-EBM-SP - Corpus of Clinical Trials for Evidence-Based Medicine in Spanish (version 3), Zenodo, https://doi.org/10.5281/zenodo.18048413 (2025).
Campillos-Llanos, L., Valverde-Mateos, A., Capllonch-Carrión, A. & Moreno-Sandoval, A. A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine. BMC Med Inform Decis 21, 1–19, https://doi.org/10.1186/s12911-021-01395-z (2021).
Chen, H., Li, R., Cleveland, A. & Ding, J. Enhancing data quality in medical concept normalization through large language models. J Biomed Inf 165, 104812, https://doi.org/10.1016/j.jbi.2025.104812 (2025).
Scientific Library Online (SciELO) [Internet]. FAPESP - BIREME. (1997 -). Last accessed 11th July 2025. Available from: https://www.scielo.org/es/.
PubMed [Internet]. National Library of Medicine (US). (1996 -). Last accessed 11th July 2025. Available from: https://pubmed.ncbi.nlm.nih.gov/.
European Clinical Trial Register (EudraCT) [Internet]. European Medicines Agency (EMA). (2004 -). Last accessed 11th July 2025. Available from: https://www.clinicaltrialsregister.eu.
Campillos-Llanos, L. et al. A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annOtated Text corpus (MERLOT). Lang Resour Eval 52, 571–601, https://doi.org/10.1007/s10579-017-9382-y (2018).
Campillos-Llanos, L., Valverde-Mateos, A. & Capllonch-Carrión, A. Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinform 26, https://doi.org/10.1186/s12859-024-05949-6 (2025).
Campillos-Llanos, L. MedLexSp – A medical lexicon for Spanish medical natural language processing. J Biomed Semantics 14, 1–23, https://doi.org/10.1186/s13326-022-00281-5 (2023).
Stenetorp, P. et al. BRAT: a Web-based Tool for NLP-Assisted Text Annotation. Proc. of the Demonstrations Session at EACL, 102–7, https://aclanthology.org/E12-2021.pdf (2012).
Liu, F., Vulić, I., Korhonen, A. & Collier, N. Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking. In Proc. of the 59th ACL, 565–74, https://www.repository.cam.ac.uk/handle/1810/346234 (Association for Computational Linguistics, 2021).
Gallego, F., López-García, G., Gasco-Sánchez, L., Krallinger, M. & Veredas, F. J. Clinlinker: Medical entity linking of clinical concept mentions in Spanish. In International Conference on Computational Science, 266–280, https://link.springer.com/chapter/10.1007/978-3-031-63775-9_19 (Cham: Springer Nature Switzerland, 2024).
Zhang, T., Wu, F., Katiyar, A., Weinberger, K. Q. & Artzi, Y. Revisiting Few-sample BERT Fine-tuning. In International Conference on Learning Representations, https://openreview.net/forum?id=cO1IH43yUF (2021).
Dligach, D. & Palmer, M. Reducing the need for double annotation. In Proc. of the 5th Linguistic Annotation Workshop, 65–73, https://aclanthology.org/W11-0408.pdf (Association for Computational Linguistics, 2011).