Abstract
Nanopore direct RNA sequencing (DRS) offers distinct advantages for transcriptome analysis over traditional high-throughput RNA sequencing methods: it preserves native RNA modifications, eliminates polymerase chain reaction bias, and simplifies the workflow. However, its high basecalling error rate remains a significant hurdle. Here we introduce Coral, a dual context-aware nanopore DRS basecaller that uses a Transformer-based encoder-decoder architecture to capture contextual dependencies at both the signal and sequence levels, substantially improving accuracy. Coral achieves up to a 6.17% improvement in accuracy on human RNA samples compared to Oxford Nanopore Technologies’ Dorado basecaller. This improved accuracy enables the detection of 26% more annotated transcript isoforms. Coral also enhances downstream haplotype phasing, reducing switch errors by up to 78.8% and Hamming errors by 76%, while phasing 36% more single nucleotide polymorphisms.
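Switch errors and Hamming errors are standard haplotype-phasing metrics. As a rough illustration only (this is not Coral's evaluation code), the sketch below counts both for a single phased block. It assumes each heterozygous SNP is encoded as 0 or 1 according to which allele the phasing places on haplotype 1; the function names `switch_errors` and `hamming_errors` are hypothetical.

```python
from typing import Sequence


def switch_errors(pred: Sequence[int], truth: Sequence[int]) -> int:
    """Count switch errors within one phased block.

    pred[i] and truth[i] encode which allele (0 or 1) of heterozygous SNP i
    lies on haplotype 1. A switch is counted each time the predicted phase
    goes from agreeing with the truth to disagreeing with it (or back)
    between consecutive SNPs.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must cover the same SNPs")
    agree = [p == t for p, t in zip(pred, truth)]
    return sum(a != b for a, b in zip(agree, agree[1:]))


def hamming_errors(pred: Sequence[int], truth: Sequence[int]) -> int:
    """Count mis-phased SNPs, minimised over the two haplotype labellings."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must cover the same SNPs")
    mismatches = sum(p != t for p, t in zip(pred, truth))
    # Swapping haplotype labels turns every match into a mismatch and vice versa.
    return min(mismatches, len(pred) - mismatches)


truth = [0, 1, 1, 0, 1, 0]
pred = [0, 1, 0, 1, 0, 1]  # phase flips once, after the second SNP
print(switch_errors(pred, truth), hamming_errors(pred, truth))  # prints: 1 2
```

Under this simple definition, a single mis-phased SNP in the middle of a block costs two switches but only one Hamming error; some phasing-comparison tools report such cases separately as flips.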
Data availability
The RNA sequencing data used in this study are available in the Zenodo database under [https://doi.org/10.5281/zenodo.4557005] and [https://doi.org/10.5281/zenodo.11632496], and in the SRA/ENA databases and AWS Open Data registry under the accession codes and links listed in Supplementary Tables S1, S2, and S15. The processed data generated in this study are provided in the Source Data file that accompanies this paper.
Code availability
The code for this study, along with a usage guide, is available on GitHub at [https://github.com/BioinfoSZU/Coral] and on Zenodo under [https://doi.org/10.5281/zenodo.18153247] (ref. 77).
References
Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. 20, 631–656 (2019).
Hansen, K. D., Brenner, S. E. & Dudoit, S. Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res. 38, e131–e131 (2010).
Kovaka, S., Ou, S., Jenike, K. M. & Schatz, M. C. Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing. Nat. Methods 20, 12–16 (2023).
Garalde, D. R. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods 15, 201–206 (2018).
Jain, M., Abu-Shumays, R., Olsen, H. E. & Akeson, M. Advances in nanopore direct RNA sequencing. Nat. Methods 19, 1160–1164 (2022).
Soneson, C. et al. A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes. Nat. Commun. 10, 3359 (2019).
Workman, R. E. et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat. Methods 16, 1297–1305 (2019).
Gunter, H. M. et al. mRNA vaccine quality analysis using RNA sequencing. Nat. Commun. 14, 5663 (2023).
Chan, A., Naarmann-de Vries, I. S., Scheitl, C. P., Höbartner, C. & Dieterich, C. Detecting m6A at single-molecular resolution via direct RNA sequencing and realistic training data. Nat. Commun. 15, 3323 (2024).
Baek, A. et al. Single-molecule epitranscriptomic analysis of full-length HIV-1 RNAs reveals functional roles of site-specific m6As. Nat. Microbiol. 9, 1340–1355 (2024).
Wu, Y. et al. Transfer learning enables identification of multiple types of RNA modifications using nanopore direct RNA sequencing. Nat. Commun. 15, 4049 (2024).
Glinos, D. A. et al. Transcriptome variation in human tissues revealed by long-read sequencing. Nature 608, 353–359 (2022).
Berger, E. et al. Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets. Nat. Commun. 11, 4662 (2020).
Pardo-Palacios, F. J. et al. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nat. Methods 21, 1349–1363 (2024).
Wang, J. et al. Direct RNA sequencing coupled with adaptive sampling enriches RNAs of interest in the transcriptome. Nat. Commun. 15, 481 (2024).
Liu-Wei, W. et al. Sequencing accuracy and systematic errors of nanopore direct RNA sequencing. BMC Genomics 25, 528 (2024).
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 1–10 (2019).
Teng, H. et al. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning. GigaScience 7, giy037 (2018).
Zeng, J. et al. Causalcall: Nanopore basecalling using a temporal convolutional network. Front. Genet. 10, 1332 (2020).
Huang, N., Nie, F., Ni, P., Luo, F. & Wang, J. SACall: a neural network basecaller for Oxford Nanopore sequencing data based on self-attention mechanism. IEEE/ACM Trans. Comput. Biol. Bioinf. 19, 614–623 (2020).
Graves, A., Fernández, S., Gomez, F. & Schmidhuber, J. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proc. 23rd International Conference on Machine Learning 369–376 (Association for Computing Machinery, New York, NY, USA, 2006).
Oxford Nanopore Technologies. Dorado. https://github.com/nanoporetech/dorado.
Neumann, D., Reddy, A. S. & Ben-Hur, A. RODAN: a fully convolutional architecture for basecalling nanopore RNA sequencing data. BMC Bioinf. 23, 142 (2022).
Li, Q., Sun, C., Wang, D. & Lou, J. GCRTcall: a Transformer based basecaller for nanopore RNA sequencing enhanced by gated convolution and relative position embedding via joint loss training. Front. Genet. 15, 1443532 (2024).
Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems 30 (Curran Associates, Inc., Red Hook, NY, USA, 2017).
Pagès-Gallego, M. & de Ridder, J. Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling. Genome Biol. 24, 71 (2023).
Dorey, A. & Howorka, S. Nanopore DNA sequencing technologies and their applications towards single-molecule proteomics. Nat. Chem. 16, 314–334 (2024).
Wang, Y., Zhao, Y., Bollas, A., Wang, Y. & Au, K. F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 39, 1348–1365 (2021).
Oxford Nanopore Technologies. Guppy. https://nanoporetech.com/community.
Liu, H. et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat. Commun. 10, 4079 (2019).
Roach, N. P. et al. The full-length transcriptome of C. elegans using direct RNA sequencing. Genome Res. 30, 299–312 (2020).
Grünberger, F. et al. Nanopore sequencing of RNA and cDNA molecules in Escherichia coli. RNA 28, 400–417 (2022).
Krawczyk, P. S. et al. Re-adenylation by TENT5A enhances efficacy of SARS-CoV-2 mRNA vaccines. Nature 641, 984–992 (2025).
Gao, Y. et al. Quantitative profiling of N6-methyladenosine at single-base resolution in stem-differentiating xylem of Populus trichocarpa using Nanopore direct RNA sequencing. Genome Biol. 22, 1–17 (2021).
DeMario, S., Xu, K., He, K. & Chanfreau, G. F. Nanoblot: an R-package for visualization of RNA isoforms from long-read RNA-sequencing data. RNA 29, 1099–1107 (2023).
Jenjaroenpun, P. et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res. 49, e7–e7 (2021).
Begik, O. et al. Nano3P-seq: transcriptome-wide analysis of gene expression and tail dynamics using end-capture nanopore cDNA sequencing. Nat. Methods 20, 75–85 (2023).
Samarakoon, H. et al. Flexible and efficient handling of nanopore sequencing signal data with slow5tools. Genome Biol. 24, 69 (2023).
Parker, M. T. et al. Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification. eLife 9, e49658 (2020).
Bilska, A. et al. Immunoglobulin expression and the humoral immune response is regulated by the non-canonical poly(A) polymerase TENT5C. Nat. Commun. 11, 2032 (2020).
Chen, Y. et al. A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nat. Methods 22, 801–812 (2025).
Gao, Y. et al. ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Sci. Adv. 9, eabq5072 (2023).
Tardaguila, M. et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res. 28, 396–411 (2018).
Parham, P. & Ohta, T. Population biology of antigen presentation by MHC class I molecules. Science 272, 67–74 (1996).
Neefjes, J., Jongsma, M. L., Paul, P. & Bakke, O. Towards a systems understanding of MHC class I and MHC class II antigen presentation. Nat. Rev. Immunol. 11, 823–836 (2011).
Ma, C., Slaughter, C. A. & DeMartino, G. N. Identification, purification, and characterization of a protein activator (PA28) of the 20 S proteasome (macropain). J. Biol. Chem. 267, 10515–10523 (1992).
Voges, D., Zwickl, P. & Baumeister, W. The 26S proteasome: a molecular machine designed for controlled proteolysis. Annu. Rev. Biochem. 68, 1015–1068 (1999).
Liu, Q. et al. IKZF1 and UBR4 gene variants drive autoimmunity and Th2 polarization in IgG4-related disease. J. Clin. Invest. 134, e178692 (2024).
Gao, Z.-w. et al. The roles of adenosine deaminase in autoimmune diseases. Autoimmun. Rev. 20, 102709 (2021).
Hara, T. et al. Deletion of the Mint3/Apba3 gene in mice abrogates macrophage functions and increases resistance to lipopolysaccharide-induced septic shock. J. Biol. Chem. 286, 32542–32551 (2011).
Nataf, S., Guillen, M. & Pays, L. The immunometabolic gene N-acetylglucosamine kinase is uniquely involved in the heritability of multiple sclerosis severity. Int. J. Mol. Sci. 25, 3803 (2024).
Davidson, N. M. et al. JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biol. 23, 10 (2022).
Mitelman, F., Johansson, B. & Mertens, F. Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer. https://mitelmandatabase.isb-cgc.org (2024).
Zheng, Z. et al. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat. Comput. Sci. 2, 797–803 (2022).
Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. 27, 801–812 (2017).
Roig, B. et al. The discoidin domain receptor 1 as a novel susceptibility gene for schizophrenia. Mol. Psychiatry 12, 833–841 (2007).
Yan, Y. et al. Genetic association of FERMT2, HLA-DRB1, CD2AP, and PTK2B polymorphisms with Alzheimer’s disease risk in the southern Chinese population. Front. Aging Neurosci. 12, 16 (2020).
von Elsner, L. et al. Biallelic FRA10AC1 variants cause a neurodevelopmental disorder with growth retardation. Brain 145, 1551–1563 (2022).
Chitsamankhun, C. et al. Cathepsin C in health and disease: from structural insights to therapeutic prospects. J. Transl. Med. 22, 777 (2024).
Salzer, U. et al. Relevance of biallelic versus monoallelic TNFRSF13B mutations in distinguishing disease-causing from risk-increasing TNFRSF13B variants in antibody deficiency syndromes. Blood 113, 1967–1976 (2009).
Wick, R. R. Badread: simulation of error-prone long reads. J. Open Source Softw. 4, 1316 (2019).
Ono, Y., Hamada, M. & Asai, K. PBSIM3: a simulator for all types of PacBio and ONT long reads. NAR Genomics Bioinf. 4, lqac092 (2022).
Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conference on Machine Learning 448–456 (JMLR.org, 2015).
Ramachandran, P., Zoph, B. & Le, Q. V. Searching for activation functions. In Proc. 6th International Conference on Learning Representations (OpenReview.net, 2018).
Su, J. et al. RoFormer: enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024).
Shazeer, N. GLU variants improve Transformer. Preprint at https://arxiv.org/abs/2002.05202 (2020).
Liu, L., Liu, X., Gao, J., Chen, W. & Han, J. Understanding the difficulty of training Transformers. In Proc. Empirical Methods in Natural Language Processing 5747–5763 (Association for Computational Linguistics, 2020).
Oxford Nanopore Technologies. Tombo. https://github.com/nanoporetech/tombo.
Oxford Nanopore Technologies. Remora. https://github.com/nanoporetech/remora.
Rios, A., Amrhein, C., Aepli, N. & Sennrich, R. On biasing Transformer attention towards monotonicity. In Proc. North American Chapter of the Association for Computational Linguistics: Human Language Technologies 4474–4488 (Association for Computational Linguistics, 2021).
Dao, T., Fu, D., Ermon, S., Rudra, A. & Ré, C. FlashAttention: fast and memory-efficient exact attention with IO-awareness. Adv. Neural Inf. Process. Syst. 35, 16344–16359 (2022).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Patterson, M. et al. WhatsHap: weighted haplotype assembly for future-generation sequencing reads. J. Comput. Biol. 22, 498–509 (2015).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In Proc. 7th International Conference on Learning Representations (OpenReview.net, 2019).
Xie, S. Coral: a dual context-aware basecaller for nanopore direct RNA sequencing - code. https://doi.org/10.5281/zenodo.18153247 (2026).
Acknowledgements
This study was supported by the National Key Research and Development Program of China (2022YFF1202104 and 2019YFA0707003), National Natural Science Foundation of China (62471310, 32401256, and 62571401), and the Agricultural Science and Technology Innovation Program (CAAS-ZDRW202503).
Author information
Author notes
These authors contributed equally: Shaohui Xie, Lulu Ding.
Authors and Affiliations
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Shaohui Xie
National Engineering Laboratory for Big Data System Computing, Shenzhen University, Shenzhen, China
Shaohui Xie, Lulu Ding, Jianqiang Li & Zexuan Zhu
School of Artificial Intelligence, Shenzhen University, Shenzhen, China
Lulu Ding, Jianqiang Li & Zexuan Zhu
State Key Laboratory of Genome and Multi-omics Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
Yang Yu & Jue Ruan
Guangzhou Institute of Technology, Xidian University, Guangzhou, China
Ling Liu
Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen, China
Yiwen Sun
State Key Laboratory of Radio Frequency Heterogeneous Integration, Shenzhen University, Shenzhen, China
Jianqiang Li & Zexuan Zhu
Authors
- Shaohui Xie
- Lulu Ding
- Yang Yu
- Ling Liu
- Yiwen Sun
- Jianqiang Li
- Jue Ruan
- Zexuan Zhu
Contributions
S.X. and Z.Z. conceived the idea of Coral. J.R. and Z.Z. coordinated and supervised the project. S.X. designed and implemented the Coral algorithm. S.X. and L.D. performed experiments, analyzed data, and drafted the manuscript. L.L. and Y.Y. contributed to the design and implementation of the Coral method. Y.S., J.L., L.L., J.R., and Z.Z. provided critical comments on algorithm evaluations and improved the manuscript.
Corresponding authors
Correspondence to Jue Ruan or Zexuan Zhu.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Jianxin Wang, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Xie, S., Ding, L., Yu, Y. et al. A dual context-aware basecaller for nanopore direct RNA sequencing. Nat Commun (2026). https://doi.org/10.1038/s41467-026-68566-2
Received: 05 November 2024
Accepted: 10 January 2026
Published: 21 January 2026
DOI: https://doi.org/10.1038/s41467-026-68566-2