Abstract
Bambusa tulda Roxb., a member of the subfamily Bambusoideae, is an ecologically and commercially important plant resource widely distributed across the Indian subcontinent. Here we report long-read PacBio HiFi sequencing and genome assembly of B. tulda. The de novo haploid genome assembly of B. tulda comprises 43 contigs distributed across three subgenomes, with a total assembled length of 1.37 Gb, a contig N50 of 35.69 Mb, and a BUSCO completeness of 99%. Repetitive elements constitute 63.31% of the genome. Genome annotation predicted 56,890 protein-coding genes, constituting 19.44% of the genome. This genome sequence will serve as a valuable resource for future studies of life-history traits, phylogenomics, comparative genomics, and targeted genome modification for trait improvement in B. tulda.
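The contig N50 quoted above is the length of the contig at which contigs, taken longest first, together cover at least half of the total assembled length. As a minimal illustrative sketch (not the pipeline used in this study, which relied on published tools), the Python below computes N50 from a list of contig lengths and converts a reported percentage into an approximate span in megabases. The function names are hypothetical, the contig lengths in the example are toy values, and the conversion assumes the reported percentages refer to the 1.37 Gb assembled length.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (hypothetical helper).

    Contigs are sorted longest first; N50 is the length of the contig at which
    the running total first reaches at least half of the total assembled length.
    """
    values = sorted(lengths, reverse=True)
    if not values:
        raise ValueError("at least one contig length is required")
    if any(v < 0 for v in values):
        raise ValueError("contig lengths must be non-negative")
    total = sum(values)
    running = 0
    for length in values:
        running += length
        # Compare 2 * running with total to avoid halving the total.
        if 2 * running >= total:
            return length
    # With non-negative lengths the final running total equals the total.
    raise AssertionError("unreachable")


def percent_to_mb(total_bp: float, percent: float) -> float:
    """Approximate megabases covered by `percent` of an assembly of `total_bp` bases.

    Assumption: the percentage is relative to the assembled length.
    """
    return total_bp * percent / 100 / 1e6


if __name__ == "__main__":
    # Toy contig lengths (not B. tulda data): total 30, running totals 10, 18.
    print(n50([10, 8, 6, 4, 2]))  # 8
    # Approximate spans implied by the abstract for a 1.37 Gb assembly.
    print(round(percent_to_mb(1.37e9, 63.31)))  # 867 (repeats)
    print(round(percent_to_mb(1.37e9, 19.44)))  # 266 (protein-coding genes)
```

Comparing `2 * running` against the total keeps the check in integer arithmetic, so the result does not depend on how an odd total would be halved.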
Data availability
The complete B. tulda whole-genome sequencing and assembly dataset is available in the EMBL-EBI European Nucleotide Archive (ENA). The raw PacBio genome sequencing data and the raw Illumina RNA-Seq data are available under ENA study accession ERP174591, and the genome assembly and annotation are available at GenBank under assembly accession GCA_965643865. The genome annotation files of Bambusa tulda are also available in Figshare (https://doi.org/10.6084/m9.figshare.31037047).
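In the reference list, these records are resolved through identifiers.org URLs of the form https://identifiers.org/<prefix>:<accession>. The Python sketch below assembles such URLs for the two INSDC accessions given above. The helper name and the accession patterns are assumptions based on common INSDC formats (study accessions such as ERP174591; assembly accessions consisting of GCA_ followed by nine digits and an optional version), not an official validator, and the code only builds strings without contacting any server.

```python
import re

# Assumed accession formats (illustrative, not an official INSDC validator):
#   insdc.sra -> study accessions such as ERP174591 (ERP/DRP/SRP + digits)
#   insdc.gca -> assembly accessions such as GCA_965643865 (optional ".N" version)
ACCESSION_PATTERNS = {
    "insdc.sra": re.compile(r"[EDS]RP\d+"),
    "insdc.gca": re.compile(r"GCA_\d{9}(\.\d+)?"),
}


def resolver_url(prefix: str, accession: str) -> str:
    """Build an identifiers.org URL after a basic format check (hypothetical helper)."""
    pattern = ACCESSION_PATTERNS.get(prefix)
    if pattern is None:
        raise ValueError(f"unsupported prefix: {prefix}")
    # fullmatch requires the whole accession string to fit the pattern.
    if pattern.fullmatch(accession) is None:
        raise ValueError(f"{accession!r} does not look like a {prefix} accession")
    return f"https://identifiers.org/{prefix}:{accession}"


if __name__ == "__main__":
    # Accessions from the Data availability statement.
    print(resolver_url("insdc.sra", "ERP174591"))
    print(resolver_url("insdc.gca", "GCA_965643865"))
```

The format check is deliberately shallow: it catches obvious typos such as a missing digit, but a well-formed accession can still point to a record that does not exist.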
Code availability
All software and pipelines used for data analysis in this study were run according to the manuals and protocols of the respective published bioinformatics tools. Software versions and parameters are detailed in the Methods section; where parameters are not specified, default settings were used. No custom code was used.
References
Basak, M. et al. Genomic insights into growth and development of bamboos: what have we learnt and what more to discover? Trees 35, 1771–1791, https://doi.org/10.1007/s00468-021-02197-6 (2021).
Zheng, Y. et al. Allele‐aware chromosome‐scale assembly of the allopolyploid genome of hexaploid Ma Bamboo (Dendrocalamus latiflorus Munro). J. Integr. Plant Biol. 64(3), 649–670, https://doi.org/10.1111/jipb.13217 (2022).
Kellogg, E. A. Description of the Family, Vegetative Morphology and Anatomy. Poaceae (R. Br.) Barnh. (1895), Gramineae Juss. (1789). In Flowering Plants. Monocots: Poaceae. Springer International Publishing, Cham, 3–23, https://doi.org/10.1007/978-3-319-15332-2_1 (2015).
Zhao, H. et al. Bamboo and rattan: Nature-based solutions for sustainable development. The Innovation 3(6), https://doi.org/10.1016/j.xinn.2022.100337 (2022).
Ma, P. F. et al. Genome assemblies of 11 bamboo species highlight diversification induced by dynamic subgenome dominance. Nat. Genet. 56(4), 710–720, https://doi.org/10.1038/s41588-024-01683-0 (2024).
Li, W. et al. Draft genome of the herbaceous bamboo Raddia distichophylla. G3: Genes|Genomes|Genetics 11(2), jkaa049, https://doi.org/10.1093/g3journal/jkaa049 (2021).
Janzen, D. H. Why bamboos wait so long to flower. Annu. Rev. Ecol. Syst. 7, 347–391, https://doi.org/10.1146/annurev.es.07.110176.002023 (1976).
Chakraborty, S. et al. Studies on reproductive development and breeding habit of the commercially important bamboo Bambusa tulda Roxb. Plants 10(11), 2375, https://doi.org/10.3390/plants10112375 (2021).
Biswas, S. et al. Cellulose and lignin profiling in seven, economically important bamboo species of India by anatomical, biochemical, FTIR spectroscopy and thermogravimetric analysis. Biomass and Bioenergy 158, 106362, https://doi.org/10.1016/j.biombioe.2022.106362 (2022).
Peng, Z. et al. The draft genome of the fast-growing non-timber forest species moso bamboo (Phyllostachys heterocycla). Nat. Genet. 45(4), 456–461, https://doi.org/10.1038/ng.2569 (2013).
Espinosa, E., Bautista, R., Larrosa, R. & Plata, O. Advancements in long-read genome sequencing technologies and algorithms. Genomics 110842, https://doi.org/10.1016/j.ygeno.2024.110842 (2024).
Kumar, P. P., Turner, I. M., Nagaraja Rao, A. & Arumuganathan, K. Estimation of nuclear DNA content of various bamboo and rattan species. Plant Biotechnol. Rep. 5, 317–322, https://doi.org/10.1007/s11816-011-0185-0 (2011).
Ram, H. M. & Gopal, B. H. Some observations on the flowering of bamboos in Mizoram. Curr. Sci. 708–710 (1981).
Bhattacharya, S., Das, M., Bar, R. & Pal, A. Morphological and molecular characterization of Bambusa tulda with a note on flowering. Ann. Bot. 98(3), 529–535, https://doi.org/10.1093/aob/mcl143 (2006).
Das, M., Bhattacharya, S. & Pal, A. Generation and characterization of SCARs by cloning and sequencing of RAPD products: a strategy for species-specific marker development in bamboo. Ann. Bot. 95(5), 835–841, https://doi.org/10.1093/aob/mci088 (2005).
Das, M. & Pal, A. Clonal propagation and production of genetically uniform regenerants from axillary meristems of adult bamboo. J. Plant Biochem. Biotechnol. 14, 185–188, https://doi.org/10.1007/BF03355956 (2005).
Biswas, P., Chakraborty, S., Dutta, S., Pal, A. & Das, M. Bamboo flowering from the perspective of comparative genomics and transcriptomics. Front. Plant Sci. 7, 1900, https://doi.org/10.3389/fpls.2016.01900 (2016).
Dutta, S. et al. Identification, characterization and gene expression analyses of important flowering genes related to photoperiodic pathway in bamboo. BMC Genomics 19, 1–19, https://doi.org/10.1186/s12864-018-4571-7 (2018).
Dutta, S. et al. Identification and functional characterization of two bamboo FD gene homologs having contrasting effects on shoot growth and flowering. Sci. Rep. 11(1), 7849, https://doi.org/10.1038/s41598-021-87491-6 (2021).
Basak, M., Chakraborty, S., Kundu, S., Dey, S. & Das, M. Identification, expression analyses of APETALA1 gene homologs in Bambusa tulda and heterologous validation of BtMADS14 in Arabidopsis thaliana. Physiol. Mol. Biol. Plants 1–16, https://doi.org/10.1007/s12298-025-01569-3 (2025).
Doležel, J., Greilhuber, J. & Suda, J. Estimation of nuclear DNA content in plants using flow cytometry. Nat. Protoc. 2(9), 2233–2244, https://doi.org/10.1038/nprot.2007.310 (2007).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27(6), 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11(1), 1432, https://doi.org/10.1038/s41467-020-14998-3 (2020).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18(2), 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Smit, A. & Hubley, R. RepeatModeler Open-1.0. Available online at: https://www.repeatmasker.org (2015).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117(17), 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12(8), 1269–1276, https://doi.org/10.1101/gr.88502 (2002).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(suppl_1), i351–i358, https://doi.org/10.1093/bioinformatics/bti1018 (2005).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35(suppl_2), W265–W268, https://doi.org/10.1093/nar/gkm286 (2007).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s05 (2009).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. Available online at: https://www.repeatmasker.org (2013–2015).
Gabriel, L. et al. BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Genome Res. 34(5), 769–777, https://doi.org/10.1101/gr.278090.123 (2024).
Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: homology-based gene prediction utilizing intron position conservation and RNA-seq data. In Gene Prediction: Methods and Protocols. Methods Mol. Biol. 1962, Humana, New York, NY, 161–177, https://doi.org/10.1007/978-1-4939-9173-0_9 (2019).
Grabherr, M. G. et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 29(7), 644–652, https://doi.org/10.1038/nbt.1883 (2011).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8(8), 1494–1512, https://doi.org/10.1038/nprot.2013.084 (2013).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12(4), 357–360, https://doi.org/10.1038/nmeth.3317 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33(3), 290–295, https://doi.org/10.1038/nbt.3122 (2015).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 1–13, https://doi.org/10.1186/s13059-019-1910-1 (2019).
Bushmanova, E., Antipov, D., Lapidus, A. & Prjibelski, A. D. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. GigaScience 8(9), giz100, https://doi.org/10.1093/gigascience/giz100 (2019).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, 1–22, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31(19), 5654–5666, https://doi.org/10.1093/nar/gkg770 (2003).
Campbell, M. A., Haas, B. J., Hamilton, J. P., Mount, S. M. & Buell, C. R. Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics 7, 1–17, https://doi.org/10.1186/1471-2164-7-327 (2006).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402, https://doi.org/10.1093/nar/25.17.3389 (1997).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30(9), 1236–1240, https://doi.org/10.1093/bioinformatics/btu031 (2014).
Seemann, T. barrnap 0.9: rapid ribosomal RNA prediction. Available online at https://github.com/tseemann/barrnap (2013).
Chan, P. P. & Lowe, T. M. tRNAscan-SE: searching for tRNA genes in genomic sequences. In Gene Prediction: Methods and Protocols. Methods Mol. Biol. 1962, Humana, New York, NY, 1–14, https://doi.org/10.1007/978-1-4939-9173-0_1 (2019).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49(16), 9077–9096, https://doi.org/10.1093/nar/gkab688 (2021).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERP174591 (2025).
NCBI GenBank https://identifiers.org/insdc.gca:GCA_965643865 (2025).
Rupp, O., Kundu, S., Becker, A. & Das, M. Genome annotation files of Bambusa tulda. Figshare https://doi.org/10.6084/m9.figshare.31037047 (2025).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 1–14, https://doi.org/10.1186/s13059-019-1832-y (2019).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19), 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
Siri, J. N., Neufeld, E., Parkin, I. & Sharpe, A. Using simulated annealing to declutter genome visualizations. In Proc. FLAIRS, 201–204. Available online at https://github.com/jorgenunezsiri/accusyn (2020).
Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6, e251, https://doi.org/10.7717/peerj-cs.251 (2020).
Acknowledgements
This research was funded by the Alexander von Humboldt Foundation, Germany, through the ‘Research Group Linkage Program’. SK and SB acknowledge JRF fellowships from UGC, India (NTA Ref. nos. 231610225668 and 211610047259, respectively). SD acknowledges a CSIR project (Grant no. 38(1493)/19/EMR-II). We thank Prof. J. Dolezel for providing seeds of the internal reference plants used for genome size estimation. We thank Subhayan Paul, Institute of Health Sciences, Presidency University, and Rajesh Saha, Bio-Rad Laboratories, for technical support in operating the flow cytometer.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Plant Genomics Laboratory, Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, West Bengal, India
Sutrisha Kundu, Sonali Dey, Mridushree Basak, Sudeshna Bera & Malay Das
Bioinformatik und Systembiologie, Justus-Liebig-University, Gießen, D-35390, Germany
Oliver Rupp
Institute of Botany, Justus-Liebig-University, Gießen, D-35392, Germany
Annette Becker
Authors
- Sutrisha Kundu
- Oliver Rupp
- Sonali Dey
- Mridushree Basak
- Sudeshna Bera
- Annette Becker
- Malay Das
Contributions
M.D. and A.B. conceived the study, acquired funding and designed the experimental plan. S.K., S.D., M.B. and S.B. collected plant samples, performed flow cytometry (FACS) experiments for genome size estimation, and isolated high-molecular-weight genomic DNA for high-throughput sequencing. S.K. and O.R. performed all in silico analyses of the genome data. S.K. wrote the manuscript, and M.D., A.B. and O.R. edited it. All authors read and approved the final manuscript.
Corresponding authors
Correspondence to Annette Becker or Malay Das.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kundu, S., Rupp, O., Dey, S. et al. Genome sequencing, de novo assembly and annotation of the commercially important bamboo, Bambusa tulda Roxb. Sci Data (2026). https://doi.org/10.1038/s41597-026-06679-5
Received: 23 July 2025
Accepted: 21 January 2026
Published: 04 February 2026
DOI: https://doi.org/10.1038/s41597-026-06679-5