Chromosome-Level Genome Assembly of <i>Donacia provosti</i>

Background & Summary

Water covers most of the Earth's surface, and many insects have adapted to aquatic ecosystems. Aquatic insects live in water during one or more phases of their life cycle; dragonflies and mayflies, for example, both lay their eggs in water and move onto land only after they have developed and matured[1](https://www.nature.com/articles/s41597-025-06051-z#ref-CR1 “Stoks, R. & Córdoba-Aguilar, A. Evolutionary Ecology of Odonata: A Complex Life Cycle Perspective. Annual Review of Entomology 57, 249–265, https://doi.org/10.1146/annurev-ento-120710-100557

(2012).“),[2](https://www.nature.com/articles/s41597-025-06051-z#ref-CR2 “Jacobus, L. M., Macadam, C. R. & Sartori, M. Mayflies (Ephemeroptera) and Their Contributions to Ecosystem Services. Insects 1…

(2019).“). These aquatic insects play important ecological roles as primary consumers, detritivores, predators, and pollinators in both aquatic and terrestrial ecosystems. Aquatic insect groups evolved from terrestrial ancestors through adaptation to freshwater ecosystems[3](https://www.nature.com/articles/s41597-025-06051-z#ref-CR3 “Wootton, R. J. The historical ecology of aquatic insects: An overview. Palaeogeography, Palaeoclimatology, Palaeoecology 62, 477–492, https://doi.org/10.1016/0031-0182(88)90068-5

(1988).“). Freshwater ecosystems exhibit a remarkable diversity of habitats, including ponds, lakes, and ditches. Colonization of these environments required significant evolutionary adaptations in various physiological and behavioral mechanisms, including thermo- and osmoregulation, respiration, and feeding strategies. However, research on insect aquatic adaptation remains limited, and the mechanisms driving the physiological, morphological, and behavioral changes that accompany the transition to aquatic environments are poorly understood.

The Chrysomelidae, commonly known as “leaf beetles”, comprises approximately 38,000 described species[4](https://www.nature.com/articles/s41597-025-06051-z#ref-CR4 “Wilf, P. et al. Timing the Radiations of Leaf Beetles: Hispines on Gingers from Latest Cretaceous to Recent. Science 289, 291–294, https://doi.org/10.1126/science.289.5477.291

(2000).“). The majority of living leaf beetles feed on angiosperms, and many are considered agricultural pests, such as the Colorado potato beetle (Leptinotarsa decemlineata), western corn rootworm (Diabrotica virgifera), striped flea beetle (Phyllotreta striolata) and coconut leaf beetle (Brontispa longissima). Donacia provosti (Fairmaire, 1885) (Coleoptera: Chrysomelidae) is a damaging pest of aquatic crops, first recorded in Beijing, China, in 1885. Its distribution includes Russia, South Korea, Japan and China, and it is rapidly spreading globally. In China, it is found from Hainan to Heilongjiang, with significant infestations in Hubei and Jiangsu. D. provosti primarily feeds on lotus and rice, causing notches, holes, and epidermal damage to lotus leaves. The larvae damage lotus stems and roots, causing dark brown spots, decay, and stunted growth, and may leave the plant susceptible to fungal infection. In the 2000s, it caused 15%–20% losses in lotus root production in China[5](https://www.nature.com/articles/s41597-025-06051-z#ref-CR5 “Zhan, H. et al. Molecular Characterization of Donacia provosti (Coleoptera: Chrysomelidae) Larval Transcriptome by De Novo Assembly to Discover Genes Associated with Underwater Environmental Adaptations. Insects 12, 281, https://doi.org/10.3390/INSECTS12040281

(2021).“). Current control relies on costly chemical pesticides that pollute the water and pose a threat to human and livestock health. Environmentally friendly pest management strategies are urgently needed to suppress this pest.

As sequencing technologies have advanced, they have been applied increasingly widely in entomological research, and several leaf beetle genomes have now been sequenced. This study presents the first whole-genome sequencing of D. provosti, yielding a high-quality chromosome-level reference genome that provides a valuable resource for future investigations into its ecological adaptations and for the development of pest control measures.

Methods

Insect collection and genomic sequencing

Approximately 200 specimens of D. provosti were collected from Enshi city (30.25°N, 109.05°E), Hubei province, and starved in the laboratory for 24 hours to minimize contamination from gut contents. The specimens were then washed with double-distilled water (ddH2O) and ethanol to remove external contaminants, immediately flash-frozen in liquid nitrogen, and transferred to a −80 °C freezer for long-term storage. Genome sequencing of a female adult was performed on the PacBio Revio system with the SMRTbell Express Library Prep Kit, generating ~170 Gb of HiFi reads with a read N50 of ~14 Kb. For Illumina genome sequencing, three short paired-end DNA libraries with a 400-bp insert size were constructed using the TruSeq DNA PCR-Free Library Prep Kit (Illumina) according to the manufacturer’s instructions and sequenced on an Illumina NovaSeq X Plus platform. Total RNA was extracted from three adults, and three short paired-end libraries with a 400-bp insert size were constructed and sequenced on an Illumina NovaSeq X Plus platform. All sequencing work was performed at Berry Genomics Corporation.

Genome assembly and quality assessment

Genome size estimation was conducted through k-mer frequency analysis of PacBio HiFi reads using Jellyfish (v2.1.3)[6](https://www.nature.com/articles/s41597-025-06051-z#ref-CR6 “Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011

(2011).“). The k-mers were counted with the following Jellyfish command: jellyfish count -m 17 -C -s 100M -t 20. The resulting k-mer frequency distribution gave an estimated genome size of 1.7 Gb.
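The genome size estimate from a k-mer histogram can be sketched as below. This is a minimal illustration of the common peak-depth approach (total k-mer count divided by the depth of the main coverage peak), not necessarily the exact procedure used in this study. The function names, the `min_depth` cutoff for discarding low-depth error k-mers, and the toy histogram are all hypothetical; the two-column "depth count" layout is the one written by jellyfish histo.

```python
def parse_histo(text):
    """Parse two-column 'depth count' lines into a {depth: count} dict."""
    histogram = {}
    for line in text.strip().splitlines():
        depth, count = line.split()
        histogram[int(depth)] = int(count)
    return histogram


def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    k-mers seen fewer than `min_depth` times are treated as sequencing
    errors and ignored (the cutoff is an assumption, chosen by eye in practice).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)           # most frequent depth
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers // peak_depth, peak_depth


# Toy histogram (invented numbers, NOT data from this study).
toy_histo = "1 900\n2 300\n3 50\n18 400\n19 800\n20 1000\n21 800\n22 400\n40 100\n"
size, peak = estimate_genome_size(parse_histo(toy_histo))
print(f"peak depth = {peak}, estimated genome size = {size} bp")
```

On real data the histogram would come from running jellyfish histo on the count output, and the cutoff would be set at the valley between the error peak and the main peak.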

The PacBio reads were assembled using hifiasm (v0.20.0-r639)[7](https://www.nature.com/articles/s41597-025-06051-z#ref-CR7 “Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5

(2021).“) with parameters: -l 3, generating an initial assembly comprising 4,990 contigs with a total length of approximately 2.2 Gb and a contig N50 of 21 Mb. To eliminate redundant sequences, the assembly was further processed using purge_dups (v1.2.3)[8](https://www.nature.com/articles/s41597-025-06051-z#ref-CR8 “Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898, https://doi.org/10.1093/bioinformatics/btaa025

(2020).“) with parameters set to -2 -a 50. This purging step yielded a refined primary assembly with a total length of 1.76 Gb and an improved contig N50 of 27 Mb (Table 1).
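For readers unfamiliar with the N50 statistic reported above, a minimal sketch follows. N50 is the length of the sequence at which the cumulative length, taken from the longest sequence downward, first reaches half of the total assembly length. The function name and the toy lengths are illustrative only and are not values from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed 50% of the assembly
            return length


# Toy contig lengths in Mb (invented for illustration).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Removing redundant haplotigs changes both the set of contigs and the total length, which is why the contig N50 can shift after purging even though no contig has been extended.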

Genome scaffolding

We sequenced an adult sample, generating approximately 150 Gb of Hi-C paired-end reads. Raw reads underwent quality control using fastp (v0.23.1)[9](https://www.nature.com/articles/s41597-025-06051-z#ref-CR9 “Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560

(2018).“), resulting in a Q30 score of 92.3%. Subsequently, these reads were aligned to the reference genome using Bowtie2 (v2.4.1)[10](https://www.nature.com/articles/s41597-025-06051-z#ref-CR10 “Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359, https://doi.org/10.1038/nmeth.1923

(2012).“). Valid chromatin interactions were identified and filtered (removing multiple hits and singletons) using HiC-Pro (v2.11.0)[11](https://www.nature.com/articles/s41597-025-06051-z#ref-CR11 “Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biology 16, 259, https://doi.org/10.1186/s13059-015-0831-x

(2015).“). Finally, contigs were anchored and oriented into 15 scaffolds (Fig. 1a,b) with YaHS (v1.2.2)[12](https://www.nature.com/articles/s41597-025-06051-z#ref-CR12 “Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39, https://doi.org/10.1093/bioinformatics/btac808

(2022).“), which correspond to the 15 actual chromosomes[13](https://www.nature.com/articles/s41597-025-06051-z#ref-CR13 “Petitpierre, E. in Biology of Chrysomelidae (eds Jolivet, P., Petitpierre, E. & Hsiao, T. H.) 131–159 (Springer Netherlands, 1988), https://doi.org/10.1007/978-94-009-3105-3_9

(1988).“). This chromosome-level genome assembly achieved a scaffold N50 of 71 Mb, with the longest chromosome-scale scaffold measuring 127 Mb and the shortest 55 Mb. The circos plot was drawn with TBtools (v2.326)[14](https://www.nature.com/articles/s41597-025-06051-z#ref-CR14 “Chen, C. et al. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Molecular plant 13, 1194–1202, https://doi.org/10.1016/j.molp.2020.06.009

(2020).“).

Fig. 1

Heatmap of genome-wide Hi-C data and circular representation of the chromosomes of D. provosti. (a) Heatmap of chromosome interactions in D. provosti, with densities calculated in 500 Kb windows. The frequency of Hi-C interaction links is represented by colours ranging from yellow (low) to red (high); (b) Circular representation of the chromosomes. The two tracks show gene density (line plot) and TE density (bar plot), calculated in 10 Kb windows; overlapping TEs in the RepeatMasker results were removed. Both gene density and TE density range from 0 to 100%.

Genome annotation

Transposable elements (TEs) were annotated through a comprehensive repeat analysis pipeline. Initially, a de novo repeat library was constructed using RepeatModeler (v2.0.1)[15](https://www.nature.com/articles/s41597-025-06051-z#ref-CR15 “Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117

(2020).“) with the NCBI BLAST engine as the search algorithm (-engine ncbi). Subsequently, RepeatMasker (v4.0.5)[16](https://www.nature.com/articles/s41597-025-06051-z#ref-CR16 “Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr Protoc Bioinformatics 25, 4.10.11–14.10.14, https://doi.org/10.1002/0471250953.bi0410s25

(2009).“) was employed to identify TEs by integrating both the de novo repeat library and a TE database (RepBase 20170127). In total, 1.35 Gb of repeat sequence (76.6% of the 1.76 Gb genome) was identified (Table 1). Automated annotation methods can be limited in precision, particularly for genomes with high TE density; coupled with the lack of curated insect TE databases, this may introduce biases into the TE annotation.

For ab initio prediction of coding genes, the repeat-masked genome sequence was analyzed with AUGUSTUS (v2.7)[17](https://www.nature.com/articles/s41597-025-06051-z#ref-CR17 “Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439, https://doi.org/10.1093/nar/gkl200

(2006).“). For homology-based prediction, protein sequences of Chrysomelidae species were retrieved from the NCBI and UniProt databases. These sequences were subsequently mapped to the genome using exonerate (v2.4.0)[18](https://www.nature.com/articles/s41597-025-06051-z#ref-CR18 “Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31, https://doi.org/10.1186/1471-2105-6-31

(2005).“), and incomplete gene models lacking both start and stop codons were removed. Quality-controlled reads from the RNA libraries were mapped to the genome using Bowtie2 (v2.4.1)[10](https://www.nature.com/articles/s41597-025-06051-z#ref-CR10 “Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359, https://doi.org/10.1038/nmeth.1923

(2012).“). StringTie (v2.1.1)[19](https://www.nature.com/articles/s41597-025-06051-z#ref-CR19 “Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290–295, https://doi.org/10.1038/nbt.3122

(2015).“) was then employed to construct gene prediction models based on these alignments. Finally, the gene predictions generated through these three approaches were integrated using the EVidenceModeler (v1.1.1)[20](https://www.nature.com/articles/s41597-025-06051-z#ref-CR20 “Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7

(2008).“). The transcriptomic evidence was weighted by a factor of 10, ab initio evidence by a factor of 4, and homology-based evidence by a factor of 1. A total of 20,130 protein-coding gene models were predicted (Table 1). Functional annotation was performed by aligning all genes against four databases (NCBI-NR, KEGG, InterPro, and GO) with DIAMOND (v0.9.19)[21](https://www.nature.com/articles/s41597-025-06051-z#ref-CR21 “Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60, https://doi.org/10.1038/nmeth.3176

(2015).“) and InterProScan (v5.1)[22](https://www.nature.com/articles/s41597-025-06051-z#ref-CR22 “Blum, M. et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49, D344–D354, https://doi.org/10.1093/nar/gkaa977

(2021).“).
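The evidence weights used when integrating the three gene prediction approaches (transcript 10, ab initio 4, homology 1) are passed to EVidenceModeler as a three-column, tab-delimited weights file giving the evidence class, the source label, and the weight. The sketch below writes such a file. The source labels ("StringTie", "AUGUSTUS", "exonerate") are assumptions: in practice they must match the source column of the GFF3 files supplied to EVidenceModeler, which the paper does not report.

```python
# (evidence class, source label, weight); the source labels are hypothetical
# and must match column 2 of the GFF3 evidence files actually used.
EVIDENCE_WEIGHTS = [
    ("TRANSCRIPT", "StringTie", 10),          # RNA-Seq based models
    ("ABINITIO_PREDICTION", "AUGUSTUS", 4),   # ab initio predictions
    ("PROTEIN", "exonerate", 1),              # homology-based alignments
]


def format_weights(rows):
    """Render the rows as tab-delimited lines, one evidence source per line."""
    return "".join(f"{cls}\t{source}\t{weight}\n" for cls, source, weight in rows)


weights_text = format_weights(EVIDENCE_WEIGHTS)
print(weights_text, end="")
```

With these weights, a gene structure supported by transcript evidence outweighs conflicting ab initio or homology-based models at the same locus.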

Phylogenetic analysis

Five additional Chrysomelidae species (Table 2), Acanthoscelides obtectus, Diabrotica virgifera, Diorhabda carinulata, Phaedon cochleariae, and Phyllotreta striolata, were used to infer orthologous genes, and Tribolium castaneum was selected as the outgroup. Genome assemblies and annotations of these five Chrysomelidae species, A. obtectus[23](https://www.nature.com/articles/s41597-025-06051-z#ref-CR23 “NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_963669975.1/

(2023).“), D. virgifera[24](https://www.nature.com/articles/s41597-025-06051-z#ref-CR24 “NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_917563875.1/

(2022).“), D. carinulata[25](https://www.nature.com/articles/s41597-025-06051-z#ref-CR25 “NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_026250575.1/

(2022).“), P. cochleariae[26](https://www.nature.com/articles/s41597-025-06051-z#ref-CR26 “NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_918026855.4/

(2022).“), P. striolata[27](https://www.nature.com/articles/s41597-025-06051-z#ref-CR27 “NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_918026865.1/

(2022).“), and T. castaneum[28](https://www.nature.com/articles/s41597-025-06051-z#ref-CR28 “NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_031307605.1/

(2023).“) were downloaded from the NCBI genome database. Orthologous gene clusters were identified using the OrthoFinder algorithm. Subsequently, a species phylogenetic tree was constructed from conserved single-copy gene sequences using FastTree (v2.1.10)[29](https://www.nature.com/articles/s41597-025-06051-z#ref-CR29 “Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix. Molecular Biology and Evolution 26, 1641–1650, https://doi.org/10.1093/molbev/msp077

(2009).“) with the JTT + CAT model. Divergence times were then estimated, and gene family expansion and contraction were analyzed. All gene family-related analyses and visualizations were further facilitated by OrthoVenn3[30](https://www.nature.com/articles/s41597-025-06051-z#ref-CR30 “Sun, J. et al. OrthoVenn3: an integrated platform for exploring and visualizing orthologous data across genomes. Nucleic Acids Res 51, W397–W403, https://doi.org/10.1093/nar/gkad313

(2023).“). The calibration times for divergence estimation were set according to data from TimeTree[31](https://www.nature.com/articles/s41597-025-06051-z#ref-CR31 “Kumar, S. et al. TimeTree 5: An Expanded Resource for Species Divergence Times. Molecular Biology and Evolution 39, https://doi.org/10.1093/molbev/msac174

(2022).“), with a minimum of 152 million years ago (MYA) and a maximum of 236 MYA for T. castaneum and A. obtectus. The results revealed that D. provosti diverged from its closest relative, A. obtectus, approximately 0.6 MYA, and from T. castaneum around 2.3 MYA (Fig. 2a). Through the analysis of gene family expansion and contraction, we identified 38 expanded gene families and 406 contracted gene families in D. provosti. GO enrichment analysis of the expanded gene families, conducted with the OmicShare tools[32](https://www.nature.com/articles/s41597-025-06051-z#ref-CR32 “Mu, H. et al. OmicShare tools: A zero-code interactive online platform for biological data analysis and visualization. iMeta 3, e228, https://doi.org/10.1002/imt2.228

(2024).“), revealed significant enrichment in vision-related and immune-related functions, such as response to blue light (GO:0009637) and mannan binding (GO:2001065) (Fig. 2b). Among the contracted gene families, only one GO term was significantly enriched (GO:0003008, system process).

Fig. 2

Phylogenetic analysis. (a) Phylogeny and orthology analyses of D. provosti and other Chrysomelidae species, with T. castaneum selected as the outgroup. The expanded (red) and contracted (blue) gene families are shown alongside the species and nodes; (b) GO enrichment of the expanded gene families of D. provosti.

Data Records

Raw sequence reads and the genome assembly are available from NCBI under BioProject accession PRJNA1238083[33](https://www.nature.com/articles/s41597-025-06051-z#ref-CR33 “NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA1238083

(2025).“); PacBio, Hi-C, Illumina and transcriptome sequencing reads have been deposited in the Sequence Read Archive (SRA) under accession number SRP571961[34](https://www.nature.com/articles/s41597-025-06051-z#ref-CR34 “NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP571961

(2025).“); the genome assembly has been deposited at NCBI under accession number JBMHBG000000000[35](https://www.nature.com/articles/s41597-025-06051-z#ref-CR35 “NCBI Genbank https://identifiers.org/ncbi/insdc.gca:GCA_052550795.1

(2025).“); and datasets including the genome assembly and gene annotation are deposited on figshare[36](https://www.nature.com/articles/s41597-025-06051-z#ref-CR36 “Wang, H. The genome and annotation of Donacia provosti. figshare. Dataset. https://doi.org/10.6084/m9.figshare.28639664.v2

(2025).“).

Technical Validation

The accuracy and completeness of the genome assembly and gene annotation were validated using several complementary approaches. First, Illumina reads were quality-filtered using fastp (v0.23.1)[9](https://www.nature.com/articles/s41597-025-06051-z#ref-CR9 “Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560

(2018).“) with Q30 = 93.4%, and the filtered reads were mapped to the assembled contigs using BWA (v0.7.5a)[37](https://www.nature.com/articles/s41597-025-06051-z#ref-CR37 “Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324

(2009).“); the resulting alignments, processed with SAMtools (v1.19.2)[38](https://www.nature.com/articles/s41597-025-06051-z#ref-CR38 “Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079, https://doi.org/10.1093/bioinformatics/btp352

(2009).“), revealed a mapping rate of 98.90% and a coverage rate of 97.9%. Second, RNA-Seq reads from three whole-body transcriptomes were quality-filtered using fastp (v0.23.1)[9](https://www.nature.com/articles/s41597-025-06051-z#ref-CR9 “Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560

(2018).“) with Q30 = 93–95% and aligned to the genome assembly with HISAT2 (v2.2.1)[39](https://www.nature.com/articles/s41597-025-06051-z#ref-CR39 “Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4

(2019).“); more than 95% of reads aligned to coding regions. Third, BUSCO (v5.2.2)[40](https://www.nature.com/articles/s41597-025-06051-z#ref-CR40 “Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351

(2015).“) analysis of the genome assembly using the Insecta odb10 dataset (-l insecta_odb10 -m genome) identified 98.6% of the expected insect single-copy orthologs as complete (including 1.8% duplicated), while 0.4% were fragmented. To further assess the completeness of the gene annotation, BUSCO (v5.2.2)[40](https://www.nature.com/articles/s41597-025-06051-z#ref-CR40 “Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351

(2015).“) was also run on the annotated protein set with the Insecta odb10 dataset (-l insecta_odb10 -m prot), identifying 97.2% of conserved single-copy orthologs as complete (including 1.7% duplicated) and 0.4% as fragmented. RNA-Seq analysis of three whole-body transcriptomes demonstrated expression of 18,320 (87%) annotated genes in at least one sample. Finally, homology searches against the NCBI-NR, KEGG, InterPro, and GO databases revealed significant sequence similarity for 19,635 (98%) of the predicted gene models in at least one database.
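BUSCO reports its result as a one-line summary string, and a quick sanity check is that the categories add up: single-copy plus duplicated should equal complete, and complete plus fragmented plus missing should equal 100%. The sketch below parses such a string and runs that check. The example string is reconstructed from the genome-mode percentages quoted above; the single-copy (S) and missing (M) values are derived by subtraction and the ortholog count n is an assumed value for insecta_odb10, so none of these three is a figure reported in the paper. The function names and tolerance are hypothetical.

```python
import re

# Matches summary strings such as "C:98.6%[S:96.8%,D:1.8%],F:0.4%,M:1.0%,n:1367".
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Return the BUSCO categories as floats (percent) plus the count n."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {summary!r}")
    result = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    return result


def is_consistent(scores, tolerance=0.15):
    """Check S + D == C and C + F + M == 100, allowing for rounding."""
    complete_ok = abs(scores["S"] + scores["D"] - scores["C"]) <= tolerance
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tolerance
    return complete_ok and total_ok


# S, M and n are derived or assumed for illustration (see the text above).
genome_summary = "C:98.6%[S:96.8%,D:1.8%],F:0.4%,M:1.0%,n:1367"
scores = parse_busco(genome_summary)
print(scores, is_consistent(scores))
```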

Data availability

The dataset is available from NCBI under BioProject accession PRJNA1238083; raw sequencing reads have been deposited in the Sequence Read Archive (SRA) under accession number SRP571961; and the genome assembly has been deposited at NCBI under accession number JBMHBG000000000.

Code availability

No specific code or script was used in this work. Commands used for data processing were all executed according to the manuals and protocols of the corresponding software.

References

1. Stoks, R. & Córdoba-Aguilar, A. Evolutionary Ecology of Odonata: A Complex Life Cycle Perspective. Annual Review of Entomology 57, 249–265, https://doi.org/10.1146/annurev-ento-120710-100557 (2012).

2. Jacobus, L. M., Macadam, C. R. & Sartori, M. Mayflies (Ephemeroptera) and Their Contributions to Ecosystem Services. Insects 10, 170, https://doi.org/10.3390/insects10060170 (2019).

3. Wootton, R. J. The historical ecology of aquatic insects: An overview. Palaeogeography, Palaeoclimatology, Palaeoecology 62, 477–492, https://doi.org/10.1016/0031-0182(88)90068-5 (1988).

4. Wilf, P. et al. Timing the Radiations of Leaf Beetles: Hispines on Gingers from Latest Cretaceous to Recent. Science 289, 291–294, https://doi.org/10.1126/science.289.5477.291 (2000).

5. Zhan, H. et al. Molecular Characterization of Donacia provosti (Coleoptera: Chrysomelidae) Larval Transcriptome by De Novo Assembly to Discover Genes Associated with Underwater Environmental Adaptations. Insects 12, 281, https://doi.org/10.3390/INSECTS12040281 (2021).

6. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).

7. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).

8. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898, https://doi.org/10.1093/bioinformatics/btaa025 (2020).

9. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560 (2018).

10. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359, https://doi.org/10.1038/nmeth.1923 (2012).

11. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biology 16, 259, https://doi.org/10.1186/s13059-015-0831-x (2015).

12. Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39, https://doi.org/10.1093/bioinformatics/btac808 (2022).

13. Petitpierre, E. in Biology of Chrysomelidae (eds Jolivet, P., Petitpierre, E. & Hsiao, T. H.) 131–159 (Springer Netherlands, 1988), https://doi.org/10.1007/978-94-009-3105-3_9 (1988).

14. Chen, C. et al. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Molecular plant 13, 1194–1202, https://doi.org/10.1016/j.molp.2020.06.009 (2020).

15. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).

16. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr Protoc Bioinformatics 25, 4.10.11–14.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).

17. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439, https://doi.org/10.1093/nar/gkl200 (2006).

18. Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31, https://doi.org/10.1186/1471-2105-6-31 (2005).

19. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).

20. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).

21. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60, https://doi.org/10.1038/nmeth.3176 (2015).

22. Blum, M. et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49, D344–D354, https://doi.org/10.1093/nar/gkaa977 (2021).

23. NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_963669975.1/ (2023).

24. NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_917563875.1/ (2022).

25. NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_026250575.1/ (2022).

26. NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_918026855.4/ (2022).

27. NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_918026865.1/ (2022).

28. NCBI GenBank https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_031307605.1/ (2023).

29. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix. Molecular Biology and Evolution 26, 1641–1650, https://doi.org/10.1093/molbev/msp077 (2009).

30. Sun, J. et al. OrthoVenn3: an integrated platform for exploring and visualizing orthologous data across genomes. Nucleic Acids Res 51, W397–W403, https://doi.org/10.1093/nar/gkad313 (2023).

31. Kumar, S. et al. TimeTree 5: An Expanded Resource for Species Divergence Times. Molecular Biology and Evolution 39, https://doi.org/10.1093/molbev/msac174 (2022).

32. Mu, H. et al. OmicShare tools: A zero-code interactive online platform for biological data analysis and visualization. iMeta 3, e228, https://doi.org/10.1002/imt2.228 (2024).

33. NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA1238083 (2025).

34. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP571961 (2025).

35. NCBI Genbank https://identifiers.org/ncbi/insdc.gca:GCA_052550795.1 (2025).

36. Wang, H. The genome and annotation of Donacia provosti. figshare. Dataset. https://doi.org/10.6084/m9.figshare.28639664.v2 (2025).

37. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).

38. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079, https://doi.org/10.1093/bioinformatics/btp352 (2009).

39. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).

40. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).

Acknowledgements

This work was supported by China Agriculture Research System of MOF and MARA (CARS-24-C-03).

Author information

Authors and Affiliations

College of Plant Protection, Yangzhou University, Yangzhou, 225009, China

Hangwei Liu, Lijun Ma, Yuxiang Sun, Muhammad Haseeb Ahsan, Chunyan Wu, Cheng Gong & Chen Luo

College of Horticulture and Landscape Architecture, Yangzhou University, Yangzhou, 225009, China

Liangjun Li

Authors

Hangwei Liu
Lijun Ma
Yuxiang Sun
Muhammad Haseeb Ahsan
Chunyan Wu
Cheng Gong
Liangjun Li
Chen Luo

Contributions

H.L. and C.L. designed the research; H.L. performed the research and improved by L.M., Y.X., C.W. and C.G.; H.L. wrote the paper, which was im
