Data Descriptor
Published: 30 January 2026
… Shenghui Dong & Mingxiang Lu
Scientific Data (2026)
We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.
Abstract
The longfin barb (Acrossocheilus longipinnis), a vulnerable cyprinid fish endemic to China’s Pearl River basin, is of significant conservation concern and also popular in the ornamental fish trade. To facilitate genetic research and molecular breeding for this species, we generated a high-quality genome by integrating PacBio HiFi long reads and Hi-C sequencing data. The final assembly spans approximately 936.04 Mb, achieving high continuity with a contig N50 of 36.09 Mb. Assessment of genome quality revealed excellent completeness (98.76% BUSCO score) and accuracy (QV = 54.46; GCI = 29.76; CRAQ = 96.40). The vast majority of the sequence (927.20 Mb, 99.06%) was successfully anchored to 25 chromosomes. Annotation predicted 24,718 protein-coding genes and identified approximately 553.06 Mb (59.09%) of repetitive elements. This high-quality chromosome-scale reference genome provides a crucial foundation for investigating the genomic underpinnings of A. longipinnis evolution and will significantly advance molecular breeding programs aimed at its conservation and sustainable utilization.
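The summary statistics in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' workflow. It recomputes the anchored and repeat percentages from the reported megabase figures, converts the reported QV into a per-base error probability on the assumption that QV is Phred-scaled (QV = -10 log10 of the error probability), and shows how a contig N50 is defined using a small set of hypothetical contig lengths. The function names and the toy lengths are assumptions introduced here.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Figures reported in the abstract, in megabases.
ASSEMBLY_MB = 936.04
ANCHORED_MB = 927.20
REPEAT_MB = 553.06
REPORTED_QV = 54.46

anchored_pct = percent(ANCHORED_MB, ASSEMBLY_MB)
repeat_pct = percent(REPEAT_MB, ASSEMBLY_MB)
error_rate = qv_to_error_rate(REPORTED_QV)

# Hypothetical contig lengths, NOT taken from the assembly; they only
# illustrate the N50 definition.
toy_n50 = n50([50, 40, 30, 20, 10])

print(f"anchored: {anchored_pct}%  repeats: {repeat_pct}%")
print(f"per-base error probability at QV {REPORTED_QV}: {error_rate:.2e}")
print(f"toy N50: {toy_n50}")
```

The recomputed percentages agree with the reported 99.06% and 59.09%. Under the Phred assumption, a QV of 54.46 corresponds to roughly 3.6 consensus errors per million bases.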
Data availability
Raw sequencing data have been deposited in the NCBI SRA database under BioProject accession number PRJNA1297891, with the following run accessions: PacBio HiFi, SRR34770991 (ref. 49); Hi-C, SRR34770992 (ref. 50); RNA sequencing, SRR34770990 (ref. 51); DNA short-read sequencing, SRR34770993 (ref. 52). The genome assembly has been deposited in GenBank under accession GCA_054083375.1 (ref. 53). The genome assembly, annotation files (GFF3, FASTA), and gene functional annotation datasets are also available via Figshare (ref. 41). All datasets are publicly accessible without restrictions.
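For readers who want to locate the deposited records programmatically, the following minimal sketch builds resolver links for each accession. It uses the accession strings and the identifiers.org URL pattern exactly as they appear in the reference list of this article. The dictionary labels and the `resolver_url` helper are hypothetical names introduced for illustration, and the rule that picks the namespace from the `GCA_` prefix is an assumption that only covers the accession types listed here.

```python
# Accessions as given in the reference-list URLs of this article.
ACCESSIONS = {
    "PacBio HiFi": "SRR34770991",
    "Hi-C": "SRR34770992",
    "RNA sequencing": "SRR34770990",
    "DNA short-read sequencing": "SRR34770993",
    "Genome assembly": "GCA_054083375.1",
}


def resolver_url(accession):
    """Build an identifiers.org link for an SRA run or GenBank assembly accession.

    Hypothetical helper: assembly accessions (prefix 'GCA_') use the
    'insdc.gca' namespace, everything else is treated as an SRA record.
    """
    namespace = "insdc.gca" if accession.startswith("GCA_") else "insdc.sra"
    return f"https://identifiers.org/ncbi/{namespace}:{accession}"


urls = {label: resolver_url(acc) for label, acc in ACCESSIONS.items()}

for label, url in urls.items():
    print(f"{label}: {url}")
```

The sketch only constructs the links and does not download any data.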
Code availability
No specific code or script was used in this work. Commands used for data processing were all executed according to the manuals and protocols of the corresponding software.
References
Yuan, L. Y., Liu, X. X. & Zhang, E. Mitochondrial phylogeny of Chinese barred species of the cyprinid genus Acrossocheilus Oshima, 1919 (Teleostei: Cypriniformes) and its taxonomic implications. Zootaxa 4059, 151–168 (2015).
Chen, T. E. et al. A New Species of the Genus Acrossocheilus Oshima, 1919 (Cypriniformes: Cyprinidae) from the Dabie Mountains. Animals 15, 734 (2025).
Hou, X.-J. et al. Complete mitochondrial genome of the freshwater fish Acrossocheilus longipinnis (Teleostei: Cyprinidae): genome characterization and phylogenetic analysis. Biologia 75, 1871–1880 (2020).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology 37, 1155–1162 (2019).
Lovell, J. T. et al. Four chromosome scale genomes and a pan-genome annotation to accelerate pecan tree breeding. Nature Communications 12, 4125 (2021).
Li, H. & Durbin, R. Genome assembly in the telomere-to-telomere era. Nature Reviews Genetics 25, 658–670 (2024).
Wang, B. et al. Long and Accurate: How HiFi Sequencing is Transforming Genomics. Genomics Proteomics Bioinformatics 23 (2025).
Zheng, J. et al. Chromosome-level genome assembly of Acrossocheilus fasciatus using PacBio sequencing and Hi-C technology. Scientific Data 11, 166 (2024).
Chin, C. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods 10, 563–569 (2013).
Chen, S., Zhou, Y., Chen, Y. & Jia, G. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Sun, H., Ding, J., Piednoël, M. & Schneeberger, K. findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics 34, 550–557 (2018).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Servant, N. et al. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biology 16 (2015).
Durand, N. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, eaal3327 (2017).
Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Systems 3, 99–101 (2016).
Xu, G.-C. et al. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. GigaScience 8 (2018).
Hu, J. et al. NextPolish2: a repeat-aware polishing tool for genomes assembled using HiFi long reads. (bioRxiv, 2023).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110, 462–467 (2005).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21 (Suppl 1), i351–i358 (2005).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research 35, W265–W268 (2007).
Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Current Protocols in Bioinformatics 5 (2004).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580 (1999).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357–360 (2015).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278 (2019).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29, 644–652 (2011).
Haas, B. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31, 5654–5666 (2003).
Liu, F. et al. The telomere-to-telomere gapless genome of grass carp provides insights for genetic improvement. GigaScience 14 (2025).
Yuan, J. et al. A telomere-to-telomere genome assembly of koi carp (Cyprinus carpio) using long reads and Hi-C technology. GigaScience 14 (2025).
Chen, L. et al. Chromosome-level genome of Poropuntius huangchuchieni provides a diploid progenitor-like reference genome for the allotetraploid Cyprinus carpio. Molecular Ecology Resources 21, 1658–1669 (2021).
Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Research 33, W465–W467 (2005).
Solovyev, V., Kosarev, P., Seledsov, I. & Vorobyev, D. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biology 7 (Suppl 1), S10.1–12 (2006).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, R7 (2008).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Research 27, 49–54 (1999).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28, 27–30 (2000).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60 (2015).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP604471 (2025).
Li, J. Chromosome-level genome assembly of Acrossocheilus longipinnis using PacBio sequencing and Hi-C technology. Figshare. Dataset. https://doi.org/10.6084/m9.figshare.29665907.v1 (2025).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21 (2020).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Yin, D. et al. Telomere-to-telomere gap-free genome assembly of the endangered Yangtze finless porpoise and East Asian finless porpoise. GigaScience 13 (2024).
Li, K., Xu, P., Wang, J., Yi, X. & Jiao, Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nature Communications 14, 6556 (2023).
Chen, Q., Yang, C., Zhang, G. & Wu, D. GCI: a continuity inspector for complete genome assembly. Bioinformatics 40 (2024).
Huang, Z. et al. Evolutionary analysis of a complete chicken genome. Proceedings of the National Academy of Sciences 120, e2216641120 (2023).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34770991 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34770992 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34770990 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34770993 (2025).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_054083375.1 (2025).
Acknowledgements
This work was supported by operating funds of the Hongshui River Rare Fish Conservation Center.
Author information
Authors and Affiliations
Scientific Institute of Pearl River Water Resources Protection, Guangzhou, 510611, China
Zechen E, Fangyuan Xiong, Yuansheng Zhu, Li Wang, Jiajun Zhang, Shenghui Dong & Mingxiang Lu
Hongshui River Rare Fish Conservation Center, Guigang, 537200, China
Zechen E, Fangyuan Xiong, Yuansheng Zhu, Li Wang, Jiajun Zhang, Shenghui Dong & Mingxiang Lu
Engineering Research Center of Hongshui River Rare Fish Conservation, Guangxi Zhuang Autonomous Region, Guigang, 537200, China
Zechen E, Fangyuan Xiong, Yuansheng Zhu, Li Wang, Jiajun Zhang, Shenghui Dong & Mingxiang Lu
Authors
- Zechen E
- Fangyuan Xiong
- Yuansheng Zhu
- Li Wang
- Jiajun Zhang
- Shenghui Dong
- Mingxiang Lu
Contributions
Zechen E conceived this study, designed the experiment, and performed data analysis. Fangyuan Xiong contributed to the experimental design, collected samples, and performed data analysis. Yuansheng Zhu and Li Wang provided funding and contributed to conceptualization. Jiajun Zhang and Shenghui Dong assisted in methodology and data curation. All authors have read and approved the final manuscript.
Corresponding author
Correspondence to Fangyuan Xiong.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
E, Z., Xiong, F., Zhu, Y. et al. Chromosome-level genome assembly of the longfin barb (Acrossocheilus longipinnis). Sci Data (2026). https://doi.org/10.1038/s41597-026-06656-y
Received: 17 September 2025
Accepted: 19 January 2026
Published: 30 January 2026
DOI: https://doi.org/10.1038/s41597-026-06656-y