- Data Descriptor
- Open access
- Published: 29 January 2026
Scientific Data (2026)
We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.
Abstract
The gobies represent the most species-rich family of marine fishes and occupy a remarkable range of ecological niches. Despite their abundance and key role in coastal ecosystems, Mediterranean gobies remain underrepresented in genomic resources. The rock goby (Gobius paganellus), a common inhabitant of rocky intertidal and subtidal habitats throughout the Northeast Atlantic and Mediterranean Sea, exemplifies ecological and physiological resilience to highly variable environments. Here, we present a chromosome-level genome assembly for G. paganellus, generated using a combination of PacBio HiFi and Hi-C sequencing. The final assembly spans 813 Mb, with more than 99.9% of the sequence anchored to 23 pseudochromosomes (Scaffold N50: 36.4 Mb; Contig N50: 20.3 Mb), and shows high completeness, as indicated by a 98.8% BUSCO score. We characterised the genome’s repeat landscape and identified 23,493 protein-coding genes, over 96% of which showed homology to known proteins. This high-quality genomic resource provides a foundation for future population genomic research on this species and could support comparative genomic analyses across gobies and, more broadly, teleosts.
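The contiguity figures above (scaffold and contig N50) and the share of sequence anchored to pseudochromosomes are simple functions of sequence lengths. The following Python is an illustrative sketch only, not the authors' pipeline: it computes N50 (more generally Nx) and an anchored fraction from a list of lengths. The function names and the toy lengths are hypothetical. Contig N50 would be computed the same way, but from contig lengths, i.e. scaffolds split at their gaps.

```python
from typing import Sequence


def nx(lengths: Sequence[int], fraction: float = 0.5) -> int:
    """Return the Nx statistic of a set of sequence lengths (N50 by default).

    Nx is the largest length L such that sequences of length >= L together
    hold at least `fraction` of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until the
    # running total first reaches the target.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    raise AssertionError("unreachable: running total always reaches target")


def anchored_fraction(lengths: Sequence[int], anchored: Sequence[bool]) -> float:
    """Share of assembled bases lying in sequences flagged as anchored
    (e.g. placed in pseudochromosomes)."""
    if len(lengths) != len(anchored):
        raise ValueError("lengths and anchored flags differ in length")
    total = sum(lengths)
    placed = sum(length for length, flag in zip(lengths, anchored) if flag)
    return placed / total


if __name__ == "__main__":
    toy = [40, 30, 20, 10]  # hypothetical scaffold lengths in Mb
    print(nx(toy))  # prints 30
    print(anchored_fraction(toy, [True, True, True, False]))  # prints 0.9
```

In the toy example the N50 is 30 Mb because the two longest sequences (70 Mb combined) are the first to cover half of the 100 Mb total.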
Data availability
All data supporting this study are publicly available. Raw PacBio HiFi, Hi-C, and RNA-Seq reads are deposited in the NCBI Sequence Read Archive (BioProject PRJNA1298813); the genome assembly is available in GenBank (WGS accession JBPXNH000000000; version JBPXNH010000000 analyzed here). The protein-coding gene annotation is archived on Figshare (https://doi.org/10.6084/m9.figshare.29820362). A complete inventory of files and formats is provided in Supplementary Table S1.
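Users downloading the annotation from Figshare may want a quick sanity check of its contents. The sketch below tallies features by type (column 3) in GFF3-formatted lines using only the Python standard library. It assumes the archived annotation is GFF3 with genes typed as `gene`; the actual formats should be checked against Supplementary Table S1, and the function name and example records are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Tally GFF3 features by their type (column 3)."""
    counts: Counter = Counter()
    for raw in gff_lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature records
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # ignore anything that is not a 9-column feature line
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical two-gene example; a real file would be read line by line.
    example = [
        "##gff-version 3",
        "chr1\tBRAKER\tgene\t1\t900\t.\t+\t.\tID=g1",
        "chr1\tBRAKER\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
        "chr1\tBRAKER\tgene\t2000\t2900\t.\t-\t.\tID=g2",
    ]
    print(count_feature_types(example)["gene"])  # prints 2
```

Passing an open file handle for a local copy of the annotation, e.g. `count_feature_types(open(path))["gene"]`, would give a gene tally that could be compared against the 23,493 protein-coding genes reported in the Abstract.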
Code availability
No custom scripts or proprietary code were used in this study. All software tools employed are publicly available, and all parameters are explicitly detailed in the Methods section.
References
Nelson, J. S., Grande, T. C. & Wilson, M. V. H. Fishes of the World (John Wiley & Sons, 2016).
Thacker, C. E. & Roje, D. M. Phylogeny of Gobiidae and identification of gobiid lineages. Syst. Biodivers. 9, 329–347 (2011).
Shang, Y. et al. Adaptability and Evolution of Gobiidae: A Genetic Exploration. Animals 12, 1741 (2022).
Salvanes, A. G. V., Utne-Palm, A. C., Currie, B. & Braithwaite, V. A. Behavioural and physiological adaptations of the bearded goby, a key fish species of the extreme environment of the northern Benguela upwelling. Mar. Ecol. Prog. Ser. 425, 193–202 (2011).
Top, N., Karakuş, U., Tepeköy, E. G., Britton, J. R. & Tarkan, A. S. Plasticity in habitat use of two native Ponto-Caspian gobies, Proterorhinus semilunaris and Neogobius fluviatilis: implications for invasive populations. Knowl. Manag. Aquat. Ecosyst. 40 https://doi.org/10.1051/kmae/2019031 (2019).
Thode, G., Giles, V. & Alvarez, M. C. Multiple Chromosome Polymorphism in Gobius paganellus (Teleostei, Perciformes). Heredity 54, 3–7 (1985).
Amores, A., Giles, V., Thode, G. & Alvarez, M. C. Adaptative character of a Robertsonian fusion in chromosomes of the fish Gobius paganellus (Pisces, Perciformes). Heredity 65, 151–155 (1990).
da Silva, S. A. S. et al. High chromosomal evolutionary dynamics in sleeper gobies (Eleotridae) and notes on disruptive biological factors in Gobiiformes karyotypes (Osteichthyes, Teleostei). Mar. Life Sci. Technol. 3, 293–302 (2021).
Madeira, D. et al. Physiological and biochemical thermal stress response of the intertidal rock goby Gobius paganellus. Ecol. Indic. 46, 232–239 (2014).
Vinagre, C. et al. Effect of increasing temperature in the differential activity of oxidative stress biomarkers in various tissues of the Rock goby, Gobius paganellus. Mar. Environ. Res. 97, 10–14 (2014).
Paul, N., Novais, S. C., Lemos, M. F. L. & Kunzmann, A. Chemical predator signals induce metabolic suppression in rock goby (Gobius paganellus). PLOS ONE 13, e0209286 (2018).
Gobius paganellus genome assembly ASM5352551v1. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_053525515.1 (2025).
Bianchi, C. N. & Morri, C. Marine Biodiversity of the Mediterranean Sea: Situation, Problems and Prospects for Future Research. Mar. Pollut. Bull. 40, 367–376 (2000).
Coll, M. et al. The Biodiversity of the Mediterranean Sea: Estimates, Patterns, and Threats. PLOS ONE 5, e11842 (2010).
Boulanger, E. et al. Climate differently influences the genomic patterns of two sympatric marine fish species. J. Anim. Ecol. 91, 1180–1195 (2022).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Challis, R. J., Richards, E., Rajan, J., Cochrane, G. & Blaxter, M. L. BlobToolKit – Interactive quality assessment of genome assemblies. G3: Genes, Genomes, Genetics 10, 1361–1374 (2020).
Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39, btac808 (2023).
Uliano-Silva, M. et al. MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics 24, 288 (2023).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Storer, J., Hubley, R., Rosen, J., Wheeler, T. J. & Smit, A. F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA 12, 2 (2021).
Gabriel, L. et al. BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Genome Res. 34, 769–777 (2024).
Brůna, T., Lomsadze, A. & Borodovsky, M. GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes. Genome Res. 34, 757–768 (2024).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M. & Stanke, M. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics 22, 566 (2021).
Kriventseva, E. V. et al. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 47, D807–D811 (2019).
The UniProt Consortium et al. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Yates, A. D. et al. Ensembl 2020. Nucleic Acids Res. gkz966 https://doi.org/10.1093/nar/gkz966 (2019).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Pertea, G. & Pertea, M. GFF Utilities: GffRead and GffCompare. F1000Research 9, ISCB Comm J-304 (2020).
Quinlan, A. R. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr. Protoc. Bioinforma. 47, 11.12.1–34 (2014).
Dainat, J. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format (Version 1.4.2). Zenodo https://doi.org/10.5281/zenodo.3552717.
Gobius niger genome assembly fGobNig1.1. NCBI https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_951799975.1/.
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Bandi, V. & Gutwin, C. Interactive Exploration of Genomic Conservation. In Proc. 46th Graphics Interface Conference (GI’20) (Canadian Human-Computer Communications Society, Waterloo, Canada, 2020).
SRP606519. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP606519.
Gobius paganellus protein-coding gene annotation. figshare https://doi.org/10.6084/m9.figshare.29820362.v1 (2025).
Acknowledgements
This research project was funded by the European Union, NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment 1.4, Project title ‘National Biodiversity Future Center (NBFC)’ (project code CN_00000033), CUP J83C22000860007.
Author information
Authors and Affiliations
Department of Ecological and Biological Sciences, University of Tuscia, Viale dell’Università s.n.c., Viterbo, Italy
Paolo Franchini, Giulia Gentile, Amando Macali & Daniele Canestrelli
Science for Life Laboratory, National Bioinformatics Infrastructure Sweden (NBIS), Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
Martin Pippel
Contributions
P.F. conceived the study, performed data analysis, and wrote the first draft of the manuscript. G.G. conducted field and laboratory work and performed data analysis. M.P. performed data analysis. A.M. conducted field work. D.C. conceived the study. All authors contributed to writing the manuscript.
Corresponding author
Correspondence to Daniele Canestrelli.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Franchini, P., Gentile, G., Pippel, M. et al. Highly contiguous chromosome-level assembly of the rock goby (Gobius paganellus) genome. Sci Data (2026). https://doi.org/10.1038/s41597-026-06659-9
Received: 06 August 2025
Accepted: 19 January 2026
Published: 29 January 2026
DOI: https://doi.org/10.1038/s41597-026-06659-9