Background & Summary
Marine mussels, which belong to the phylum Mollusca, inhabit most immersed surfaces of substrata and play a pivotal role in marine ecosystems. They are commonly known as ecosystem engineers because of their activities, particularly filter feeding characteristics, ability to form mussel beds and role in nutrient cycling, etc, which create an impact on water quality, biodiversity and sediment stability under water. They contribute species richness to the rocky littoral community[1](https://www.nature.com/articles/s41597-025-06028-y#ref-CR1 “Borthagaray, A. I. & Carranza, A. Mussels as ecosystem engineers: Their contribution to species richness in a rocky littoral community. Acta Oecologica 31(3), 243–250, https://doi.org/10.1016/j.actao.2006.10.008
(2007).“). Mus…
Background & Summary
Marine mussels, which belong to the phylum Mollusca, inhabit most immersed surfaces of substrata and play a pivotal role in marine ecosystems. They are commonly known as ecosystem engineers because of their activities, particularly filter feeding characteristics, ability to form mussel beds and role in nutrient cycling, etc, which create an impact on water quality, biodiversity and sediment stability under water. They contribute species richness to the rocky littoral community[1](https://www.nature.com/articles/s41597-025-06028-y#ref-CR1 “Borthagaray, A. I. & Carranza, A. Mussels as ecosystem engineers: Their contribution to species richness in a rocky littoral community. Acta Oecologica 31(3), 243–250, https://doi.org/10.1016/j.actao.2006.10.008
(2007).“). Mussels are a nutritious and sustainable food resource that contributes over 8% of global mollusc aquaculture production and holds significant economic value in fisheries and aquaculture2. Simultaneously, mussels are recognised as typical macrofouling organisms, leading to considerable economic and ecological repercussions for the maritime and aquaculture sector[3](#ref-CR3 “Yang, J.-L. et al. The effect of carbon nanotubes and titanium dioxide incorporated in PDMS on biofilm community composition and subsequent mussel plantigrade settlement. Biofouling 32(7), 763–777, https://doi.org/10.1080/08927014.2016.1197210
(2016).“),[4](#ref-CR4 “Yang, J.-L. et al. Larval settlement and metamorphosis of the mussel Mytilus coruscus in response to monospecific bacterial biofilms. Biofouling 29(3), 247–259, https://doi.org/10.1080/08927014.2013.764412
(2013).“),5. Mussels have been utilized as model organisms in various fields, include climate change adaptation, biomonitoring, integrative ecomechanics, biomaterials, ecology, metamorphosis and settlement of larval stages, bioadhesion under water as well as biofouling and antifouling studies[6](#ref-CR6 “O’Donnell, M. J., George, M. N. & Carrington, E. Mussel byssus attachment weakened by ocean acidification. Nature Climate Change 3(6), 587–590, https://doi.org/10.1038/nclimate1846
(2013).“),7,[8](#ref-CR8 “Thomsen, J. et al. Naturally acidified habitat selects for ocean acidification–tolerant mussels. Science Advances 3(4), e1602411, https://doi.org/10.1126/sciadv.1602411
(2017).“),9. Despite their biological, ecological, and economic importance, genomic-level studies in mussels remain inadequate[10](https://www.nature.com/articles/s41597-025-06028-y#ref-CR10 “Murgarella, M. et al. A First Insight into the Genome of the Filter-Feeder Mussel Mytilus galloprovincialis. PLOS ONE 11(3), e0151561, https://doi.org/10.1371/journal.pone.0151561
(2016).“). This knowledge gap hinders the advancement of understanding molecular mechanisms behind adaptation, evolution, genetic manipulation and bioadhesion processes.
In India, two species of marine mussel, Perna perna (Brown mussel) and P. viridis, are commonly found along the coastal region. Both species belong to the genus Perna (Philipsson 1788), are members of the Mytilidae or true mussel (Mollusca; Bivalvia; Lamellibranchia; Mytiloida; Mytilidae) from tropical, subtropical, warm temperate and cold temperate regions in both hemispheres11,12,[13](https://www.nature.com/articles/s41597-025-06028-y#ref-CR13 “Wood, A. R., Apte, S., MacAvoy, E. S. & Gardner, J. P. A. A molecular phylogeny of the marine mussel genus Perna (Bivalvia: Mytilidae) based on nuclear (ITS1&2) and mitochondrial (COI) DNA sequences. Molecular Phylogenetics and Evolution 44(2), 685–698, https://doi.org/10.1016/j.ympev.2006.12.019
(2007).“). These mussel exhibits wide ranges of ecological preferences, occurring in intertidal and shallow subtidal zones, including rocky shore, mangrove ecosystem and estuarine[14](https://www.nature.com/articles/s41597-025-06028-y#ref-CR14 “Hicks, D. et al. Population dynamics of the nonindigenous brown mussel Perna perna in the Gulf of Mexico compared to other world-wide populations. Marine Ecology Progress Series 211, 181–192, https://doi.org/10.3354/meps211181
(2001).“). P. viridis enjoys wider distribution along the Indian coast and is native to the Indo-Pacific region[13](https://www.nature.com/articles/s41597-025-06028-y#ref-CR13 “Wood, A. R., Apte, S., MacAvoy, E. S. & Gardner, J. P. A. A molecular phylogeny of the marine mussel genus Perna (Bivalvia: Mytilidae) based on nuclear (ITS1&2) and mitochondrial (COI) DNA sequences. Molecular Phylogenetics and Evolution 44(2), 685–698, https://doi.org/10.1016/j.ympev.2006.12.019
(2007).“). In contrast, P. perna is native to tropical and subtropical region of the Atlantic Ocean and Western Indian Ocean11,[13](https://www.nature.com/articles/s41597-025-06028-y#ref-CR13 “Wood, A. R., Apte, S., MacAvoy, E. S. & Gardner, J. P. A. A molecular phylogeny of the marine mussel genus Perna (Bivalvia: Mytilidae) based on nuclear (ITS1&2) and mitochondrial (COI) DNA sequences. Molecular Phylogenetics and Evolution 44(2), 685–698, https://doi.org/10.1016/j.ympev.2006.12.019
(2007).“). P. perna was identified as P. indica15, as the morphological characteristics used to differentiate P. indica from P. viridis are identical. Two comprehensive reviews of the genus Perna11,[13](https://www.nature.com/articles/s41597-025-06028-y#ref-CR13 “Wood, A. R., Apte, S., MacAvoy, E. S. & Gardner, J. P. A. A molecular phylogeny of the marine mussel genus Perna (Bivalvia: Mytilidae) based on nuclear (ITS1&2) and mitochondrial (COI) DNA sequences. Molecular Phylogenetics and Evolution 44(2), 685–698, https://doi.org/10.1016/j.ympev.2006.12.019
(2007).“), conclude that P. canaliculus, P. perna and P. viridis are the only three valid species within the genus. A series of research suggests that P. indica is not a distinct species, but an introduced population of P. perna, and evidence indicates that it is native to the Oman region and was introduced to southern India over 100 years ago[16](https://www.nature.com/articles/s41597-025-06028-y#ref-CR16 “Gardner, J. P. A., Patterson, J., George, S. & Edward, J. K. P. Combined evidence indicates that Perna indica Kuriakose and Nair 1976 is Perna perna (Linnaeus, 1758) from the Oman region introduced into southern India more than 100 years ago. Biological Invasions 18(5), 1375–1390, https://doi.org/10.1007/s10530-016-1074-9
(2016).“).
In southern India, P. perna is distributed along the southwest coast, from Varkalai (near Quilon) to Cape Comorin (Kanyakumari), and the southeast coast from Kanyakumari to Tiruchendur15,17,18. P. perna and P. viridis coexist in the Kanyakumari District along the southeast and west coast of southern India, P. perna is the dominant one19. Regardless of its native or invasive status, P. perna now sustains a significant local fishery, with mussels collected from intertidal and shallow subtidal rocky reefs, similar to the established role of P. viridis as a key protein source20,21,22. Also, the report suggests that the introduction of P. perna poses significant ecological threats to P. viridis in India, leading to competition, habitat restriction, and potentially a reduction in the native mussel population. Despite the ecological and economic importance of P. perna, genetic studies on this species remain limited. It is necessary to understand their genetic makeup to focus research on how they adapt to different environmental conditions, their role in the marine habitat, and their interaction with other species in the ecosystem.
Next-generation sequencing (NGS) generates more reliable data that enables researchers to address complex biological questions. These massively parallel sequencing techniques are particularly valuable for studying non-model species or those with limited genetic information. RNA-Seq or sequencing of the transcriptome is one of the most important applications of NGS, and it is a powerful tool for investigating transcriptomes, gene expression profiling, etc. RNA-Seq not only enhances our understanding of gene function and genome structure, also reveals phylogenetic relationships[23](https://www.nature.com/articles/s41597-025-06028-y#ref-CR23 “Garg, R. & Jain, M. RNA-Seq for transcriptome analysis in non-model plants, in: Legume Genomics, Springer, pp. 43–58, https://doi.org/10.1007/978-1-62703-613-9_4
. (2013).“),[24](https://www.nature.com/articles/s41597-025-06028-y#ref-CR24 “Da Fonseca, R. R. et al. Next-generation biology: Sequencing and data analysis approaches for non-model organisms. Marine Genomics 30, 3–13, https://doi.org/10.1016/j.margen.2016.04.012
(2016).“). Over the past few years, RNA-Seq techniques have become widely employed in marine mussel research, facilitating more profound understanding of their molecular response to environmental stressors[25](https://www.nature.com/articles/s41597-025-06028-y#ref-CR25 “Moreira, R. et al. RNA-Seq in Mytilus galloprovincialis: Comparative transcriptomics and expression profiles among different tissues. BMC Genomics 16(1), 728, https://doi.org/10.1186/s12864-015-1817-5
(2015).“),[26](https://www.nature.com/articles/s41597-025-06028-y#ref-CR26 “Xu, M. et al. Transcriptome response to copper heavy metal stress in hard-shelled mussel (Mytilus coruscus). Genomics Data 7, 152–154, https://doi.org/10.1016/j.gdata.2015.12.010
(2016).“), diseases[27](https://www.nature.com/articles/s41597-025-06028-y#ref-CR27 “Moreira, R. et al. Bivalve transcriptomics reveal pathogen sequences and a powerful immune response of the Mediterranean mussel (Mytilus galloprovincialis). Marine Biology 165(4), 61, https://doi.org/10.1007/s00227-018-3308-0
(2018).“) and ecological factors[28](https://www.nature.com/articles/s41597-025-06028-y#ref-CR28 “Lockwood, B. L. & Somero, G. N. Transcriptomic responses to salinity stress in invasive and native blue mussels (genus Mytilus). Molecular Ecology 20(3), 517–529, https://doi.org/10.1111/j.1365-294X.2010.04973.x
(2011).“). Studies on the genus Perna became more prevalent in the mid-2000s, with most of the studies restricted to P. viridis and P. canaliculus. P. viridis has been studied extensively, such as their response to heat stress, pollution and pathogen interactions[29](#ref-CR29 “Cheng, M. C. F., Sarà, G. & Williams, G. A. Combined effects of thermal conditions and food availability on thermal tolerance of the marine bivalve, Perna viridis. Journal of Thermal Biology 78, 270–276, https://doi.org/10.1016/j.jtherbio.2018.10.014
(2018).“),[30](#ref-CR30 “Jiang, X. et al. Transcriptomic responses of Perna viridis embryo to Benzo(a)pyrene exposure elucidated by RNA sequencing. Chemosphere 163, 125–132, https://doi.org/10.1016/j.chemosphere.2016.07.091
(2016).“),[31](https://www.nature.com/articles/s41597-025-06028-y#ref-CR31 “Azizan, A. Venter, L. & Alfaro, A. C. A review on green-lipped mussel, Perna canaliculus immunology: The drivers, virulence factors, advances, and applications. New Zealand Journal of Marine and Freshwater Research 58(3), 319–363, https://doi.org/10.1080/00288330.2023.2269865
(2024).“), while research on P. canaliculus studies has been chiefly based on immune gene response and effect of environmental factors such as ocean acidification[32](https://www.nature.com/articles/s41597-025-06028-y#ref-CR32 “Ericson, J. A. et al. Differential responses of selectively bred mussels (Perna canaliculus) to heat stress—survival, immunology, gene expression and microbiome diversity. Frontiers in Physiology 14, 1265879, https://doi.org/10.3389/fphys.2023.1265879
(2024).“). However, some studies highlight the fact that underrepresentation P. perna in genomic and transcriptomic research when compared to other members of the same genus.
In India, like P. viridis, P. perna is a vital food resource and livelihood provider for coastal communities, significantly contributing to economic, ecological and cultural well-being. Therefore, understanding its habitat restrictions, reduction of shell size and meat yield occurring over time is crucial for ensuring the sustainable use of this species[33](https://www.nature.com/articles/s41597-025-06028-y#ref-CR33 “Lockwood, B. L., Connor, K. M. & Gracey, A. Y. The environmentally tuned transcriptomes of Mytilus mussels. Journal of Experimental Biology 218(12), 1822–1833, https://doi.org/10.1242/jeb.118190
(2015).“). Additionally, environmental stressors from anthropogenic activities are affecting marine mussels at the genomic level. Lockwood et al.[33](https://www.nature.com/articles/s41597-025-06028-y#ref-CR33 “Lockwood, B. L., Connor, K. M. & Gracey, A. Y. The environmentally tuned transcriptomes of Mytilus mussels. Journal of Experimental Biology 218(12), 1822–1833, https://doi.org/10.1242/jeb.118190
(2015).“) state that the role of ocean acidification, temperature and food availability in affecting mussel growth and survival, and emphasise how transcriptomics can be used to study these impacts. By using RNA-Seq, Barrett et al.[34](https://www.nature.com/articles/s41597-025-06028-y#ref-CR34 “Barrett, N. J. et al. Molecular Responses to Thermal and Osmotic Stress in Arctic Intertidal Mussels (Mytilus edulis): The Limits of Resilience. Genes 13(1), 1, https://doi.org/10.3390/genes13010155
(2022).“) provide the molecular basis of growth changes with respect to temperature and acidification stress in Mytilus edulis and P. viridis. Granger et al.[35](https://www.nature.com/articles/s41597-025-06028-y#ref-CR35 “Granger Joly De-Boissel, P. et al. Functional and molecular responses of the blue mussel Mytilus edulis’ hemocytes exposed to cadmium—An in vitro model and transcriptomic approach. Fish & Shellfish Immunology 67, 575–585, https://doi.org/10.1016/j.fsi.2017.06.001
(2017).“) explain that transcriptomic data can help to elucidate genes involved in detoxification in M. edulis while exposed to cadmium.
Currently, there are no research on the transcriptomic analysis of Asian brown mussel, P. perna L. Our study will unravel the transcriptomic structure of P. perna for the first time via de novo transcriptome assembly. We believe that studies like this will help us understand the growth, survival, and well-being of Asian brown mussel at the transcriptomic, genomic, and proteomic levels.
Methods
Sample collection
The live brown mussel, Perna perna L. (Fig. 1), was collected from natural mussel beds off Vizhinjam (8°22′25′′N 76°55′17′′E) open sea, Thiruvananthapuram, Kerala, India. The environmental conditions in Vizhinjam sea plays an important role in the growth of Asian brown mussel. During the collection of samples, the temperature was 26.74°C, salinity is 32.35 ppt and pH of the sea water was 7.6. A total of 45 individual mussels were collected and divided in to three replicas; each replica contains 15 individual mussels. The mussel foot was dissected out and quickly quick-frozen in liquid nitrogen, then transferred to the laboratory and stored at −80 °C for further transcriptomic analysis.
Fig. 1
Asian brown mussel, P. perna L. (a) morphology of P. perna; (b) P. perna with mussel foot and byssus thread.
RNA extraction, library preparation and RNA-Sequencing
RNA was isolated from the mussel foot using TRIzol reagent (Invitrogen, USA), following the standard protocol for RNA extraction. Subsequently, RNA was purified with column purification using Quick RNA Miniprep Plus Kit (Zymo Research, USA) according to the manufacturer’s instructions. The RNA purity and quantity were assessed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, USA), and RNA integrity was checked using the Agilent 2200 TapeStation with High Sensitivity RNA ScreenTape (Agilent Technologies, USA). RNA samples with an OD 260/280 ≥ 1.8, RNA integrity number ≥ 8 were selected for further analysis. In addition to that 1.5% denaturing agarose gel electrophoresis were conducted to understand the quality and quantity of RNA. The single distinct 18S peak without any smearing for all extracted replicate group considered as good quality sample. The QC passed RNA samples were selected for further analysis. cDNA library preparation was done with TruSeq Standard mRNA-seq library preparation kit following the manufacturer’s instruction (Illumina, USA). Quality of library was assessed using the Agilent 2100 Bioanalyzer (Part No. G2939BA), and concentration was measured using KAPA library quantification kit (Cat. No. KK4824). Sequencing was performed by the Illumina platform using 2 × 150 bp chemistry, NovaSeq. 6000 System platform.
Data processing and de novo assembly
The sequenced raw data were processed to generate high-quality, concordant reads using Trimmomatic v0.3936, followed by an in-house script to remove adapter sequences, ambiguous reads (those containing >5% unknown nucleotides, ‘N’), and low-purity sequences (those with >10% of bases having a Phred score < 25). The resulting high-quality paired-end reads with Phred score >25 was then used for de novo assembly. Then, such reads were assembled into transcripts using the Trinity de novo assembler (version 2.8.4)[37](https://www.nature.com/articles/s41597-025-06028-y#ref-CR37 “Trinity: Trinity RNA-Seq assembler performance optimization, Henschel, R. et al XSEDE Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond. ISBN: 978-1-4503-1602-6, https://doi.org/10.1145/2335755.2335842
(2012).“) with a k-mer size 25. To minimize isoform redundancy, the assembled transcripts were clustered using CD-HIT-EST v4.6, is to generate varied gene sequences by eliminating duplicated and faulty transcripts that often arise in de novo assembly. The resulting non-extendable sequences are unigenes. Unigenes with >85% coverage at ≥3x read depth were retained for further analysis.
Identification of coding sequences (CDS prediction)
Coding sequence predictions of unigenes were done by identifying Open reading frames (ORF) using a tool like TransDecoder—v5.3.0[38](https://www.nature.com/articles/s41597-025-06028-y#ref-CR38 “TransDecoder: http://transdecoder.github.io/
“). This software identifies CDS based on a minimum length ORF found in a unigene, a log-likelihood score similar to GeneID is >0, and the reports longer ORF when one ORF is fully within another.
Functional annotation and categorization of assembly
Functional annotation of the identified CDS was carried out using the DIAMOND39 program in BLASTX mode; it helps to find homologous sequences for the genes against the NR (non-redundant protein database) from NCBI (https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/). Gene ontology analysis was performed using the Blast2GO program using OmicsBox[40](https://www.nature.com/articles/s41597-025-06028-y#ref-CR40 “OmicsBox: https://www.biobam.com/omicsbox/
.“). The identified genes were grouped into three main domains: Biological process (BP), Molecular component (MP) and Cellular component (CC). GO mapping was performed to retrieve GO terms for functionally annotated CDS. BlastX (https://blast.ncbi.nlm.nih.gov/Blast.cgi?LINK_LOC=blasthome&PAGE_TYPE=BlastSearch&PROGRAM=blastx) result accession IDs are used to gather gene names or symbols, which are then searched within the species-specific entries of the Gene Ontology database for biological information, and the same IDs are used to retrieve UniProt IDs from various protein databases like Uniprot (https://www.uniprot.org/), SwissProt (https://www.sib.swiss/swiss-prot), TrEMBL (https://www.uniprot.org/), GenPept (https://www.ncbi.nlm.nih.gov/protein/CAA71118.1?report=genpept), PDB (https://www.rcsb.org/) and RefSeq (https://www.ncbi.nlm.nih.gov/refseq/). By use of KEGG Automated Annotation Server (KAAS), KEGG analysis was carried out; output includes KEGG Orthology (KO) assignments, associated Enzyme Commission (EC) numbers, and the metabolic pathways of predicted CDS[41](https://www.nature.com/articles/s41597-025-06028-y#ref-CR41 “KAAS: http://www.genome.jp/kaas-bin/kaas_main
, Moriya, Y. et al. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic acids research 35. suppl 2, W182-W185. (2007).“).
Data Records
The all the raw and high-quality data has been deposited in the NCBI Sequence Read Archive (SRA)[42](https://www.nature.com/articles/s41597-025-06028-y#ref-CR42 “NCBI Sequence Read Archive https://identifiers.org/insdc.sra:SRP603646
(2025).“) under BioProject PRJNA1295345, BioSample accession number SAMN50154628. The curated final transcriptome assembly has been deposited to NCBI GenBank with accession number GLIV00000000[43](https://www.nature.com/articles/s41597-025-06028-y#ref-CR43 “Mehabooba, B., Anand, P. P. & Vardhanan, Y. S. GenBank https://identifiers.org/ncbi/insdc:GLIV00000000.1
(2025).“). The high-quality sequence data, raw data, processed data, and data related to GO, functional analysis and pathway analysis were available on Figshare[44](https://www.nature.com/articles/s41597-025-06028-y#ref-CR44 “Mehabooba, B., Anand, P. P. & Vardhanan, Y. S. De novo transcriptomic analysis of Perna perna (Bivalve: Mytilidae). Figshare https://doi.org/10.6084/m9.figshare.29149526.v1
(2025).“) (https://doi.org/10.6084/m9.figshare.29149526.v1).
Technical Validation
De novo assembly
Sequencing was performed by the Illumina platform using 2 × 150 bp chemistry and produced approximately 35.31 million reads. After filtering, 31.72 million high-quality paired-end reads were obtained with Phred score > 25 and 89.83% of paired-end reads were retained (Table 1).
The limited studies on genomic data for P. perna have always been challenging for researchers. The development of its reference transcriptome provides a platform for the species’ biotechnological advancement. Previous research on mussels has investigated multiple organs such as the foot, mantle, gill, adductor muscle, and hepatopancreas[45](https://www.nature.com/articles/s41597-025-06028-y#ref-CR45 “Leung, P. T. et al. De novo transcriptome analysis of Perna viridis highlights tissue-specific patterns for environmental studies. BMC Genomics, 15(1), 804, https://doi.org/10.1186/1471-2164-15-804
(2014).“),[46](https://www.nature.com/articles/s41597-025-06028-y#ref-CR46 “Uliano-Silva, M. et al. Gene Discovery through Transcriptome Sequencing for the Invasive Mussel Limnoperna fortunei. PLoS ONE 9(7), e102973, https://doi.org/10.1371/journal.pone.0102973
(2014). (2014).“). But this study exclusively focused on the foot of P. perna. The foot exhibits distinct gene expression patterns compared to internal organs and offers a more targeted approach for RNA extraction, makes it ideal for study mechanical stress, environmental resilience, and immune response, forming a solid foundation for understanding tissue-specific adaptations to environmental challenges[47](https://www.nature.com/articles/s41597-025-06028-y#ref-CR47 “Feidantsis, K. et al. Correlation between intermediary metabolism, Hsp gene expression, and oxidative stress-related proteins in long-term thermal-stressed Mytilus galloprovincialis. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 319(3), R264–R281, https://doi.org/10.1152/ajpregu.00066.2020
(2020).“),[48](https://www.nature.com/articles/s41597-025-06028-y#ref-CR48 “Xiong, X. et al. Effect of low-salt on the survival of mussel Mytilus coruscus and its molecular responses to chronic prolonged low-salt stress. Aquaculture 585, 740689, https://doi.org/10.1016/j.aquaculture.2024.740689
(2024).“).
In this study, a reference transcriptome was constructed using a de novo assembly approach. 87,872 transcripts were successfully produced, with an overall length of 119,541,653 bp. Additionally, the N50 value is 2,056 bp, which indicates the quality and completeness of transcriptome assembly, implying that 50% of the total assembled bases are found in contigs of at least 2,056 bp, reflecting a relatively high-quality assembly. The length of the longest transcript was 14,507 bp, whereas the length of the shortest transcript was 301 bp. After removal of isoforms produced during assembly, 33,567 numbers of unigenes were produced with an average length of 1,614 bp. The unigenes with an N50 value of 2,247 shows the excellent quality of assembly, and the total length of the unigenes was 54,166,516 bp. A total of 18,951 CDS were identified in unigenes with a mean length of 1,118 bp (Table 2).
The assessment of the assembly was based on assembly statistics, including the total number of transcripts generated, N50 and their mean length[49](https://www.nature.com/articles/s41597-025-06028-y#ref-CR49 “Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29(7), 644–652, https://doi.org/10.1038/nbt.1883
(2011).“). For M. strigata foot, 17.35 million paired-end reads were obtained with a Phred score > 20 and a mean read length of 150 bp. The assembly yielded 60,362 transcripts with a mean length of 1,146.19 bp and N50 of 1,578 from various tissue samples. The paired-end read count, Phred score, mean length, transcripts number, transcript average length and N50 from foot tissue alone in this study were all higher than those reported for M. strigata[50](https://www.nature.com/articles/s41597-025-06028-y#ref-CR50 “Vysakh, V. G., Sukumaran, S., Sebastian, W. & Gopalakrishnan, A. The transcriptomic footprint of Mytella strigata: de novo transcriptome assembly of a major invasive species. Scientific Data 12(1), 201, https://doi.org/10.1038/s41597-025-04559-y
(2025).“). Notably, this study’s transcripts were significantly higher than for other non-reference bivalve species subjected to NGS technology. There were 14,240 transcripts reported from Yesso scallop, Mizuhopecten yessoensis[51](https://www.nature.com/articles/s41597-025-06028-y#ref-CR51 “Meng, X. et al. De Novo Characterization of Japanese Scallop Mizuhopecten yessoensis Transcriptome and Analysis of Its Gene Expression following Cadmium Exposure. PLoS ONE 8(5), e64485, https://doi.org/10.1371/journal.pone.0064485
(2013).“), 22,023 transcripts from deep-sea hydrothermal vent mussel, Bathymodiolus azoricus[52](https://www.nature.com/articles/s41597-025-06028-y#ref-CR52 “Bettencourt, R. et al. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus. BMC Genomics 11(1), 559, https://doi.org/10.1186/1471-2164-11-559
(2010).“), 9,747 transcripts from Manila clam, Ruditapes philippinarum[53](https://www.nature.com/articles/s41597-025-06028-y#ref-CR53 “Milan, M. et al. Transcriptome sequencing and microarray development for the Manila clam, Ruditapes philippinarum: Genomic tools for environmental monitoring. BMC Genomics 12(1), 234, https://doi.org/10.1186/1471-2164-12-234
(2011).“), 18,196 contigs in the striped venus Chamelea gallina[54](https://www.nature.com/articles/s41597-025-06028-y#ref-CR54 “Coppe, A. et al. Sequencing and Characterization of Striped Venus Transcriptome Expand Resources for Clam Fishery Genetics. PLoS ONE 7(9), e44185, https://doi.org/10.1371/journal.pone.0044185
(2012).“), 1,023 contigs in the Mediterranean mussel, M. galloprovincialis[55](https://www.nature.com/articles/s41597-025-06028-y#ref-CR55 “Craft, J. A. et al. Pyrosequencing of Mytilus galloprovincialis cDNAs: Tissue-Specific Expression Patterns. PLoS ONE 5(1), e8875, https://doi.org/10.1371/journal.pone.0008875
(2010).“). The improved annotation of the assembled transcripts of P. perna could be explained by the increased sequence coverage through better base quality score (Phred score), contig length, N50, transcript number, etc. and the substantial annotated genomic data available from the newly sequenced genome of M. strigata[50](https://www.nature.com/articles/s41597-025-06028-y#ref-CR50 “Vysakh, V. G., Sukumaran, S., Sebastian, W. & Gopalakrishnan, A. The transcriptomic footprint of Mytella strigata: de novo transcriptome assembly of a major invasive species. Scientific Data 12(1), 201, https://doi.org/10.1038/s41597-025-04559-y
(2025).“).
Functional annotation and classification of transcriptome
The de novo assembly was aligned using the BLASTX algorithm to the NCBI Nr protein database to identify species exhibiting the highest number of significant blast-hits (Fig. 2). In total, 18,951 CDS were annotated, with 84.0% (15,928) exhibiting significant Blast hits, while 12.4% (2,357) were without corresponding matches in the database. According to the result of Nr annotation, the species with Blast top hits were M. coruscus (5,479: 28.91% of CDS), M. galloprovincialis (5,324: 28.01% of CDS) and M. edulis (3,516: 18.55% of CDS). Blast result revealed that 28.66% of the sequence exhibited > 90% sequence similarity, while 99.24% showed similarity exceeding 50%, indicating a high degree of homology across the dataset. A total of 7,635 CDS had a length of > 1000 bp. In the analysis, the query sequence, with lengths of 1356 bp and 1338 bp, showed 100% sequence similarity and an E-value of 0 with the TUBA and TUBB genes from M. coruscus and M. edulis, respectively.
Fig. 2
Species distribution in top blast-hits of P. perna L. de novo transcriptome.
The annotation results from the BLAST search of de novo transcriptome for non-model species were found to be highly influenced by the availability of annotated sequence data from a reference genome[54](https://www.nature.com/articles/s41597-025-06028-y#ref-CR54 “Coppe, A. et al. Sequencing and Characterization of Striped Venus Transcriptome Expand Resources for Clam Fishery Genetics. PLoS ONE 7(9), e44185, https://doi.org/10.1371/journal.pone.0044185
(2012).“). BLAST search results of P. perna revealed most significant homology with three species of mytilidae, namely M. coruscus, M. galloprovincialis and M. edulis, suggesting a high degree of evolutionary conservation and taxonomic proximity with the Mytilus lineage. A previous study of brown mussels from the southernmost part of the Indian coast, identified as P. indica and claimed to be a new species, showed top blast hits with Mizuhopecten yessoensis, a member of the order Pectinida, Crassostrea virginica and Crassostrea gigas, members of the order Ostreida11,[13](https://www.nature.com/articles/s41597-025-06028-y#ref-CR13 “Wood, A. R., Apte, S., MacAvoy, E. S. & Gardner, J. P. A. A molecular phylogeny of the marine mussel genus Perna (Bivalvia: Mytilidae) based on nuclear (ITS1&2) and mitochondrial (COI) DNA sequences. Molecular Phylogenetics and Evolution 44(2), 685–698, https://doi.org/10.1016/j.ympev.2006.12.019
(2007).“),[16](https://www.nature.com/articles/s41597-025-06028-y#ref-CR16 “Gardner, J. P. A., Patterson, J., George, S. & Edward, J. K. P. Combined evidence indicates that Perna indica Kuriakose and Nair 1976 is Perna perna (Linnaeus, 1758) from the Oman region introduced into southern India more than 100 years ago. Biological Invasions 18(5), 1375–1390, https://doi.org/10.1007/s10530-016-1074-9
(2016).“),56. The BLAST search of P. viridis reveals its most remarkable sequences match those from the Pacific oyster, Crassostrea gigas and 23,392 unigenes were annotated by the NCBI nr[45](https://www.nature.com/articles/s41597-025-06028-y#ref-CR45 “Leung, P. T. et al. De novo transcriptome analysis of Perna viridis highlights tissue-specific patterns for environmental studies. BMC Genomics, 15(1), 804, https://doi.org/10.1186/1471-2164-15-804
(2014).“). Contigs were utilized for BLAST annotation in earlier research on Ruditapes philippinarum (Manila clam)[53](https://www.nature.com/articles/s41597-025-06028-y#ref-CR53 “Milan, M. et al. Transcriptome sequencing and microarray development for the Manila clam, Ruditapes philippinarum: Genomic tools for environmental monitoring. BMC Genomics 12(1), 234, https://doi.org/10.1186/1471-2164-12-234
(2011).“), and on P. viridis[45](https://www.nature.com/articles/s41597-025-06028-y#ref-CR45 “Leung, P. T. et al. De novo transcriptome analysis of Perna viridis highlights tissue-specific patterns for environmental studies. BMC Genomics, 15(1), 804, https://doi.org/10.1186/1471-2164-15-804
(2014).“), BLAST was used to annotate both transcripts and unigenes, which increased sequence counts but also created redundancy and decreased annotation accuracy. Gene content and functional annotations are more accurately represented by unigenes, distinct, non-redundant sequences. Our unigene-based method guarantees more accurate and dependable functional annotations while reducing redundancy. In a transcriptomic study on Crassostrea gigas[57](https://www.nature.com/articles/s41597-025-06028-y#ref-CR57 “Zhao, X., Yu, H., Kong, L. & Li, Q. Transcriptomic Responses to Salinity Stress in the Pacific Oyster Crassostrea gigas. PLoS ONE 7(9), e46244, https://doi.org/10.1371/journal.pone.0046244
(2012).“) in response to salinity stress, protein-coding sequences were specifically used to identify homologous genes. Similarly, we used 18,951 CDS numbers for this purpose, which offers higher accuracy in gene function prediction by providing for more precise identification of protein domains and motifs. Additionally, CDS sequences construct meaningful homology hits in BLAST searches, making them more reliable for functional annotation.
Our BLAST result revealed that the two genes TUBA and TUBB exhibited 100% sequence similarity and an E-value of 0 with the genes from M. coruscus and M. edulis, respectively, indicating a highly significant homology. These genes were also reported in closely related species like P. viridis[58](https://www.nature.com/articles/s41597-025-06028-y#ref-CR58 “Dou, M. et al. De novo transcriptome analysis of the mussel Perna viridis after exposure to the toxic dinoflagellate Prorocentrum lima. Ecotoxicology and Environmental Safety 192, 110265, https://doi.org/10.1016/j.ecoenv.2020.110265
(2020).“), M. chilensis[59](https://www.nature.com/articles/s41597-025-06028-y#ref-CR59 “Núñez-Acuña, G., Tapia, F. J. Haye, P. A., & Gallardo-Escárate, C. Gene expression analysis in Mytilus chilensis populations reveals local patterns associated with ocean environmental conditions. Journal of Experimental Marine Biology and Ecology, 420–421, 56–64, https://doi.org/10.1016/j.jembe.2012.03.024
. (2012).“) and M. californianus[60](https://www.nature.com/articles/s41597-025-06028-y#ref-CR60 “Elowe, C. & Tomanek, L. Circadian and circatidal rhythms of protein abundance in the California mussel (Mytilus californianus). Molecular Ecology 30(20), 5151–5163, https://doi.org/10.1111/mec.16122
(2021).“) as well as in other bivalve species such as Crassostrea gigas[57](https://www.nature.com/articles/s41597-025-06028-y#ref-CR57 “Zhao, X., Yu, H., Kong, L. & Li, Q. Transcriptomic Responses to Salinity Stress in the Pacific Oyster Crassostrea gigas. PLoS ONE 7(9), e46244, https://doi.org/10.1371/journal.pone.0046244
(2012).“) and Pictada fusicata[61](https://www.nature.com/articles/s41597-025-06028-y#ref-CR61 “Sun, J. et al. Transcriptome analysis of the mantle tissue of Pinctada fucata with red and black shells under salinity stress. Gene 823, 146367, https://doi.org/10.1016/j.gene.2022.146367
(2022).“), showing their conserved nature across a wide range of bivalves. This states that TUBA and TUBB are evolutionarily conserved for microtubule dynamics. These protein component of tubulin, and these heterodimeric proteins plays an important functional organization of microtubules[57](#ref-CR57 “Zhao, X., Yu, H., Kong, L. & Li, Q. Transcriptomic Responses to Salinity Stress in the Pacific Oyster Crassostrea gigas. PLoS ONE 7(9), e46244, https://doi.org/10.1371/journal.pone.0046244
(2012).“),[58](#ref-CR58 “Dou, M. et al. De novo transcriptome analysis of the mussel Perna viridis after exposure to the toxic dinoflagellate Prorocentrum lima. Ecotoxicology and Environmental Safety 192, 110265, https://doi.org/10.1016/j.ecoenv.2020.110265
(2020).“),[59](#ref-CR59 “Núñez-Acuña, G., Tapia, F. J. Haye, P. A., & Gallardo-Escárate, C. Gene expression analysis in Mytilus chilensis populations reveals local patterns associated with ocean environmental conditions. Journal of Experimental Marine Biology and Ecology, 420–421, 56–64, https://doi.org/10.1016/j.jembe.2012.03.024
. (2012).“),[60](#ref-CR60 “Elowe, C. & Tomanek, L. Circadian and circatidal rhythms of protein abundance in the California mussel (Mytilus californianus). Molecular Ecology 30(20), 5151–5163, https://doi.org/10.1111/mec.16122
(2021).“),[61](https://www.nature.com/articles/s41597-025-06028-y#ref-CR61 “Sun, J. et al. Transcriptome analysis of the mantle tissue of Pinctada fucata with red and black shells under salinity stress. Gene 823, 146367, https://doi.org/10.1016/j.gene.2022.146367
(2022).“).
Gene Ontology (GO) annotations were conducted to enable the functional classification of the CDS (Fig. 3). A total of 8,511 CDS were annotated and assigned to 24 functional categories based on sequence homology, belonging to three main GO ontologies: biological process (4,705 CDS), cellular component (4,793 CDS) and molecular functions (6,291 CDS). The molecular function category is associated with the most coding sequences. It suggests that most identified unigenes are linked to specific molecular activities, like binding or catalysis, in the cell. Also, 22.5% (2,293) of unigenes were associated with ion binding, indicating their role in ion regulation. Additionally, 21.36% (2,182) were linked to organic cyclic compound binding, suggesting their involvement in key metabolic and regulatory processes. Unigenes were distributed across 13 distinct functional groups in the biological process category. A notable 14.96% of unigenes were allocated to the organic substance metabolic process, highlighting their involvement in biochemical transformation of organic compounds and 14.22% of unigenes were linked to primary metabolic process, involving essential pathways for energy production and biosynthesis. Furthermore, 13.59% of unigenes were classified under cellular metabolic process, and 12.47% belonged to nitrogen compound metabolic process. In the case of the cellular component category, 30% (2,267) of the unigenes are related to intracellular anatomical structure, reflecting their role in supporting fundamental cellular processes and ensuring cellular integrity.
Fig. 3
Gene Ontology (GO) functional classification. CDS were classified in three major classes as (a) molecular functions (MF), (b) cellular component (CC), and (c) biological process (BP).
In order to further asses the function of the assembled unigenes, we performed a search of the annotated unigenes within the Eukaryotic Orthologous Groups (KOG) (Fig. 4). Among the 24 KOG categories, the cluster for unknown function represent the highest proportion (23.69%), followed by signal transduction represent the second largest group (16.03%), post-translational modification, protein turn over, chaperone functions (11.24%), transcription (5.79%), intracellular trafficking and secretion (4.74%), translation (4.11%) and RNA processing and modification (3.96%). The following categories represent the smallest groups: cell motility (0.13%), nuclear structure (0.40%) and coenzyme metabolism (0.85%).
Fig. 4
Distribution of KOG categories. Pie diagram showing KOG category distribution.
To identify the potential involvement of the predicted CDS in biological pathways, all the identified CDS were mapped to reference canonical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG). The KEGG database delivers a comprehensive analysis of intracellular metabolic pathways and functions of the gene products, offering valuable insight for studying the intricate biological behaviors of genes (Fig. 5). In total, 7,866 CDS were significantly mapped into 33 pathways of the KEGG database. The annotated CDS were further grouped into 5 primary categories: Metabolism (1,685), Genetic information processing (1,273), Environmental information processing (1,139), Cellular processes (1,527) and Organismal systems (2,242). The pathways with the most representation were signal transduction (949 members), transport and catabolism (651 members), endocrine system (478 members) and immune system (414 members). Additionally, different enzyme classes were mapped, with a detailed classification is represented in Fig. 6.
Fig. 5
KEGG Pathway Annotations. Pathway annotations using KEGG display the pathways involved.
Fig. 6
Distribution of enzyme classes in the de novo transcriptomic analysis of P. perna L.
To describe the function of unigenes, functional annotation was performed using GO and KOG. According to the GO annotation, majority of annotated unigenes were related to molecular functions, followed by cellular components and biological processes, showing nearly equal proportions. The majority of unigenes were associated with ion binding and organic cyclic compound binding within the molecular function category. In the biological process category, most unigenes were linked to organic substance metabolic processes. Meanwhile, in the cellular component category, the unigenes were primarily related to intracellular anatomical structures. In the line of previous studies on M. strigata[50](https://www.nature.com/articles/s41597-025-06028-y#ref-CR50 “Vysakh, V. G., Sukumaran, S., Sebastian, W. & Gopalakrishnan, A. The transcriptomic footprint of Mytella strigata: de novo transcriptome assembly of a major invasive species. Scientific Data 12(1), 201, https://doi.org/10.1038/s41597-025-04559-y
(2025).“) and Bathymodiolus platifrons[62](https://www.nature.com/articles/s41597-025-06028-y#ref-CR62 “Wong, Y. H. et al. High-throughput transcriptome sequencing of the cold seep mussel Bathymodiolus platifrons. Scientific Reports 5(1), 16597, https://doi.org/10.1038/srep16597
(2015).“), these results exhibit a distinctive pattern where the biological processes, particularly cellular processes and signaling pathways, are considered across multiple tissues, including the foot. However, unlike those studies, our study focused only on the foot instead of various tissues, which provides a narrower perspective where unigenes related to molecular functions are more prominent. This difference in focus may explain the observed variation in GO composition. Additionally, Leung et al.[45](https://www.nature.com/articles/s41597-025-06028-y#ref-CR45 “Leung, P. T. et al. De novo transcriptome analysis of Perna viridis highlights tissue-specific patterns for environmental studies. BMC Genomics, 15(1), 804, https://doi.org/10.1186/1471-2164-15-804
(2014).“) reported that GO compositions did not vary among different tissues in P. viridis, but gene expression profiles show distinctive tissue-specific patterns. This suggests that while GO compositions may remain consistent, the gene expression varies based on the specific roles of each tissue, including the foot in our study, which could explain the observed emphasis on molecular functions. Similar to M. strigata[50](https://www.nature.com/articles/s41597-025-06028-y#ref-CR50 “Vysakh, V. G., Sukumaran, S., Sebastian, W. & Gopalakrishnan, A. The transcriptomic footprint of Mytella strigata: de novo transcriptome assembly of a major invasive species. Scientific Data 12(1), 201, https://doi.org/10.1038/s41597-025-04559-y
(2025).“), the KOG annotation in P. perna reveals that the largest proportions of clusters are of unknown function. This is followed by a significant portion for signal transduction, and then post-translational modification, protein turn over, chaperone functions. The unknown function category in both species indicates that they might have evolved species-specific mechanisms that have not been explored, and highlights the relevance of studying non-model species like these to unveil novel genes and biological pathways[63](#ref-CR63 “Vysakh, V. G. & Sukumaran, S. De novo transcriptome assembly of the Perna viridis: A novel invertebrate model for ecotoxicological studies. Scientific Data 12(1), 147, https://doi.org/10.1038/s41597-025-04496-w
(2025).“),[64](#ref-CR64 “Gardner, J. P. A., Patterson, J. & Patterson, E. J. K. Phylogeograph