Introduction
In recent decades, the frequency and scale of infectious disease outbreaks have increased, particularly those emerging after spillover of animal viruses1. The emergence and spread of these diseases have been driven by a range of interconnected factors, including urbanization2, the development of megacities3, deforestation4,5, growing food demands6, and increasing global connectivity. Migration from rural to urban areas has contributed to population growth in cities, resulting in increased close-contact interactions that facilitate disease transmission3. By 2017, 55% of the world’s population lived in urban areas7, making these high-density cities critical hubs for the spread of infectious diseases, and therefore strategic sites for monitoring and responding to both novel and established public health threats.
Current infectious disease surveillance primarily relies on the identification and reporting of symptomatic cases8, which has limitations, particularly for viruses that frequently cause asymptomatic or mild infections, or diseases that are not reportable. As a result, novel infectious diseases caused by emerging viruses or variants of known viruses can evade detection, spreading silently before being recognized. This phenomenon was exemplified by the introduction of Zika virus in South America9 and the early undetected spread of SARS-CoV-2 into Europe10,11 and the USA12. To better understand the prevalence and spread of viral infections, alternative surveillance strategies beyond symptom-based reporting are needed to capture the full spectrum of viral transmission dynamics and enhance early detection and response to emerging infectious disease threats.
Wastewater-based epidemiology (WBE) involves the detection and monitoring of chemicals, drugs and substances13,14, genes15, pathogens16 and vectors17 within wastewater to assess community health or disease risk. Recent detections of poliovirus in wastewater from major cities such as New York18 and London19, as well as from poliovirus-essential facilities20, highlight the potential of WBE to identify circulation hotspots and guide targeted interventions, such as enhanced surveillance and vaccination campaigns17. During the COVID-19 pandemic, quantitative reverse transcription PCR-based methods were used to measure viral load in wastewater; viral concentrations tracked the number of positive cases at the community level and provided insights into infection dynamics over time19,20. In parallel, amplicon-based sequencing was used to identify circulating viral variants, which enhanced early detection in wastewater when combined with pathogen-specific bioinformatic pipelines21,22. Targeted molecular methods have since extended WBE to other pathogenic viruses, including influenza23, respiratory syncytial virus23, mpox24, arboviruses25, and non-polio enteroviruses (EV) such as EV-D6826.
Among viruses shed into wastewater, enteric viruses pose a significant public health threat. Enteric viruses are transmitted mainly via the faecal-oral route, cause a wide range of gastrointestinal diseases and are potential sources of community-wide outbreaks27. Their detection is complicated by their extensive diversity, nonspecific clinical symptoms that hamper diagnosis, a high rate of secondary transmission that may mask initial introductions, and gaps in surveillance systems. As the number of viral targets of interest grows, the question arises whether surveillance efforts focused on individual pathogens can be combined into a more comprehensive approach. Metagenomic sequencing is a promising technique to address this need28. A key advantage of metagenomic sequencing is its ability to simultaneously detect and characterize multiple viruses present within a single (wastewater) sample, including non-target (novel) viruses29,30.
In this study, we explore the potential of metagenomic sequencing and analysis as a tool for population-level monitoring of virus circulation across hosts in high-density urban areas. We sequenced and characterized the wastewater virome of 62 major cities worldwide using both shotgun viral metagenomic sequencing and a capture-based sequencing method targeting enteric viruses, to compare viromes and viral dynamics across the world. Our findings reveal city-specific virome fingerprints and show that wastewater metagenomics enables early detection of emerging viruses and can inform broader epidemic surveillance.
Results
Virome data overview
To investigate the worldwide urban wastewater virome, wastewater samples were collected from 62 cities across 47 countries on 6 continents (Fig. 1)31. For all cities, samples were collected twice a year, in June and November, over a 2-year period spanning 2017–2019 (biannual samples). Additionally, to investigate temporal changes, monthly samples were obtained from eight cities in different countries across six continents during the same period. Wastewater samples were analysed using both shotgun viral metagenomic sequencing, to determine the viral metagenome, and sequencing after capture with a custom probe set (GastroCap, Supplementary Fig. 1) designed to enrich for enteric virus sequences. The median number of reads generated per sample was 9.8 × 10⁵ (IQR: 6.9 × 10⁵–1.26 × 10⁶) for the metagenomic sequence dataset and 7.4 × 10⁵ (IQR: 4.6 × 10⁵–1.17 × 10⁶) for the capture sequence dataset (Supplementary Fig. 2). The median percentage of reads per sample that could be annotated as viral was 11% (IQR: 6.5–15.3%) for the metagenomic sequence dataset and 26% (IQR: 11.8–46.9%) for the capture sequence dataset. The remaining reads were annotated as bacterial, human, other eukaryotic, or unclassified (i.e., without any annotation) (Supplementary Fig. 2). Principal component analysis (PCA) of centred log-ratio (CLR) transformed read count data at the superkingdom level revealed distinct clustering of the two datasets and confirmed the enrichment of viral reads using GastroCap (Supplementary Fig. 3).
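The CLR transformation mentioned above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the pseudocount of 0.5 used to handle zero counts and the superkingdom read counts are hypothetical. Each sample's counts are log-transformed and centred on their mean log value (the log of the geometric mean), so a sample's transformed values always sum to zero; PCA is then run on the resulting samples × categories matrix.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centred log-ratio (CLR) transform of one sample's read counts.

    The pseudocount (an assumption here) avoids taking log(0) for absent categories.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]

# Hypothetical read counts for one sample, ordered as:
# viral, bacterial, human, other eukaryotic, unclassified
sample = [110_000, 450_000, 90_000, 50_000, 300_000]

viral_fraction = sample[0] / sum(sample)
print(f"viral reads: {viral_fraction:.1%}")  # viral reads: 11.0%
print([round(v, 3) for v in clr(sample)])    # CLR values sum to ~0
```

Working in CLR space removes the dependence on sequencing depth, which differs between samples and between the two datasets, before distances or principal components are computed.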
Fig. 1: Overview of the 62 wastewater collection sites.
Cities are coloured by continent and categorized by sampling strategy: colour-filled dots represent cities with biannual sampling, while outlined dots represent cities with longitudinal sampling.
Composition of the Global Urban Wastewater Virome
To establish a baseline understanding of the urban wastewater virome, we first examined viral diversity at the family level. Overall, viral reads were mapped to a total of 122 families, 1367 genera and 2546 species (Fig. 2), highlighting the broad diversity of viruses associated with vertebrates, plants, microbes and other organisms in urban environments. A large fraction of the metagenomic virome consisted of bacteriophages and plant viruses (Fig. 2). Viral families linked to human disease, such as Astroviridae, Parvoviridae and Picornaviridae, were frequently detected at high abundance. Other important human-associated families, including Adenoviridae, Caliciviridae, Hepeviridae and Sedoreoviridae, were also detected, but less frequently.
Fig. 2: Heatmap of the viral family diversity in the global wastewater samples.
The heatmap is categorized by continent and host association and ordered by RPM value of the GastroCap sequence data. The colour gradient represents the log-transformed relative abundance of reads normalized to genome length. Viral families marked with an orange dot are RNA viruses; all others are DNA viruses. OCE Oceania, SA South America, Metagenomic shotgun metagenomic dataset, and capture GastroCap sequence dataset.
Of the eight viral families targeted by GastroCap (Adenoviridae, Astroviridae, Caliciviridae, Hepeviridae, Parvoviridae, Picornaviridae, Spinareoviridae and Sedoreoviridae), which include 72 genera and 1038 species, a total of 72 genera and 165 species were identified (Fig. 2). GastroCap-based sequencing substantially increased both the total and the relative number of reads for the targeted viral families. The median fold increase per sample, based on relative abundance in reads per million (rpm), was as follows: Adenoviridae (545-fold), Astroviridae (470-fold), Caliciviridae (5094-fold), Hepeviridae (22-fold), Parvoviridae (2-fold), Picornaviridae (132-fold), Sedoreoviridae (193-fold) and Spinareoviridae (21-fold). Viral genera associated with human gastrointestinal disease, including those commonly targeted by surveillance, such as Mamastrovirus, Bocaparvovirus, Salivirus, Kobuvirus, Norovirus, Sapovirus, Enterovirus, Mastadenovirus, Rotavirus, Paslahepevirus, Parechovirus and Cardiovirus, were detected across all continents. However, while Mamastrovirus was highly abundant in most samples across all continents (Fig. 3a), Enterovirus was detected at high levels mainly in samples from Africa and from a few cities in Europe (Belgrade, Bratislava), North America (Sisimiut, Chapel Hill) and South America (Quito) (Fig. 3b). Conversely, Paslahepevirus, the genus that includes hepatitis E virus, was more prevalent in Europe than in other regions. The longitudinal samples further revealed distinct spatial and temporal patterns in the prevalence of certain gastrointestinal viruses. For example, in Regina (Canada), the relative abundance of Enterovirus increased during the summer months, whereas Mastadenovirus abundance was higher in the winter months.
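The rpm normalization and the per-sample median fold increase reported above can be sketched as below. This is a hedged illustration: the exact genome-length normalization used in the study is not specified here, so dividing rpm by genome length in kilobases is an assumption, and all read counts are hypothetical. If the same genome length is used for a family in both datasets, that factor cancels in the capture-to-metagenomic ratio, which is why the fold increase below is computed on rpm alone.

```python
import statistics

def rpm(family_reads, total_reads):
    """Reads per million: family reads scaled to a library of one million reads."""
    return family_reads / total_reads * 1_000_000

def length_normalized_rpm(family_reads, total_reads, genome_length_kb):
    """rpm divided by genome length in kb (one plausible form of length normalization)."""
    return rpm(family_reads, total_reads) / genome_length_kb

def median_fold_increase(metagenomic, capture):
    """Median over samples of capture rpm / metagenomic rpm for one viral family.

    Each argument is a list of (family_reads, total_reads) tuples, paired by sample.
    Samples with zero family reads in the metagenomic data are skipped because the
    ratio is undefined; statistics.median raises if no sample remains.
    """
    folds = []
    for (m_reads, m_total), (c_reads, c_total) in zip(metagenomic, capture):
        if m_reads == 0:
            continue
        folds.append(rpm(c_reads, c_total) / rpm(m_reads, m_total))
    return statistics.median(folds)

# Hypothetical (family reads, total reads) per sample for one viral family
metagenomic = [(10, 1_000_000), (5, 500_000), (0, 800_000)]
capture = [(2_000, 1_000_000), (1_500, 600_000), (40, 700_000)]
print(median_fold_increase(metagenomic, capture))  # ~225 (median of ~200 and ~250)
```

Using the median rather than the mean keeps a few samples with very low metagenomic counts from dominating the summary.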
Fig. 3: Pathogenic enteric viruses were widespread and showed distinct spatiotemporal patterns in urban wastewater.
Reads per million viral reads (normalized by genome length) per genus are given for the biannual (a) and longitudinal (b) GastroCap sample set. The heatmap is ordered by reads per million viral reads (normalized by genome length) and categorized by continent (a) and city (b). The colour gradient represents log-transformed relative abundance of reads. SA South America.
Principal Component Analysis revealed that cities differ in virome composition
To identify broader patterns and potential geographical signatures in virome composition, we applied PCA to both the metagenomic and GastroCap sequence datasets. The metagenomic sequence data revealed no clear clustering by continent at either the family or genus level, suggesting that viral community structures were broadly similar worldwide (Fig. 4a, b). The limited variance between locations was driven primarily by the genera Tobamovirus (a group of plant viruses within the Virgaviridae family) and Gemykibivirus (associated with multiple hosts, including plants, insects and mammals, and occasionally found in human samples32), both of which were more prevalent in samples from Asia. However, when virome composition was compared by city, more pronounced differences were observed in both biannual and longitudinal samples (Supplementary Fig. 4 and Fig. 4c, d). This pattern was consistent when PCA was performed separately for DNA and RNA viruses (Supplementary Figs. 6 and 7), supporting the robustness of these findings across viral genome types.
Fig. 4: PCA clustering of the urban wastewater virome compositions from 62 cities based on the shotgun metagenomic sequence dataset reveals city-level differences without clear continental clustering.
The virome compositions of the biannual samples (a, b) coloured by continent and longitudinal samples (c, d) coloured by city at the family (a, c) and genus (b, d) levels. Principal components were derived from genome-size-adjusted read counts subjected to a CLR transformation. Arrows represent viral families or genera; their direction shows their contribution to the principal components and their length indicates the strength of their contribution to the variance in virome composition.
Focusing on the enteric virus families in the GastroCap sequence dataset, PCA revealed a broadly similar composition of these viral families across all continents, as indicated by overlapping clusters (Fig. 5a). The largest differences were observed for the Astroviridae and Picornaviridae families. At the genus level, however, viral composition differed more distinctly between continents (Fig. 5b). Notably, Europe and Africa showed overlapping compositions, while samples from Oceania were clearly separated. At the city level, even among the biannually sampled cities, PCA revealed clearly distinct clusters, indicating substantial variation in viral composition between cities (Fig. 5c, d, Supplementary Fig. 5). These analyses identified the genus Mamastrovirus (Astroviridae) as the primary driver of beta-diversity. Norovirus and Sapovirus covaried, as reflected by the small angle between their loading vectors, indicating similar contributions to the observed variation. Within the Picornaviridae family, genera such as Enterovirus and Cardiovirus also displayed some degree of covariation. Among the cities with longitudinal sampling, samples from Seattle (USA) and Copenhagen (Denmark) clustered tightly at the family level, suggesting a stable composition of enteric virus families throughout the study period. In contrast, the virome composition in Regina (Canada), Kuala Lumpur (Malaysia), Guangzhou (China) and Yaoundé (Cameroon) varied more, reflecting a more dynamic and heterogeneous composition over time. At the genus level, the enteric virome composition in Melbourne (Australia) was distinct from that of all other cities, primarily due to a high abundance of mamastroviruses, which may reflect differences in interregional connectivity (Fig. 5d).
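The reading of covariation from the angle between loading vectors can be illustrated with a small pure-Python PCA. This is an assumption-laden toy, not the study's code: principal components are found by power iteration with deflation on the covariance of a CLR-transformed matrix, loadings are taken as unscaled eigenvector components, and the CLR values for the four genera are hypothetical. Genera whose CLR values rise and fall together end up with nearly parallel loading vectors in the PC1-PC2 plane, while genera that vary in opposite directions point apart.

```python
import math

def covariance(matrix):
    """Covariance between columns (features) of a samples x features matrix."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in matrix]
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenvector(cov, iterations=500):
    """Dominant eigenvector (unit length) and eigenvalue via power iteration."""
    p = len(cov)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    return v, sum(a * b for a, b in zip(v, cv))  # Rayleigh quotient

def first_two_pcs(matrix):
    """Loadings and variances of PC1 and PC2 (PC2 found after deflating PC1)."""
    cov = covariance(matrix)
    v1, l1 = top_eigenvector(cov)
    p = len(cov)
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(p)] for i in range(p)]
    v2, l2 = top_eigenvector(deflated)
    return v1, l1, v2, l2

def loading_angle_deg(v1, v2, a, b):
    """Angle (degrees) between the PC1-PC2 loading vectors of features a and b."""
    xa, ya, xb, yb = v1[a], v2[a], v1[b], v2[b]
    cos = (xa * xb + ya * yb) / (math.hypot(xa, ya) * math.hypot(xb, yb))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error

# Hypothetical CLR values: rows are samples, columns are
# Norovirus, Sapovirus, Mamastrovirus, Enterovirus
clr_matrix = [
    [1.0, 1.0, -2.0, 0.0],
    [2.0, 2.0, -1.0, -3.0],
    [-1.0, -1.0, 3.0, -1.0],
    [0.0, 0.0, -0.5, 0.5],
    [-2.0, -2.0, 0.5, 3.5],
]
v1, l1, v2, l2 = first_two_pcs(clr_matrix)
print(round(loading_angle_deg(v1, v2, 0, 1), 1))  # Norovirus vs Sapovirus
print(round(loading_angle_deg(v1, v2, 0, 2), 1))  # Norovirus vs Mamastrovirus
```

A library routine (for example an SVD-based PCA) would normally be used in practice; power iteration is shown here only to keep the sketch dependency-free.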
Fig. 5: PCA clustering of the urban wastewater virome compositions from 62 cities based on the GastroCap sequence dataset highlights city-level clustering, primarily driven by Mamastrovirus.
The virome compositions of the biannual samples (a, b) coloured by continent and longitudinal samples (c, d) coloured by city at the family (a, c) and genus (b, d) levels. Principal components were derived from genome-size-adjusted read counts subjected to a CLR transformation. Arrows represent viral families or genera; their direction shows their contribution to the principal components and their length indicates the strength of their contribution to the variance in virome composition.
Species and genotype-level analysis of selected viral genera
We further explored the value of wastewater for monitoring viruses at finer taxonomic resolution (species and genotypes) to capture fine-scale spatiotemporal patterns of viral diversity. In the PCA of the biannual samples, the genera Tobamovirus (Virgaviridae) from the metagenomic sequence data and Mamastrovirus (Astroviridae) and Enterovirus (Picornaviridae) from the GastroCap data had the largest influence on virome composition. Because of their relevance to plant, public and animal health, these genera were selected for further investigation.
Shotgun metagenomic sequencing reveals Tobamovirus diversity in wastewater
Tobamoviruses are of major concern because they can cause disease in a wide range of agriculturally important plant species, including tomato, pepper and cucumber. We identified a total of 5204 Tobamovirus contigs in the metagenomic dataset, of which 32% (1653 contigs) were assigned to 17 different species (Fig. 6), including both widely distributed, well-known viruses and emerging viruses of increasing agricultural and environmental relevance. Species-level analysis showed large compositional differences between samples and locations. Overall, Cucumber green mild mottle virus (CGMMV) and Pepper mild mottle virus (PMMoV), both well-studied and widely distributed viruses, were highly prevalent in urban sewage samples worldwide. PMMoV did not exhibit temporal variation, but its relative abundance was lower in Asia (Fig. 6a, b). The relative abundance of CGMMV was highest in samples from Europe, Asia and North America (Fig. 6a, b). Tobacco mosaic virus (TMV) was found predominantly at high abundance in samples from Asia and Africa, while Tobacco mild green mosaic virus (TMGMV) was observed primarily in samples from Asia (Fig. 6a, b). Additionally, we detected emerging viruses of concern, such as the rapidly spreading Tomato brown rugose fruit virus (ToBRFV), which was found in wastewater samples from multiple locations, including Rome (Italy) in 2017 and Athens (Greece), Vancouver (Canada), Be’er Sheva (Israel) and Seattle (USA) in 2018 (Fig. 6a, b).
Fig. 6: Global prevalence and abundance of species within the Tobamovirus genus in urban wastewater.
Biannual (a) and longitudinal data (b) are shown. In each panel, stacked bar plots display the number of reads per million viral reads assigned to the Tobamovirus genus (green) and to specific species (red), followed by the relative abundances of Tobamovirus species. The biannual samples were grouped using hierarchical clustering based on species-level composition. YoMV Youcai mosaic virus, TVCV Turnip vein-clearing virus, TSAMV Tropical soda apple mosaic virus, ToMV Tomato mosaic virus, ToMMV Tomato mottle mosaic virus, TMV Tobacco mosaic virus, TMGMV Tobacco mild green mosaic virus, TBRFV Tomato brown rugose fruit virus, RMV Ribgrass mosaic virus, RheMV Rehmannia mosaic virus, PMMoV Pepper mild mottle virus, PaMMV Paprika mild mottle virus, HLSV Hibiscus latent Singapore virus, HLFPV Hibiscus latent Fort Pierce virus, CMoV Cucumber mottle virus, CGMMV Cucumber green mild mottle virus, BPMV Bell pepper mottle virus, SA South America.
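Grouping the biannual samples by species composition, as in the figure above, could be sketched as follows. The caption does not state the distance metric or linkage, so Bray-Curtis dissimilarity on relative abundances and average linkage are assumptions here, and the sample names and read counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert read counts per species into proportions summing to 1."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}

def bray_curtis(a, b, keys):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    num = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    den = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return num / den if den else 0.0

def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (cluster, cluster, distance) tuples,
    from the first (closest) merge to the last.
    """
    names = list(profiles)
    keys = set().union(*profiles.values())
    dist = {(x, y): bray_curtis(profiles[x], profiles[y], keys)
            for x in names for y in names}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(x, y) for x in clusters[i] for y in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)  # mean pairwise distance
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical Tobamovirus species read counts per sample
profiles = {
    "S1": relative_abundance({"PMMoV": 80, "CGMMV": 20}),
    "S2": relative_abundance({"PMMoV": 75, "CGMMV": 25}),
    "S3": relative_abundance({"TMV": 60, "TMGMV": 40}),
    "S4": relative_abundance({"TMV": 50, "TMGMV": 45, "PMMoV": 5}),
}
for left, right, d in average_linkage(profiles):
    print(left, right, round(d, 3))
```

Samples dominated by the same species merge first, so the resulting order places PMMoV/CGMMV-dominated samples next to each other and apart from TMV/TMGMV-dominated ones.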
Global diversity and phylogenetic analysis of Mamastrovirus in wastewater
Variation in the relative abundance of the genus Mamastrovirus contributed most to the differentiation of samples in the PCA of the GastroCap sequence dataset (Fig. 5b). To explore whether these differences were associated with specific Mamastrovirus types, we developed a custom typing workflow using reference sequences from a phylogenetic study on astrovirus diversity33. Our approach identified 10,668 Mamastrovirus contigs, of which 14% (1512 contigs) could be assigned a genotype, resulting in the detection of 24 different types. Both well-established (endemic) and less commonly observed Mamastrovirus genotypes were found across the global wastewater samples, and genotype distribution varied substantially between samples. Compositional differences in genotypes did not clearly correlate with specific continents or cities in the biannual dataset (Fig. 7a). Human astroviruses (HAstV) made up the majority of Mamastrovirus genotypes detected; among these, the classical HAstV types 1–5 and 8 were most frequently detected. HAstV-1, a genotype considered endemic worldwide, was frequently detected across all sampled continents and was the predominant genotype in many samples. HAstV-8, a genotype rarely reported in clinical surveillance, was also detected at high relative abundance across all continents. Additionally, recently discovered ‘non-classical’ astroviruses, including MLB1, VA1/HMO-C, VA2/HMO-A and VA3/HMO-B, were detected across multiple continents. Human astrovirus BF34, first identified in Burkina Faso (Africa) in 2010 and previously detected only there, was found in wastewater samples from Cameroon (February and March 2018), Austria (June and November 2018) and Uganda (July 2018), indicating its presence across multiple geographic regions in 2018.
Although less common in wastewater samples, animal astroviruses such as canine (CAstV), rat (RAstV) and feline (FAstV) astroviruses were detected across continents, with a surprisingly high prevalence of CAstV in Regina, Canada. For the cities with longitudinal sampling, temporal shifts in genotype distribution were observed. PCA showed that Mamastrovirus was a major driver of the distinct clustering of Melbourne’s virome (Fig. 5d), with consistently high proportions of Mamastrovirus reads. In Melbourne, HAstV-5 was more prevalent during 2017 and early 2018, whereas HAstV-1 predominated during the remainder of 2018. Other cities also exhibited peaks in Mamastrovirus reads over the observation period, but the timing of these peaks varied across locations and was not linked to a specific genotype.
Fig. 7: Global prevalence and abundance of Mamastrovirus genotypes in urban wastewater.
Biannual (a) and longitudinal data (b) are shown. In each panel, stacked bar plots display the number of reads per million viral reads assigned to the Mamastrovirus genus (green) and to specific genotypes (red), followed by the relative abundances of Mamastrovirus genotypes. The biannual samples were grouped using hierarchical clustering based on genotype composition. HAstV Human astrovirus, RAstV Rat astrovirus, CAstV Canine astrovirus, FAstV Feline astrovirus, ChAstV Cheetah astrovirus, YakAstV Yak astrovirus, Other = genotypes detected in fewer than 4 samples.
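The ‘Other’ category in the caption above (genotypes detected in fewer than 4 samples; Fig. 9 uses a threshold of 10) amounts to a prevalence filter applied before plotting. A minimal sketch, assuming per-sample read counts per genotype and counting a genotype as detected when it has at least one read; the genotype labels and counts are hypothetical, and the threshold is lowered in the example so pooling is visible.

```python
from collections import Counter

def collapse_rare(sample_counts, min_samples=4):
    """Pool genotypes detected in fewer than `min_samples` samples into 'Other'.

    sample_counts maps sample -> {genotype: read count}; a genotype counts as
    detected in a sample when it has at least one read there.
    """
    prevalence = Counter(
        genotype
        for counts in sample_counts.values()
        for genotype, reads in counts.items()
        if reads > 0
    )
    collapsed = {}
    for sample, counts in sample_counts.items():
        pooled = {}
        for genotype, reads in counts.items():
            label = genotype if prevalence[genotype] >= min_samples else "Other"
            pooled[label] = pooled.get(label, 0) + reads
        collapsed[sample] = pooled
    return collapsed

# Hypothetical counts; threshold lowered to 2 so the small example shows pooling
samples = {
    "S1": {"HAstV-1": 10, "HAstV-8": 3, "HAstV-BF34": 2},
    "S2": {"HAstV-1": 7, "HAstV-8": 1},
    "S3": {"HAstV-1": 4, "CAstV": 5},
}
print(collapse_rare(samples, min_samples=2))
```

Prevalence is counted across all samples before any pooling, so a genotype is either kept everywhere or pooled everywhere, which keeps the stacked bars comparable between samples.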
To investigate the global diversity and circulation patterns of the abundant ‘classical’ human astroviruses in greater detail, we conducted a phylogenetic analysis. The retrieval of partial capsid sequences for HAstV types 1, 2, 3, 4, 5 and 8 (Fig. 8 and Supplementary Fig. 8) from wastewater considerably expands the number of available capsid sequences from previously underrepresented regions. For several types, we observed wastewater-specific clusters, indicating that current clinical surveillance does not fully capture the diversity of HAstVs. Additionally, integrating global wastewater sequences collected within a limited timeframe provided insights into country- and city-specific virus circulation. Viral sequences from the same city often clustered together, as did sequences from samples collected closer in time, suggesting periodic circulation and replacement of viral lineages. Notably, evidence suggestive of region-specific virus circulation was observed for some African HAstV type 4, 5 and 8 clusters. For example, HAstV-5 wastewater sequences from Cameroon (2018) and Senegal (2019) formed a distinct cluster with human faecal sequences from Cameroon (2014). Similarly, a separate HAstV-8 cluster comprised sequences from clinical samples from Cameroon (2014) and wastewater sequences from Nigeria (2018).
Fig. 8: Phylogenetic analysis of Human astrovirus type 5 (HAstV-5) from global wastewater and publicly available reference sequences.
Maximum likelihood phylogenetic tree of HAstV-5 based on partial ORF2 gene (capsid) sequences. Sequences obtained from GenBank are indicated by coloured squares, while sequences derived from wastewater with a minimum length of 500 bp are denoted by coloured dots. Colours represent the continent of origin (Europe, Asia, Africa, Oceania, North America, and South America). Bootstrap values > 70 are shown.
Global diversity and phylogenetic analysis of enterovirus in wastewater
The abundance of Picornaviridae varied across continents, and enteroviruses, which were frequently detected using GastroCap, strongly influenced PCA clustering patterns (Fig. 5d). Enteroviruses are a major public health concern because they can cause a wide range of illnesses, including respiratory infections and neurological diseases; we therefore analysed them in wastewater to evaluate their surveillance potential in metagenomic datasets. A total of 2331 contigs were classified within the Enterovirus genus, of which 37% (868 contigs) could be genotyped using the RIVM Enterovirus genotyping tool, corresponding to 62 distinct Enterovirus genotypes.
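As a quick arithmetic check, the assignment rates quoted for the three genera follow directly from the contig counts given in the text; rounding to whole percentages reproduces the reported 32%, 14% and 37%. The helper name is illustrative only.

```python
def typed_fraction(typed, total):
    """Share of contigs that could be assigned a species or genotype."""
    return typed / total

# (assigned contigs, total contigs) as reported in the text
contigs = {
    "Tobamovirus (species)": (1653, 5204),      # 31.8%
    "Mamastrovirus (genotype)": (1512, 10668),  # 14.2%
    "Enterovirus (genotype)": (868, 2331),      # 37.2%
}
for genus, (typed, total) in contigs.items():
    print(f"{genus}: {typed_fraction(typed, total):.1%}")
```

The low Mamastrovirus rate reflects that many contigs are too short or too divergent to match a typing reference, so genotype-level profiles describe only a subset of the detected reads.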
Analysis of prevalence and relative abundance revealed differences in enterovirus distribution across continents (Fig. 9a, b). Samples from Africa showed the highest enterovirus abundance of all regions. Enterovirus B was the most prevalent species worldwide, with coxsackievirus (CV) B5 and CV-A9 being the most prevalent types across all regions, suggesting widespread endemicity. In North America, Enterovirus A was the most prevalent species, with CV-A6, CV-A4 and CV-A10, endemic enteroviruses and common causes of hand, foot and mouth disease, among the most frequently detected types. Enterovirus C, though less prevalent, was frequently detected in the African region and was the dominant enterovirus species in longitudinal samples from Ecuador, Malaysia and Cameroon. While Enterovirus D was detected sporadically, Enterovirus D68 (a clinically significant genotype because of its association with respiratory illness outbreaks and acute flaccid myelitis) was identified primarily toward the end of 2017 and in 2018. In most locations, Enterovirus D68 clade B3 was predominant, while clade A2 was detected in Athens and Taipei. Enterovirus G, which is associated with infection in pigs, was detected at high abundance in several cities across all continents.
Fig. 9: Global prevalence and abundance of Enterovirus genotypes in urban wastewater.
Biannual (a) and longitudinal data (b) are shown. In each panel, stacked bar plots display reads per million viral reads assigned to the Enterovirus genus (green) and to specific genotypes (red), followed by the relative abundances of Enterovirus species and of genotypes. The biannual samples were grouped using hierarchical clustering based on genotype composition. Other = genotypes detected in fewer than 10 samples, including (vaccine-derived) poliovirus type 3. SA South America.
Geographical specificity was evident for certain enterovirus genotypes. For example, CV-A20 was exclusively found in Africa and South America, while CV-A1 and CV-A22 were detected across all continents. In Europe, Echovirus type 30 (E-30) displayed an unusually high prevalence, which was not observed in other regions, possibly indicating a recent outbreak or re-emergence after a period of lower endemic circulation. Phylogenetic analysis of partial VP1 sequences revealed that European wastewater E-30 sequences from 2018 were generally more closely related to each other than to sequences from 2017 (Fig. 10).
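The observation that the 2018 E-30 sequences are more closely related to each other than to the 2017 sequences can be illustrated with a simple distance summary. This is not the maximum likelihood analysis used for the tree: it is a sketch using uncorrected p-distances on already aligned sequences, skipping gap positions, and the short sequences below are hypothetical.

```python
from itertools import combinations, product

def p_distance(a, b):
    """Uncorrected p-distance: share of differing sites, skipping gapped positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in sites) / len(sites)

def mean_distance(pairs):
    """Mean p-distance over an iterable of sequence pairs."""
    pairs = list(pairs)
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical aligned VP1 fragments
seqs_2018 = ["ATGGCACGTTAA", "ATGGCACGTTAG", "ATGGCACGTCAA"]
seqs_2017 = ["ATGACTCGATAA", "ATGACTCGATGA"]

within_2018 = mean_distance(combinations(seqs_2018, 2))
between_years = mean_distance(product(seqs_2018, seqs_2017))
print(f"within 2018: {within_2018:.3f}, 2018 vs 2017: {between_years:.3f}")
```

A lower within-group than between-group mean is consistent with, but weaker evidence than, the 2018 sequences forming their own clade in a bootstrapped tree.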
Fig. 10: Phylogenetic analysis of Echovirus 30 (E30) from global wastewater and publicly available reference sequences.
Maximum likelihood phylogenetic tree of E30 based on partial VP1 gene (capsid) sequences. Reference sequences obtained from GenBank are indicated by coloured squares, while sequences derived from wastewater with a minimum length of 500 bp are denoted by coloured dots. Colours represent the continent of origin (Europe, Asia, Africa, Oceania, North America, and South America). Bootstrap values > 70 are shown.
Longitudinal analysis revealed dynamic patterns in enterovirus genotype distribution across cities over time (Fig. 9b). For instance, Yaoundé (Cameroon) exhibited a broad diversity of circulating genotypes, including vaccine-derived poliovirus type 3; no sequences of the VP1 typing region were detected for other poliovirus types. Longitudinal analysis across sites identified periods of genotype dominance, with CV-C116 (a lesser-known member of Enterovirus C of unknown clinical relevance) prevailing in Quito in 2018 and CV-A1 in Kuala Lumpur during 2017 and early 2018. In Regina, seasonal variation in overall enterovirus presence was observed, with elevated levels occurring primarily during summer and fall. Moreover, while multiple genotypes circulated there in 2017, CV-A10 was present at higher relative abundance during specific periods, before a shift to CV-A6 in 2018, indicating dynamic changes in circulating enterovirus populations over consecutive years. EV-A71, one of the leading causes of hand, foot and mouth disease and associated with neurological complications, was also detected in Regina in July, August and November 2017, coinciding with a period of elevated laboratory-confirmed enterovirus/rhinovirus cases in Canada that year according to national surveillance data34,35.
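Summarizing monthly samples by season, as in the Regina observation above, can be sketched as follows. Assumptions: Northern Hemisphere meteorological seasons (December to February as winter), values pooled across years, and hypothetical monthly relative abundances; the study's own seasonal grouping is not specified here.

```python
from statistics import mean

# Northern Hemisphere meteorological seasons (assumption; Regina is in Canada)
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

def seasonal_means(monthly):
    """Average monthly values per season, pooling all years.

    monthly maps (year, month) -> relative abundance for one city and genus.
    """
    by_season = {}
    for (_year, month), value in monthly.items():
        by_season.setdefault(SEASON_OF_MONTH[month], []).append(value)
    return {season: mean(values) for season, values in by_season.items()}

# Hypothetical monthly Enterovirus relative abundances for one city
monthly = {(2017, 1): 0.02, (2017, 7): 0.15, (2017, 8): 0.20,
           (2017, 10): 0.12, (2018, 2): 0.03, (2018, 7): 0.18}
print(seasonal_means(monthly))
```

Pooling across years smooths out single-month peaks; keeping the year in the grouping key instead would separate year-to-year genotype shifts such as the CV-A10 to CV-A6 change.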
Discussion
This study provides a comprehensive analysis of the urban wastewater virome of 62 major cities worldwide using shotgun metagenomic and capture-based (GastroCap) sequencing. By incorporating both biannual and longitudinal samples, we assessed viral diversity and temporal dynamics. While biannual sampling provided a snapshot of viral diversity within a particular location, more frequent sampling revealed a higher viral diversity per location and enabled the detection of gradual shifts in composition. Shotgun metagenomics, at least with the sequencing depth used in our study, yielded a relatively low proportion of enteric virus reads, making it difficult to effectively monitor enteric viruses in complex wastewater samples. The GastroCap probe set significantly enriched clinically relevant enteric viruses, whereas metagenomic sequencing primarily detected bacteriophages and plant viruses. Thus, for the surveillance of human pathogenic enteric viruses, which are stable in aquatic environments36, wastewater is a valuable source when combined with GastroCap enrichment. Additionally, a large proportion of reads could not be assigned to known virus taxa, highlighting the potential for future discovery of novel viruses. Our findings and those of others demonstrate the utility of wastewater metagenomics as a powerful One Health surveillance tool, capturing the diversity and dynamics of viruses associated with human, animal, and environmental health.
Cross-continental comparisons revealed homogeneity in viral community composition at high taxonomic levels, consistent with findings that viral communities are often more specific to environmental habitats (e.g. wastewater or marine) than to geographical location37. At the city level, however, we observed geographical partitioning at both the family and genus levels, likely influenced by local factors such as climate, weather, demographics, agricultural practices, diet, population density and previous exposure history30. Sampling in South America, Central America and Oceania was limited, making the findings less generalizable for these regions. Additionally, an expanded dataset of these wastewater samples, sequenced using a strategy optimized for bacterial and antimicrobial resistance (AMR) profiling, showed distinct regional variations in the resistome and bacteriome across continents. Specifically, bacterial composition divided the world’s regions into roughly two major groups (Europe, Central Asia and North America versus Africa and the Middle East), whereas the resistomes showed more distinct regional profiles31. Our analysis revealed very limited antibiotic resistance gene (ARG) presence in the viral fraction, which may be due to the virus-enrichment protocol and the infrequent occurrence of ARGs in bacteriophage genomes38 (Supplementary Table 1).
By increasing taxonomic resolution to the species and genotype level, our results and those of others demonstrate that metagenomic and phylogenetic analysis of viral communities in wastewater can reveal distinct temporal and geographical patterns30. Tobamoviruses were highly prevalent in urban wastewater, consistent with their detection in various water sources39,40,41. The widespread detection of CGMMV and PMMoV in our study aligns with their documented global prevalence42,43,44. Notably, PMMoV has been proposed as a potential viral indicator of human faecal contamination in water and wastewater37,42,43. However, our data showed substantial regional variability, with lower proportions in Asia, suggesting that regional differences should be considered when interpreting PMMoV in water quality assessments. Variations in the composition of Tobamovirus species might reflect differences in agricultural practices or diet. For instance, the detection of TMGMV in China could be related to the country’s substantial tobacco production and consumption45,46,47. Importantly, early evidence of emerging plant viruses such as ToBRFV could also be detected. ToBRFV, which causes severe infections in tomatoes and peppers, was first described in Jordan in 2015 and retrospectively traced to Israel in 201448. Since then, it has spread widely; our data indicate its presence in Canada as early as July 2018, more than a year before its first official report in 2019. Similarly, detections in Italy in November 2017 and in Greece in June 2018 preceded the reported outbreaks in these countries by nearly a year49,50. These findings highlight the utility of wastewater monitoring for the early detection and tracking of emerging plant pathogens.
Extending our study to viruses impacting animal and human health, we were able to characterize the diversity of astroviruses on a global scale. The prevalence and distribution of HAstV genotypes in our study generally align with trends o