Abstract
Background
Airborne microbial communities, although often challenging to study due to low biomass, play crucial roles in public health and pathogen transmission. Through shotgun metagenomics, this study utilizes non-invasive air sampling of face masks and aircraft cabin filters to investigate microbial diversity in environments with frequent human interactions, including hospitals and airplanes. A comprehensive sampling and analysis workflow was developed, incorporating environmental and enrichment protocols to enhance microbial DNA recovery and diversity profiling.
Results
Despite limitations in biomass, optimized extraction methods allowed for the successful identification of 407 species, with dominant taxa including Cutibacterium acnes, Staphylococcus epidermidis, Sphingomonas hankookensis, and Methylobacterium radiotolerans. Enrichment processing resulted in greater metagenome-assembled genome (MAG) recovery and higher antimicrobial resistance gene (ARG) identification.
Conclusions
The findings highlight the presence of ARGs in high-occupancy public spaces, suggesting the importance of monitoring and the potential for mitigating airborne transmission risks in such environments. This study demonstrates the utility of combining environmental and enrichment sampling to capture comprehensive microbial and ARG profiles in confined spaces, providing a framework for enhanced pathogen monitoring in public health contexts.
Background
Although often seen as inhospitable for microbial life due to its low nutrient availability, air is a critical medium for the transport and dispersal of microorganisms [1, 2]. These airborne microbes originate from various natural sources, such as soil, oceans, vegetation, and anthropogenic activities [3]. While these microorganisms play roles in ecological processes, airborne transport may significantly impact human health by transmitting infectious pathogens [4, 5].
The study of airborne microbes has historically lagged behind research on other environments, such as soil and water, mainly due to the challenges in sampling and analyzing these microorganisms [6, 7]. However, the COVID-19 pandemic, fueled by airborne transmission, led to a surge in research on airborne microbes [8, 9]. While culture-based studies have been foundational, less than 1% of airborne prokaryotes are cultivable using traditional microbiological procedures, so culture alone offers an incomplete picture of microbial community structures and dynamics [10, 11, 12]. Nonetheless, targeted enrichment can be strategically useful in low-biomass contexts by boosting DNA yield and improving genome recovery.
In recent years, culture-independent techniques, particularly metagenomics, have emerged as powerful tools for studying the structure and function of airborne microbial communities and pathogens [11, 13]. Metagenomics provides a broader assessment of microbial diversity and metabolic potential than culture-based techniques. However, the application of metagenomics to airborne microbial research presents challenges. Low biomass, sample collection logistics, DNA extraction difficulties, and contamination issues have hindered comprehensive analysis of airborne microbial communities [7, 14, 15]. These technical challenges limit the detection of low-abundance organisms and can complicate efforts to generate reliable data.
In high-risk environments such as aircraft cabins and hospitals, face masks and ventilation systems equipped with high-efficiency particulate air (HEPA) filters play crucial roles in reducing the transmission of airborne pathogens [16, 17]. HEPA filters capture 99.97% of airborne particles, including bacteria and viruses, but do not destroy trapped microbes, allowing viable ones to multiply and pose a recontamination risk [18]. Similarly, face masks block infectious droplets and aerosols, but improper handling, like frequent hand contact, can lead to microbial buildup, reducing their protective effectiveness [19]. While potentially problematic for proper use and maintenance, the ability of HEPA filters and face masks to trap and retain microorganisms may incidentally provide a convenient mechanism for air sampling.
During a pandemic, it would be valuable to determine whether an airborne pathogen is present in various situations, e.g., on flights or within a healthcare environment. Aircraft HEPA filters, which integrate air over thousands of flight hours, provide a population-level, long-term aggregate of the cabin air microbiome. In contrast, disposable face masks offer short-term, individual-level sampling that reflects immediate exposure. Including both sample types allows for comparison of population-level and individual-level microbial profiles. However, given that aircraft HEPA filters are expensive to change and are typically replaced only after 8000–10,000 flight hours, analyzing disposable face masks seems a more logistically realistic strategy.
This study explores the diversity and function of airborne microbial communities through shotgun metagenomics, focusing on samples collected from face masks worn by air travelers and physicians in hospitals and an aircraft HEPA filter. Given the low biomass of airborne microbial samples, we optimized DNA extraction protocols and incorporated enrichment protocols using brain heart infusion (BHI) media to stimulate the growth of airborne microorganisms. This research seeks to bridge critical gaps in understanding microbial populations in confined environments while evaluating approaches that may inform public health risk assessment.
Methodology
Sample collection and processing
One aircraft cabin filter was removed from a commercial airplane after 8039 h of usage. The filter was immediately placed in a sterile, DNA-free plastic bag and transported to the laboratory. In an aseptic environment, the filter was manually disassembled, and the filtration material was carefully removed using sterilized, DNA-free scissors. The filter material was then placed into sterile, DNA-free plastic bags and stored in a refrigerator until DNA extraction to minimize contamination and preserve sample integrity. Samples were taken randomly from different areas of the filter. Each sample measured approximately 4 cm by 4 cm.
Face masks worn by air travelers and healthcare professionals in a hospital unit or ward were collected following a single use. A total of 22 standard 3-layer disposable surgical masks were included, comprising 10 obtained from air travelers and 12 from healthcare professionals.
Traveler masks were collected from a range of domestic and international routes, with wear times spanning from 2 h and 50 min to 15 h, for an average of approximately 7 h and 30 min. For healthcare professionals, masks were worn for a typical clinical shift of ~ 8 h. Each sample was placed in a sterile bag immediately after use and transported to the laboratory. Unworn face masks served as controls and were removed from sterile packaging, exposed to aircraft cabin conditions for the duration of flight, and then sealed in sterile bags for transport.
The outer layer of each collected face mask and aircraft cabin filter sample was sectioned into 4 cm by 4 cm pieces using sterile scissors. These were divided into two parts; one 2 cm by 4 cm portion was used for direct DNA extraction, and the other half was used for enrichment before DNA extraction.
Enrichment culture
Samples from facemasks and the aircraft cabin air filter were incubated in 50-ml brain heart infusion media at 37 °C for up to 5 days with agitation at 120 rpm. Aliquots of 1 ml were taken every 24 h, and microbial cells were collected by centrifugation. The extracted DNA from these aliquots, originating from the same sample, was pooled together for downstream analysis. These samples are hereafter referred to as “Enrichment,” and unenriched samples are called “Environmental” (Fig. 1).
Fig. 1
Schematic overview of the study design and workflow. Aircraft cabin filters and face masks (traveler, healthcare, and control) were processed either directly (environmental samples) or following enrichment culture. DNA was extracted, validated via qPCR, and sequenced using Illumina NextSeq. Bioinformatics analyses included read-based profiling and assembly-based genome reconstruction. Comparative analyses integrated results across environmental vs enrichment and read-based vs assembly-based pipelines
DNA extraction
DNA extraction was performed using the optimized method described by [15] and utilizing the DNeasy PowerSoil Kit (Qiagen, Hilden, Germany). The sectioned samples were placed into PowerBead tubes supplied with the kit. Lysis buffer was added at twice the volume recommended by the manufacturer. The tubes were incubated at 37 °C for 30 min to enhance lysis efficiency. Following incubation, the samples were homogenized at a lower speed of 200 rpm for 1 h and 30 min to mix the contents gently. After homogenization, the samples were centrifuged at a medium speed of 2000 × g for 10 min to pellet debris while preserving the integrity of the DNA. The supernatant was then carefully transferred to a new tube, ensuring no debris was carried over into the final sample. The extraction process was then continued according to the standard protocol provided with the DNeasy PowerSoil Kit, including steps for chemical precipitation, washing, and DNA elution.
The concentration and quality of the extracted DNA were initially assessed using a Qubit 4 fluorometer with the dsDNA HS assay kit (Thermo Fisher Scientific, Waltham, MA, USA).
PCR amplification of 16S rRNA gene and quantitative PCR (qPCR) for microbial DNA quantification
Because of the low-biomass nature of the samples, especially the Environmental samples, DNA concentrations were often below the detection limit of the Qubit system. Therefore, PCR amplification of the 16S rRNA gene was performed to confirm the presence of microbial DNA using the standard primers 515F (5′-GTGCCAGCMGCCGCGGTAA-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′) [20]. The reaction was carried out in a 25 µL volume using Promega Green GoTaq™ master mix (Promega Corporation, Madison, WI, USA) containing 1X Green GoTaq buffer, 0.2 µM of each primer, and 5 µL of template DNA. The thermal cycling conditions were as follows: an initial denaturation at 95 °C for 3 min, followed by 35 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 30 s, with a final extension at 72 °C for 5 min. PCR products were analyzed by agarose gel electrophoresis to determine successful amplification and the expected product size.
Quantitative real-time PCR (qPCR) was subsequently performed to quantify 16S rRNA gene copy numbers and enable normalization of recovery efficiency using PowerUp™ SYBR™ Green Master Mix (Thermo Fisher Scientific, Waltham, MA, USA) on a QuantStudio™ Real-Time PCR System (Thermo Fisher Scientific). Each reaction was performed in a total volume of 20 µL, containing 10 µL of 2X PowerUp™ SYBR™ Green Master Mix, 1 µL of each 10 µM primer (515F and 806R), 2 µL of template DNA, and 6 µL of nuclease-free water. The thermal cycling protocol included an initial uracil-DNA glycosylase (UDG) activation step at 50 °C for 2 min to remove carryover contamination, followed by Dual-Lock™ DNA polymerase activation at 95 °C for 2 min. Amplification was performed over 40 cycles of denaturation at 95 °C for 15 s and annealing/extension at 60 °C for 30 s. A melt curve analysis was conducted with denaturation at 95 °C for 1 s, annealing at 60 °C for 20 s, and a gradual increase in temperature to 95 °C at a rate of 0.1 °C per second with continuous fluorescence measurement.
Quantification of 16S rRNA gene copies was achieved by generating a standard curve using serial tenfold dilutions of a 4125 bp Escherichia coli plasmid containing a 16S rRNA gene insert [21]. The standard curve had an R² value of 0.96, and the qPCR efficiency was calculated to be 111%. All qPCR reactions were performed in triplicate, and a no-template control (NTC) was included in each run to monitor for contamination or non-specific amplification. For normalization, a synthetic spike-in DNA standard (10⁹ copies/µL; 20 µL per sample) was added to each sample before DNA extraction. The observed spike DNA copy number was quantified in each sample via qPCR, and the recovery efficiency of the spike was calculated as the ratio of observed spike copies to the theoretical spike copies (5.27 × 10⁻⁷). In addition, Ct values, which indicate qPCR amplification of the bacterial 16S rRNA gene (V4 region), and copy numbers, representing estimated 16S rRNA gene copies per µL based on a standard curve generated from known template concentrations, were also recorded. This recovery efficiency was then used to adjust each sample’s 16S rRNA gene copy numbers, ensuring accurate quantification that accounts for potential losses during DNA extraction or other processing steps.
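The standard-curve and spike-in normalization steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the dilution series and Ct values are invented, the function names are hypothetical, and dividing observed copies by the recovery ratio is an assumed reading of how the recovery efficiency was applied.

```python
import numpy as np

def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    slope, intercept = np.polyfit(log10_copies, ct_values, 1)
    return float(slope), float(intercept)

def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 corresponds to 100% (perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

def recovery_adjusted_copies(observed_copies, spike_observed, spike_expected):
    """Assumed normalization: divide by the spike-in recovery ratio."""
    recovery = spike_observed / spike_expected
    return observed_copies / recovery

# Hypothetical tenfold dilution series (values invented for illustration).
log10_copies = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
ct_values = np.array([30.0, 26.7, 23.4, 20.1, 16.8])

slope, intercept = fit_standard_curve(log10_copies, ct_values)
efficiency = amplification_efficiency(slope)
estimated = copies_from_ct(23.4, slope, intercept)
adjusted = recovery_adjusted_copies(1000.0, spike_observed=50.0, spike_expected=100.0)
```

Under this formula, an efficiency above 100% (such as the 111% reported) corresponds to a standard-curve slope shallower than about −3.32 cycles per tenfold dilution.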
Spike-in controls
Spike control samples were prepared by inoculating a 2 cm × 2 cm sectioned sterile face mask with the ZymoBIOMICS Microbial Community Standard (Zymo Research, Irvine, CA, USA; Catalogue No. D6300). Each section was inoculated with 100 µL of the ZymoBIOMICS Microbial Community Standard. The inoculated face mask sections were incubated at 37 °C for 30 min to ensure microbial adhesion. Following the incubation, the DNA extraction was performed as mentioned above. The standard contains a defined mixture of microbial species with the following theoretical composition based on genomic DNA: Listeria monocytogenes, 12%; Pseudomonas aeruginosa, 12%; Bacillus subtilis, 12%; Escherichia coli, 12%; Salmonella enterica, 12%; Lactobacillus fermentum, 12%; Enterococcus faecalis, 12%; Staphylococcus aureus, 12%; Saccharomyces cerevisiae, 2%; and Cryptococcus neoformans, 2%.
Shotgun metagenomic sequencing
Shotgun metagenomic sequencing was performed using the Illumina NextSeq 2000 platform, employing a 300-cycle flow cell kit to produce 2 × 150 bp paired-end reads. The Illumina DNA Prep (tagmentation) kit prepared libraries with unique dual indexes. PCR cycles were increased to accommodate the low DNA input, following the manufacturer’s protocol for low-input library preparation. A 1–2% PhiX control was spiked into the sequencing run to support optimal base calling. Sequencing was performed to achieve a targeted depth of 1000 Mbp, corresponding to approximately 6.7 million paired-end reads per sample. The sequencing process ensured comprehensive coverage suitable for detecting low-abundance taxa in metagenomic samples.
Sequencing of these samples generated a total of 1,677,589,656 raw reads. After quality control, the cleaned dataset consisted of 1,279,623,449 reads, representing a 23.7% reduction in the total number of reads (Supplementary Table 1).
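The depth and read-retention figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the "6.7 million" figure counts individual 150 bp reads rather than read pairs; that interpretation is an assumption, not something stated in the text.

```python
# Targeted depth per sample and read length, as stated in the text.
TARGET_BASES = 1000 * 10**6   # 1000 Mbp
READ_LENGTH = 150             # bp per read (2 x 150 bp chemistry)

# Assumption: reads are counted individually, not as pairs.
reads_needed = TARGET_BASES / READ_LENGTH

# Read totals before and after quality control, as stated in the text.
raw_reads = 1_677_589_656
clean_reads = 1_279_623_449
reduction_pct = 100 * (raw_reads - clean_reads) / raw_reads

print(f"Reads needed per sample: {reads_needed / 1e6:.1f} million")
print(f"Reads removed by QC: {reduction_pct:.1f}%")
```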
Bioinformatics analysis
Quality control and contaminant removal
Raw sequencing reads were first assessed using FastQC v 0.12.0 [22], and sequencing adapters and low-quality bases were then trimmed using Trimmomatic v 0.39 [23] with specific parameters (SLIDINGWINDOW:4:20 MINLEN:50) to retain only high-quality reads. Human genomic DNA and PhiX control sequences were then filtered out using KneadData (https://huttenhower.sph.harvard.edu/kneaddata/) with default parameters. Additionally, overrepresented sequences were identified and removed. Finally, the clean reads underwent a final round of quality assessment using FastQC v 0.12.0, and the results were compiled with MultiQC v 1.23 [24] to generate a comprehensive summary report.
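To make the trimming parameters concrete, the following Python sketch mimics the idea behind SLIDINGWINDOW:4:20 followed by MINLEN:50: scan from the 5′ end, cut the read where the mean quality of a 4-base window first falls below 20, and discard reads shorter than 50 bases afterwards. It is a simplified illustration with hypothetical function names, not a reimplementation of Trimmomatic, whose exact cut position and edge-case handling may differ.

```python
def sliding_window_trim(qualities, window=4, threshold=20):
    """Return how many leading bases to keep.

    The read is cut at the start of the first window whose mean
    Phred quality drops below the threshold.
    """
    n = len(qualities)
    if n < window:
        # Simplification: treat a read shorter than the window as one window.
        return n if n and sum(qualities) / n >= threshold else 0
    for start in range(n - window + 1):
        win = qualities[start:start + window]
        if sum(win) / window < threshold:
            return start
    return n

def trim_and_filter(sequence, qualities, window=4, threshold=20, min_len=50):
    """Trim a read, then drop it (return None) if it is shorter than min_len."""
    keep = sliding_window_trim(qualities, window, threshold)
    trimmed = sequence[:keep]
    return trimmed if len(trimmed) >= min_len else None

# Hypothetical read: 60 high-quality bases followed by 10 low-quality bases.
quals = [30] * 60 + [5] * 10
kept = trim_and_filter("A" * 70, quals)
```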
Read-based microbial community profiling and functional analysis
Taxonomic profiling of purified metagenomic reads was performed using MetaPhlAn4 [25], utilizing the Jun23_CHOCOPhlAnSGB_202307 database of marker genes. Subsequent analyses of the MetaPhlAn output included evaluations of microbial community structure and the calculation of diversity indices.
Strain-level resolution was achieved using StrainPhlAn 4.1 [26], focusing on three abundant species across the samples: Cutibacterium acnes (t__SGB16955), Staphylococcus epidermidis (t__SGB7865), and Staphylococcus hominis (t__SGB7858).
Functional profiling was conducted using HUMAnN 3 [27], utilizing the ChocoPhlAn database (v201901_v31) for taxonomic gene family annotations and the UniRef50 database (uniref50_201901b) for protein functional assignments. Following functional profiling, gene families were regrouped into MetaCyc reactions using predefined mapping files provided by HUMAnN. After regrouping, the data were normalized to relative abundance. The non-stratified table, which compiles pathway data from all species, was employed for further downstream analyses and visualizations.
Diversity analysis
Diversity analyses were performed on microbial community profiles generated by MetaPhlAn. Alpha diversity, measured by the Shannon index, was calculated for each sample using species-level abundance data. Statistical significance between sample types was tested using the Kruskal–Wallis test, followed by pairwise Wilcoxon rank-sum tests with Benjamini–Hochberg correction. Bray–Curtis dissimilarity matrices were computed from MetaPhlAn’s species abundance profiles for beta diversity. Principal coordinate analysis (PCoA) was performed to visualize differences in community composition across sample types. PERMANOVA analysis, as implemented in the adonis2 function of the vegan package v 2.6–4 [28], was used to statistically assess variation in community structure between sample types, with p-values included in the PCoA plots.
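For readers unfamiliar with the two diversity measures, the sketch below computes the Shannon index and Bray–Curtis dissimilarity from species abundance vectors using NumPy. The abundance values are invented, and the use of the natural logarithm is an assumption; the analysis itself was run with the vegan package in R, not with this code.

```python
import numpy as np

def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln p_i) over non-zero species."""
    a = np.asarray(abundances, dtype=float)
    p = a[a > 0] / a.sum()
    return float(-(p * np.log(p)).sum())

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())

# Hypothetical relative abundances (%) for two samples over three species.
sample_a = [50.0, 50.0, 0.0]
sample_b = [0.0, 50.0, 50.0]

h_even = shannon_index([25, 25, 25, 25])   # four equally abundant species
bc = bray_curtis(sample_a, sample_b)
```

A perfectly even community of four species gives H′ = ln 4, and two samples that share half of their total abundance give a Bray–Curtis dissimilarity of 0.5.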
Metagenomic assembly, genome construction, and coverage analysis
The clean metagenomic reads were assembled into contigs using MEGAHIT v1.2.9 [29], using the default parameters. After assembly, the quality of the resulting contigs was evaluated using QUAST v5.2.0 [30]. The contigs were processed into metagenome-assembled genomes (MAGs) via MetaBAT 2 [31]. The binning procedure was initiated by aligning the raw paired-end reads to the assembled contigs using bowtie2 v2.5.4 [32]. These aligned reads were sorted and indexed using Samtools v1.20 [33], generating the BAM files necessary for MetaBAT 2. MetaBAT 2 was then run with its default parameters, utilizing these BAM files alongside the contig assemblies to delineate MAGs. After binning, the quality of the MAGs was evaluated using CheckM v1.1.6 [34]. Each bin generated from MetaBAT 2 was analyzed with CheckM to determine the completeness and contamination levels of the assembled genomes. The MAGs with a completeness of 50% or greater and contamination of 10% or less were retained for downstream analysis.
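The retention rule for bins can be expressed as a simple filter. The sketch below assumes completeness and contamination have already been parsed from the CheckM output into records; the `BinQuality` class, the bin names, and the values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BinQuality:
    name: str
    completeness: float   # percent, as reported by CheckM
    contamination: float  # percent, as reported by CheckM

def retain_mags(bins, min_completeness=50.0, max_contamination=10.0):
    """Keep bins with completeness >= 50% and contamination <= 10%."""
    return [
        b for b in bins
        if b.completeness >= min_completeness
        and b.contamination <= max_contamination
    ]

# Hypothetical bins, for illustration only.
bins = [
    BinQuality("bin.1", 92.3, 1.2),   # retained
    BinQuality("bin.2", 48.9, 0.5),   # dropped: completeness too low
    BinQuality("bin.3", 75.0, 12.4),  # dropped: contamination too high
    BinQuality("bin.4", 50.0, 10.0),  # retained: thresholds are inclusive
]
retained = retain_mags(bins)
```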
To analyze differences in coverage between Enrichment and Environmental processing conditions, log-transformed coverage values were visualized to represent the distribution and density of coverage. Kruskal–Wallis tests were used to assess statistical differences in microbial coverage between sample types, and outliers, identified as points beyond 1.5 times the interquartile range (IQR), were highlighted as red circles in the plots. Coverage comparisons between Enrichment and Environmental conditions were performed using Wilcoxon signed-rank tests, and significant differences were annotated on the plots.
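The outlier rule mentioned above can be illustrated as follows. This sketch assumes the conventional box-plot definition (points below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR) applied to log10-transformed coverage; the coverage values and the function name are invented, and the quartile interpolation method used by the plotting software may differ from NumPy's default.

```python
import numpy as np

def iqr_outliers(values):
    """Return values lying beyond 1.5 x IQR from the quartiles."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return v[(v < lower) | (v > upper)]

# Hypothetical mean coverage values for six MAGs.
coverage = np.array([5.0, 6.0, 7.0, 8.0, 9.0, 5000.0])
log_coverage = np.log10(coverage)
outliers = iqr_outliers(log_coverage)
```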
Taxonomic and functional classification of the MAGs
MAGs underwent taxonomic classification using GTDB-Tk v2.4.0 [35] with the most recent GTDB release (Release 220). Prokka v1.14.5 [36] was employed to annotate high-quality MAGs. Antibiotic resistance gene identification within the MAGs was performed using the CARD (Comprehensive Antibiotic Resistance Database) [37] via the Abricate tool v1.0.0 [38].
Comparative analysis of microbial diversity
A comparative analysis between species detected by MetaPhlAn and GTDB-Tk was performed to identify the overlap in species identification and highlight the species exclusively identified by either tool. Species presence and relative abundances were compared across the Environmental and Enrichment processing using the Wilcoxon signed-rank test for paired samples. The test evaluated significant differences in species abundance between Environmental and Enrichment samples.
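The overlap between two species lists reduces to set operations. The sketch below uses small toy subsets of species names, not the full results, and the function name is hypothetical.

```python
def compare_species(species_a, species_b):
    """Split two species collections into shared and tool-specific sets."""
    a, b = set(species_a), set(species_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}

# Toy subsets for illustration; not the complete species lists.
metaphlan_species = {
    "Cutibacterium acnes",
    "Escherichia coli",
    "Staphylococcus epidermidis",
}
gtdbtk_species = {
    "Cutibacterium acnes",
    "Escherichia coli",
    "Geobacillus proteiniphilus",
}

overlap = compare_species(metaphlan_species, gtdbtk_species)
total_unique = len(metaphlan_species | gtdbtk_species)
```

The same function applies to the Environmental versus Enrichment comparison by passing the species detected under each processing type.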
Results
Microbial composition of face masks and aircraft cabin filters
We analyzed a total of 55 samples, of which 31 were Environmental (direct DNA extraction), 22 were Enrichment (cultivation prior to DNA extraction), and 2 were spike-in controls. Each specimen was divided into two portions, with one processed directly for DNA extraction (Environmental) and the other subjected to cultivation before DNA extraction (Enrichment).
Specifically, among the 19 hospital masks analyzed, 12 were Environmental and 7 were Enrichment samples; among the 16 traveler masks analyzed, 10 were Environmental and 6 were Enrichment samples; and among the 12 aircraft filter subsamples analyzed, 6 were Environmental and 6 were Enrichment samples. Among the 8 control masks analyzed, 3 were Environmental, 3 were Enrichment, and 2 were spike-in controls.
Although each mask or filter was split for both Environmental and Enrichment processing, some samples failed during DNA extraction or sequencing. As a result, not every specimen has a corresponding Environmental–Enrichment pair (Supplementary Table 1).
Environmental samples showed lower DNA recovery (~ 0.48 ng/µL) and higher Ct values (~ 17.36) when quantified by qPCR targeting the 16S rRNA gene, with limited variability across sample types. In contrast, Enrichment samples consistently yielded the highest DNA concentrations (~ 20.64 ng/µL), lower Ct values (~ 15.62), and higher copy numbers (~ 1016). Among Enrichment samples, Hospital settings exhibited the highest DNA concentrations (~ 45.44 ng/µL) and copy numbers (~ 1016), while Travel samples showed the lowest Ct values (~ 12.82), indicating efficient amplification despite lower DNA concentrations. Aircraft filter samples under Environmental processing demonstrated the lowest DNA recovery, highlighting challenges in low-biomass sample types (Fig. 2).
Fig. 2
Comparative analysis of DNA concentration, Ct values, and copy numbers. DNA concentration (ng/µL) across processing types (A) and across sample types (B) within each processing type. Ct values, which indicate qPCR amplification of the bacterial 16S rRNA gene (V4 region), across processing types (C) and across sample types within each processing type (D). Copy numbers, representing estimated 16S rRNA gene copies per µL based on a standard curve, across processing types (E) and across sample types (F) within each processing type. Mean values are marked with black dots and annotated above the boxes
The microbial community composition analysis using MetaPhlAn4 at the domain level revealed that most sequences were classified as Bacteria, accounting for 99.6% of the total relative abundance. A small fraction of the community was classified as Eukaryota, contributing 0.446% to the relative abundance. Further taxonomic breakdown identified 12 distinct phyla and 407 species across the samples (Table 1).
Actinobacteria dominated the Environmental samples, particularly in Travel (75%) and Hospital (59%) masks, whereas Firmicutes became strongly enriched under cultivation, reaching 85–87% in Hospital and Travel masks. Proteobacteria were abundant in Environmental aircraft filter samples (67%) but declined substantially in Enrichment samples (< 10%). Unclassified bacterial reads consistently contributed ~ 12% across all sample categories. Eukaryotic taxa, primarily Basidiomycota and Ascomycota, were detected only at very low levels (< 3%), most notably in Unworn masks (Table 1).
Phylum-level profiles revealed notable differences in relative abundance between Environmental and Enrichment processing methods across the sample types. Actinobacteria exhibited a substantial presence across all sample types and processing methods, with an exceptionally high relative abundance in Environmental samples: 41.5% in Travel samples and 32.5% in Hospital samples. Firmicutes showed a significant presence in Enrichment samples, especially in Travel (29.5%) and Hospital (29.0%) samples, whereas its relative abundance in Environmental samples was notably lower. Other phyla, such as Bacteroidota, Candidatus Saccharibacteria, and Fusobacteria, exhibited lower relative abundances, generally below 1%, across all sample types and processing methods (Fig. 3A, B).
Fig. 3
Microbial community composition across different sample types and processing methods. A–C The relative abundances of microbial phyla in environmental, enrichment, and spike control samples, respectively. D–F Genus-level relative abundances in environmental, enrichment, and spike control samples. G A heatmap of the log10-transformed relative abundances of the top 20 most abundant species across all samples, with metadata annotations for sample type and processing method displayed above the heatmap. For the barplots, the labels highlight taxa with > 5% abundance
Cutibacterium, Staphylococcus, and Corynebacterium were dominant across many samples at genus level, while Escherichia and Sphingomonas contributed variably depending on sample type and processing method (Fig. 3D–F).
In Environmental samples, Cutibacterium was highly abundant, particularly in Travel and Hospital masks, reaching 73.9% and 57.1% relative abundance, respectively. Aircraft filter samples were enriched in Sphingomonas (34.7%) and Escherichia (22.6%). In contrast, Enrichment conditions produced a pronounced shift in community composition, with Bacillus dominating across sample types (up to 79%), and increased contributions from Paenibacillus and Staphylococcus, reflecting selective outgrowth of Firmicutes.
Staphylococcus epidermidis, Staphylococcus hominis, and Cutibacterium acnes were consistently detected across Hospital and Travel samples, whereas Sphingomonas hankookensis and Escherichia coli were enriched in Aircraft filter samples. Under Enrichment conditions, species such as Bacillus subtilis and Paenibacillus humicus became dominant, underscoring the strong culture bias toward Firmicutes (Fig. 3G).
Comparative analysis of microbial diversity across environmental and enrichment processing
Across both microbial taxonomy annotation tools, MetaPhlAn and GTDB-Tk, 409 species were identified. MetaPhlAn accounted for 407 species, while GTDB-Tk identified only nine species. A total of six species, Sphingomonas hankookensis, Cutibacterium acnes, Methylobacterium radiotolerans, Escherichia coli, Pseudomonas aeruginosa, and Paenibacillus humicus, were common to both tools. Species identified solely in GTDB-Tk (MAGs) and absent in MetaPhlAn included Geobacillus proteiniphilus and Microsaccharimonas sp001897305 (Fig. 4A).
Fig. 4
A The presence and absence of species were identified using two bioinformatics tools, GTDB-Tk and MetaPhlAn, and represented in a binary heatmap. B Log10 abundance of species detected in Enrichment and Environmental processing. C Comparison of microbial species diversity and relative abundance between Enrichment and Environmental processing types
Among all species detected, C. acnes was the most abundant, consistently detected in both Enrichment and Environmental samples, along with other notable species such as Staphylococcus epidermidis, Staphylococcus hominis, Bacillus cereus, E. coli, and P. aeruginosa. However, certain species, including Geobacillus thermoleovorans, Salmonella enterica, and Limosilactobacillus fermentum, were only present in Environmental samples, while others, such as Staphylococcus warneri and Niallia taxi, were found solely in Enrichment samples (Fig. 4B).
Of all the species analyzed, 256 were uniquely present in the Environmental samples, and a total of 86 species were shared between the Environmental and Enrichment categories. However, 65 species were identified solely in the Enrichment samples (Fig. 4C). A statistically significant difference in species abundance was found when comparing the distribution of relative abundances of microbial species across Environmental and Enrichment processing types (Wilcoxon p-value, 1.19e-16). Environmental processing yielded higher relative abundance across species than Enrichment processing. The median abundance for species in the Environmental samples was consistently higher than in the Enrichment samples.
Strain-level profiling with StrainPhlAn indicated substantial diversity within the dominant taxa Cutibacterium acnes, Staphylococcus epidermidis, and Staphylococcus hominis (Supplementary Fig. 1). Multiple distinct strain variants of each species were recovered across Hospital, Travel, Aircraft filter, and Unworn mask samples. The presence of these strain-level differences across sample categories suggests that the observed organisms were not derived from a single contaminant source but instead represent genuine microbial community members.
Microbial diversity and spatial distribution
The Shannon diversity index varied across different sample types under environmental and enrichment conditions. Environmental samples demonstrated distinct diversity patterns (Fig. 5A). Hospital samples exhibited the highest Shannon diversity, followed by Aircraft filter samples, which showed a broader range of values. In contrast, Travel-related samples had the lowest median diversity (Fig. 5B). Under enrichment conditions, Travel-related samples consistently exhibited the lowest diversity, mirroring trends observed under environmental conditions. Significant diversity differences were observed across Aircraft filter and Travel samples (p < 0.05, Fig. 5C).
Fig. 5
Alpha and beta diversity of microbial communities across different sample types and processing methods. The Shannon Diversity Index is shown by sample type for Environmental vs. Enrichment (A), Environmental samples (B), and Enrichment samples (C). Beta diversity analyses based on unweighted UniFrac distances are displayed for Environmental vs. Enrichment (D), Environmental samples (E), and Enrichment samples (F). Beta diversity analyses based on Bray–Curtis dissimilarity are shown for Environmental vs. Enrichment (G), Environmental samples (H), and Enrichment samples (I)
Beta diversity analysis based on unweighted UniFrac distances revealed no significant separation between environmental and enrichment samples (PERMANOVA p = 0.652, Fig. 5D). Within environmental samples, distinct clustering was observed among sample types (PERMANOVA p = 0.013, Fig. 5E). Aircraft filter and Hospital samples formed tight clusters, while Travel-related samples were the most dispersed. Unworn samples partially overlapped with Hospital samples. For enrichment samples, clustering patterns were also evident (PERMANOVA p = 0.046, Fig. 5F).
Analysis using Bray–Curtis distances revealed significant separation between environmental and enrichment samples (PERMANOVA p = 0.001, Fig. 5G). Within environmental samples, sample types clustered distinctly (PERMANOVA p = 0.022, Fig. 5H), with tighter clustering observed for Hospital and Aircraft filter samples compared to Travel-related and Unworn samples. For enrichment samples, distinct clustering patterns were also observed (PERMANOVA p = 0.012, Fig. 5I). Aircraft filter and Unworn samples exhibited tighter clusters, while Travel-related and Hospital samples were more dispersed.
Functional profiling of gene families using HUMAnN3 MetaCyc pathway data
Gene families related to antimicrobial resistance, such as β-lactamase (EC 3.5.2.6), kanamycin kinase (EC 2.7.1.95), and gentamicin 3′-N-acetyltransferase (EC 2.3.1.60), were particularly enriched in Hospital samples compared to other sample types. In contrast, gene families involved in core metabolic functions, such as DNA-directed RNA polymerase (EC 2.7.7.6) and glutaminase (EC 3.5.1.2), exhibited a more even distribution across sample types (Fig. 6A).
Fig. 6
Top 30 gene families from HUMAnN3 MetaCyc pathway and antibiotic resistance gene abundance across sample types and processing categories. A Heatmap displaying the top 30 gene families identified, shown as Log10 relative activity. B Stacked bar plot showing the mean relative abundance of the antibiotic resistance gene families
Notably, the abundance of ARGs was higher in Environmental samples than in Enrichment samples. Specifically, β-lactamase and gentamicin 3′-N-acetyltransferase were significantly more abundant in the Aircraft filter and Unworn samples under Environmental processing. Conversely, kanamycin kinase showed a relatively higher abundance in the Unworn samples under Enrichment processing (Fig. 6B).
Evaluation of metagenome-assembled genomes
A total of 176 MAGs were recovered from various sample types and processing methods. The highest number of MAGs was obtained from Aircraft filter samples processed through enrichment (65 MAGs), followed by Hospital samples processed through enrichment (43 MAGs). In terms of MAGs meeting the quality thresholds (≥ 50% completeness and ≤ 10% contamination), the Aircraft filter samples processed under environmental conditions had a proportion of 44% (11 out of 25 MAGs), while those processed under enrichment conditions had a slightly higher proportion of 49.2% (32 out of 65 MAGs). No MAGs were recovered from the unworn Environmental samples. Hospital samples processed through environmental methods exhibited an average completeness of 84.9% and contamination of 0.8%, whereas those processed through enrichment had slightly higher average contamination at 1.9% but similar completeness (86.6%) (Table 2).
The analysis comparing the average microbial coverage (defined as the mean sequencing read depth across MAGs, obtained by mapping metagenomic reads back to the assembled genomes) between Enrichment and Environmental processing conditions revealed no statistically significant differences. This i