Main
Inflammation is a state of the immune system that serves to protect the human body from environmental challenges, thereby preserving homeostasis1. Inflammatory processes are activated in response to various triggers, such as infection or injury, and involve a multistep defensive mechanism to eliminate the source of perturbation2. Inflammation represents an altered state within the immune system, which can manifest as either a protective or a pathological response3. The cellular and molecular mediators of inflammation play pivotal roles in nearly every human disease4.
The initiation of inflammatory processes is driven by cellular stimulation, triggered by the release of proinflammatory cytokines5. These cytokines exert autocrine and paracrine effects, activating endothelial cells and subsequently increasing vascular permeability. Chemokines are essential for recruiting additional immune cells for pathogen eradication6. Inflammation is a central driver in cardiovascular7, autoimmune8 and infectious diseases9 and even cancer10. The success of therapies targeting inflammation underscores the importance of understanding the underlying pathways11,12.
Single-cell RNA sequencing (scRNA-seq) is becoming a conventional method for detecting altered cell states, enabling the comparison of transcriptional profiles during inflammation13. A differential analysis of cell states and gene expression programs at the cellular level can guide a more holistic understanding of inflammation in acute and chronic diseases to form the basis for future precision medicine tools. In the present study, we annotated the common immune cell types present in the peripheral blood and identified disease-specific cell states that exhibit functional specialization within the inflammatory landscape. Beyond a disease-centered classification, we modeled the expression profiles of inflammatory molecules to uncover discriminative genes related to immune cell activation, migration, cytotoxic responses and antigen presentation activities. Ultimately, we propose a classifier framework based on peripheral blood mononuclear cells (PBMCs), demonstrating the potential of circulating immune cells to contribute to precision medicine strategies for patients suffering from acute or chronic inflammation.
Results
An inflammation landscape of circulating immune cells
To chart a comprehensive landscape of immune cells in the circulation of healthy individuals and patients suffering from inflammatory diseases, we analyzed the transcriptomic profiles of more than 6.5 million PBMCs (6,340,934 after filtering), representing 1,047 patients and 19 diseases, split into a main Inflammation Atlas and two validation datasets (Fig. 1a,b). Diseases were broadly classified into five distinct groups: (1) immune-mediated inflammatory diseases (IMIDs, n = 7) (systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), psoriatic arthritis (PsA), psoriasis (PS), ulcerative colitis (UC), Crohnʼs disease (CD) and multiple sclerosis (MS)); (2) acute (n = 1) (sepsis); (3) chronic inflammation (n = 3) (chronic obstructive pulmonary disease (COPD), asthma and cirrhosis); (4) infection (n = 4) (influenza virus (Flu), SARS-CoV-2 (COVID), hepatitis B virus (HBV) and human immunodeficiency virus (HIV)); and (5) solid tumors (n = 4) (breast cancer (BRCA), colorectal cancer (CRC), nasopharyngeal carcinoma (NPC) and head and neck squamous cell carcinoma (HNSCC)), which were profiled along with healthy donor samples (Fig. 1a and Extended Data Fig. 1a). Our cohort included various scRNA-seq chemistries (10x Genomics 3′ and 5′ mRNA) and experimental designs (CellPlex and genotype multiplexing), as well as individuals of both sexes (56% female, 43% male) and across age groups, to comprehensively capture technical and biological variability (Methods and Supplementary Table 1). To learn a generative model of circulating immune cells of inflammatory diseases, we applied probabilistic modeling of the single-cell data using scVI14 and scANVI15, considering clinical diagnosis, sex and age. Generative probabilistic models have shown superior performance in integrating complex datasets compared to other approaches16, particularly if cell annotations are available (Extended Data Fig. 1b,c).
Applied here, the resulting cell embedding space was batch effect corrected while preserving biological heterogeneity (that is, previously annotated cell types and states; Supplementary Table 2). From the joint embedding space, we initially assigned cells to major immune cell lineages (Level 1; Fig. 1c and Extended Data Fig. 1d). Then, following a recursive, top-down clustering approach, we obtained a total of 64 immune populations (Level 2), comprehensively resembling immune cell states of the innate and adaptive compartments (Fig. 1d, Supplementary Fig. 1 and Supplementary Table 3). High-level compositional analysis (Level 1) across diseases revealed significant changes of cell type distributions (Extended Data Fig. 1d) and validated previously described alterations in blood cells from patients. For example, we confirmed low levels of unconventional T cells (UTCs), innate lymphoid cells (ILCs) and naive CD4 T cells, together with high proportions of B cells and monocytes, in SLE17. Patients with inflammatory bowel disease (IBD) showed lower levels of UTCs and ILCs18, and we observed lower proportions of UTCs accompanied by a larger fraction of monocytes and B cells in RA19. Lymphopenia, a common event during the development of sepsis20, and lymphocytosis, typical of HIV infection21, were also confirmed.
Fig. 1: Inflammation landscape of circulating immune cells.
a, Left, schematic overview illustrating the number of cells, samples and conditions (diseases and disease groups) analyzed. Right, pie charts displaying metadata related to the scRNA-seq chemistry (10x Genomics assay and version) and patient demographics (age and sex). b, Schematic overview of the analysis workflow followed, detailing the division of the overall dataset into Main, unseen patients and unseen studies. The figure illustrates the specific tasks and analyses performed with each dataset. c, Uniform manifold approximation and projection (UMAP) embedding for the scANVI-corrected latent space considering the Main dataset (4,435,922 cells) across patients and diseases colored by the major cell lineages (top, Level 1) and diseases (bottom). d, Sankey diagram showing the Inflammation Atlas cell annotation, considering major cell lineages (Level 1, left) and cell type populations (Level 2, right), along with their correlation to Level 1 cell groups. a,b, Icons were created in BioRender: Aguilar Fernandez, S. (2025): https://biorender.com/h7jfeqm or with Inkscape. CM, central memory; D, disease; DC, dendritic cell; DEG, differentially expressed genes; EM, effector memory; HC, healthy control; HVG, highly variable genes; Mono, monocytes; NK, natural killer; pDC, plasmacytoid dendritic cell; QC, quality control.
Diving deeper into genes and gene programs to characterize inflammatory diseases, our subsequent analysis followed three complementary strategies: (1) to identify disease-driving mechanisms (gene signature and gene regulatory network (GRN) activity); (2) to capture discriminative inflammation-related genes (feature extraction); and (3) to classify patients based on their disease-specific signatures (projection). Therefore, we looked at gene expression profiles holistically but also delineated the inflammatory process by focusing on immune-modulating molecules (Supplementary Table 4).
Inflammation-related signatures across diseases and cell types
We first grouped inflammatory molecules into 21 gene signatures that delineate multiple processes, including immune cell adhesion and activation, cellular migration (chemokines), antigen presentation and cytokine-related signaling (Supplementary Table 4). To tailor these signatures to reflect the inflammation landscape of circulating immune cells, we refined these using Spectra22, yielding a comprehensive set of 119 cell-type-specific factors (Supplementary Table 4). We then ran a univariate linear model (ULM) analysis on the scANVI-corrected gene expression data, providing an inflammation signature activity score for each group. Finally, we ran a linear mixed-effect model (LMEM) between diseased and healthy samples to highlight disease-specific alterations (Supplementary Table 5).
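The ULM scoring step described above can be illustrated with a small sketch. It assumes the formulation commonly used by enrichment tools, in which the expression of all genes in one pseudobulk profile is regressed on the signature weights (zero for genes outside the signature) and the t-statistic of the slope is taken as the activity score. The gene names, weights and expression values are toy values chosen for illustration, the function name ulm_score is hypothetical, and the subsequent LMEM comparison against healthy samples is not shown.

```python
import math

def ulm_score(expression, weights):
    """Return the t-statistic of the slope for expression ~ signature weight.

    expression: dict gene -> pseudobulk expression value (toy values here)
    weights:    dict gene -> signature weight; genes not listed get weight 0
    """
    genes = sorted(expression)
    x = [weights.get(g, 0.0) for g in genes]
    y = [expression[g] for g in genes]
    n = len(genes)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope with n - 2 degrees of freedom
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err

# Hypothetical signature and two toy pseudobulk profiles
signature = {"IFI6": 1.0, "ISG15": 1.0}
patient = {"IFI6": 5.0, "ISG15": 4.0, "ACTB": 1.0,
           "GAPDH": 1.5, "CD3E": 0.5, "MS4A1": 1.0}
control = {"IFI6": 1.0, "ISG15": 0.5, "ACTB": 1.0,
           "GAPDH": 1.5, "CD3E": 0.5, "MS4A1": 1.0}

patient_score = ulm_score(patient, signature)
control_score = ulm_score(control, signature)
```

In this toy case the profile with high expression of the signature genes receives a positive score and the other a slightly negative one. Because the score is a t-statistic, it reflects both the size of the shift and the spread of the remaining genes.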
We observed a general trend of increased activity in immune-relevant signatures as compared to healthy donors (>50% increased average signature scores; Fig. 2a). For IMIDs, we found the characteristic upregulation of adhesion molecule signatures, TNF via NFκB signaling, antigen cross-presentation and antigen-presenting signatures23. Interferon (IFN) type 1 and type 2 signatures were significantly downregulated in most IMIDs and cell types, except for non-naive CD8 T cells that showed an upregulation, pointing to a common cell-type-specific mechanism24. Notably, IMIDs showed a strong upregulation of the IFN-induced signature in almost all immune cell types, where SLE was also accompanied by an upregulation of chemokines and chemokine receptors. MS showed a decreased IFN-induced signature and increased chemokine receptor activity, in line with the migratory capacity of blood cells to infiltrate the brain during the course of the disease25. As previously reported, we captured the upregulation of the TNF receptor/ligand signature mainly in non-naive CD8 T cells for sepsis (together with an increase in IFNγ response in monocytes), with a decrease in the other inflammatory signals (adhesion molecules and cytokines)26. By contrast, all chronic inflammatory diseases upregulated the activity of antigen-presenting molecules and increased IFN-induced signaling. This IFN-induced signature was also increased in viral infections, such as Flu and COVID, whereas we found a decreased activity in HIV and HBV. Finally, within solid tumors, CRC and NPC presented a strong upregulation of TNF via NFκB signaling. Intriguingly, only RA, PS, UC and CD showed an enrichment in the T follicular helper (Tfh) signature in non-naive CD4 T cells, highlighting the role of circulating Tfh cells in these diseases. 
In IMIDs more generally, both naive and non-naive CD4 T cell populations were enriched in T helper signatures, pointing to an early priming of naive T toward helper T cell-driven inflammation27. Finally, to assess the similarity of the inflammatory profiles among diseases, we performed hierarchical clustering of the inflammation signature activity score across all cell types (Level 1; Fig. 2a,b).
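Agglomerative clustering with complete linkage on Euclidean distances, as used for this comparison, can be sketched in a few lines. This is a minimal illustration that assumes each disease is represented by a short vector of signature activity scores. The disease labels and values are placeholders, and the function stops at a chosen number of clusters rather than building the full dendrogram.

```python
import math
from itertools import combinations

def complete_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles: dict name -> list of activity scores (toy values here).
    The distance between two clusters is the largest Euclidean distance
    between any pair of their members (complete linkage).
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        return max(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # combinations yields index pairs with i < j
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Placeholder activity profiles for four hypothetical diseases
toy_profiles = {
    "disease_A": [1.0, 0.9, -0.2],
    "disease_B": [0.9, 1.0, -0.1],
    "disease_C": [-0.8, -0.7, 0.9],
    "disease_D": [-0.9, -0.8, 1.0],
}
groups = complete_linkage(toy_profiles, n_clusters=2)
```

Complete linkage tends to produce compact clusters because a merge is only as good as its most dissimilar pair, which suits a comparison of whole inflammatory profiles.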
Fig. 2: Inflammation-related signatures across cell types and diseases.
a, Heatmap displaying the corrected signature activity score of the 119 cell-type-specific Spectra factors across diseases and cell types (Level 1). Here, the corrected signature activity score represents the coefficient value after running an LMEM comparing diseases versus healthy control (HC) on the ULM estimates computed using the cell type (Level 1) and patient pseudobulk on the corrected count matrix. The x axis represents the Spectra cell-type-specific factor associated with a given function (top annotation). The y axis represents the diseases grouped by disease group. b, Agglomerative hierarchical clustering with complete linkage, performed considering the Euclidean distance among columns, based on the corrected immune-related signature activity score computed by disease and cell type (Level 1). c, Heatmap displaying the corrected IFN type 1 and type 2 signature activity score across non-naive CD8 T cells (Level 2) and IMIDs. Here, the corrected signature activity score represents the coefficient value after running an LMEM comparing diseases versus HC on the ULM estimates computed using the cell type (Level 2) and patient pseudobulk on the corrected count matrix. For a and c, significant signature activity differences between disease and HC are marked with a dot (·) (LMEM, FDR-adjusted P < 0.05). d, Dot plot showing the uncorrected average expression of the FGFBP2 and GZMB genes from IFN type 1 and type 2 signature (x axis) across different subpopulations of non-naive CD8 T cells (Level 2) on IMIDs and health (SCGT00 study). The dot size reflects the percentage of cells of each disease expressing each gene, and the color represents the average expression level. e, Scaled relative activity of STAT1 and SP1 across cell types (Level 1) and enriched diseases for their transcription factor target genes. Hatched boxes indicate cell types not enriched in the corresponding disease.
f, Heatmap representing the average scaled transcription factor activity of STAT1 and SP1 across cell populations (Level 2) for flare and non-flare patients from Perez et al.17. Asterisk (*) indicates statistically significant changes using the two-sided Wilcoxon signed-rank test, FDR-adjusted P < 0.05. CD, Crohnʼs disease; CM, central memory; DC, dendritic cell; MLM, multilevel modeling; MS, multiple sclerosis; EM, effector memory; pDC, plasmacytoid dendritic cell; PS, psoriasis; PSA, psoriatic arthritis; RA, rheumatoid arthritis; TF, transcription factor; UC, ulcerative colitis.
Considering distinct cell types as unique contributors to the inflammatory immune landscape, IFN signatures have been used as a biomarker to define disease activity in autoimmune diseases28. However, it remains unclear which immune subpopulations contribute to these signatures, knowledge that could guide the selection of specific therapeutic interventions. Observing an enriched IFN type 1 and type 2 activity in non-naive CD8 T cells in IMIDs (Fig. 2a), we next sought to identify the subpopulations driving this signature. Here, we observed a significant upregulation across almost all non-naive CD8 T cell populations, albeit with a differential pattern across diseases (Fig. 2c). We then decomposed the signal to gene level to identify the most relevant contributors (Supplementary Fig. 2a,b). Intriguingly, FGFBP2 and GZMB showed increased expression levels, with restriction to specific effector memory (EM) CD8 T cell subtypes (EM CX3CR1 high, EM CX3CR1 int, Eff HOBIT and Activated), with a marked increase observed in UC (Fig. 2d). Of note, FGFBP2 and GZMB were recently described as markers of CD8 T cells localized to areas of epithelial damage24. Notably, our blood-based analysis points to their activation in circulating effector CD8 T cell populations even before tissue infiltration.
Expanding on previous observations of increased IFN-induced response across several immune cells and diseases, especially in the myeloid compartment for patients with SLE17 (Fig. 2a), we conducted a GRN analysis to explore the regulatory mechanisms and transcription factors driving the IFN-related activity (Level 1; Methods). STAT1 and SP1 were identified as the primary regulators of the IFN-induced signature, with each transcription factor exhibiting cell-type-specific activities (Fig. 2e and Supplementary Table 6). STAT1 primarily regulated canonical IFN signaling genes across multiple lineages, whereas SP1 activated a heterogeneous set of target genes (Extended Data Fig. 2a and Supplementary Table 6)29.
Observing a broad IFN-induced activity across immune cell types, we next investigated whether STAT1 and SP1 regulatory programs were conserved across cell subpopulations (Level 2; Extended Data Fig. 2b,c and Supplementary Table 6). Here, patients with SLE exhibited opposing STAT1 and SP1 activities in monocytes and non-naive CD8 T cells. STAT1 activity was increased in non-classical monocytes, whereas SP1 activity was decreased30. STAT1 was also upregulated in conventional dendritic cells type 2 (cDC2s), whereas SP1 activity was increased across multiple cell types implicated in the pathogenesis of SLE31, including inflammatory and regulatory monocytes, EM CX3CR1 high, CM and activated CD8 T cells as well as adaptive and CD56dimCD16 natural killer cells. Patients with Flu showed a significant increase in STAT1 activity in IFN-response CD8 T cells (Extended Data Fig. 2b). By contrast, patients with cirrhosis presented higher SP1 activity specifically in IFN-response monocytes. In HNSCC, an increased SP1 activity was observed in non-classical monocytes (Extended Data Fig. 2c), a protumoral population related to the suppressive systemic state of monocytes in this cancer type32. Given the cell-type-specific regulatory patterns observed across diseases, we next investigated the contribution of STAT1 and SP1 activity to dynamic changes associated with disease progression. To this end, we assessed their activity in patients with SLE17 experiencing disease exacerbations (flares; Supplementary Table 6). STAT1 activity was elevated during flares, particularly within CD8 T cells, whereas SP1 activity was more prominent in myeloid populations in the absence of flares (Fig. 2f).
Functional gene selection through interpretable modeling
Gene discovery using linear models or standard differential expression analysis suffers from the limitation that genes are considered independently. Thus, we considered the possibility of categorizing cells to their respective disease origin through an interpretable machine learning pipeline, to guide the selection of functional disease discriminatory genes (Methods and Supplementary Table 4). Therefore, we applied a supervised classification approach, together with a post hoc interpretability method, to allow the inference of the gene-wise importance, stratified by disease and cell type (Level 1).
We based our strategy on gradient boosted decision trees (GBDTs), a state-of-the-art machine learning technique proven to be effective in complex tasks with noisy data and nonlinear feature dependencies33 (Methods and Supplementary Table 7). To account for cell-type-specific expression patterns and the differential impact of diseases across immune populations, we trained separate models for each cell type (Level 1). We applied the classification pipeline to the scANVI-corrected gene expression profiles, achieving a balanced accuracy score (BAS) of 0.87 and a weighted F1 (WF1) score of 0.90 on held-out samples (Fig. 3a and Supplementary Table 8). By contrast, uncorrected log-normalized counts led to reduced performance, underscoring the benefits of batch correction (BAS: 0.65 and WF1: 0.78; Fig. 3a). Performance was consistent among cell types, with less abundant cell populations generally obtaining lower scores (for example, plasma cells, BAS: 0.78 and WF1: 0.80; Extended Data Fig. 3 and Supplementary Table 8). We observed that certain diseases exhibited poorer classification performance; for example, patients with severe Flu were misclassified as COVID (Extended Data Fig. 3). Retraining the GBDT classifier on the Flu and COVID dataset (COMBAT dataset34) and stratifying patients with COVID by their clinical behavior (mild, severe and critical) showed that patients with severe Flu closely resemble severe COVID cases (Extended Data Fig. 4a,b). Similar results were obtained by clustering pseudobulks at the sample level (Extended Data Fig. 4c,d), supporting common inflammatory signatures of patients suffering from these severe respiratory infections. Finally, separating cells from female and male patients yielded similar performance, with no differences between sexes (Extended Data Fig. 5a).
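For readers less familiar with the two metrics reported here, the following sketch computes them from scratch. It assumes the usual definitions: BAS as the unweighted mean of per-class recall, and WF1 as the per-class F1 score averaged with weights proportional to the number of true samples in each class. The labels and predictions are invented for illustration and do not correspond to any result in the study.

```python
from collections import Counter

def per_class_recall_and_f1(y_true, y_pred):
    """Return {label: (recall, f1)} for every label present in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1
            fp[pred] += 1
    stats = {}
    for label in set(y_true):
        recall = tp[label] / (tp[label] + fn[label])
        predicted = tp[label] + fp[label]
        precision = tp[label] / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats[label] = (recall, f1)
    return stats

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    stats = per_class_recall_and_f1(y_true, y_pred)
    return sum(recall for recall, _ in stats.values()) / len(stats)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to class support."""
    stats = per_class_recall_and_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(support[l] * f1 for l, (_, f1) in stats.items()) / len(y_true)

# Invented example: one Flu sample is predicted as COVID
y_true = ["HC"] * 4 + ["Flu"] * 2 + ["COVID"] * 4
y_pred = ["HC"] * 4 + ["COVID", "Flu"] + ["COVID"] * 4

bas = balanced_accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
```

In this example a single error in a small class lowers BAS (5/6) more than WF1 (8/9), which is why reporting both is informative when class sizes are unbalanced.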
Fig. 3: Functional gene discovery using interpretable machine learning.
a, Normalized confusion matrices displaying proportion of predictions belonging to each true condition. Diagonal values correspond to the Recall metric. XGBoost was trained on the scANVI batch-corrected (left) or batch-uncorrected (right) log-scaled cell expression profiles. b, Validation of d-SHAP-based gene selection using XGBoost trained with a nested cross-validation on unseen studies’ cells. Each point corresponds to the average left-out fold performance, for each best configuration of each fold combination. The box plots report the WF1 (top) and the BAS (bottom) computed considering top 5, 10 and 20 genes (among the ones expressed in at least 5% of the total cells), for each inflammatory condition present within the unseen studies dataset (that is, healthy, sepsis, CD, SLE, HIV, cirrhosis, RA and COVID) according to the d-SHAP values, across cell types (Level 1). For the same number of genes, we report the performance scores of n = 20 random selected gene sets. The performance of the classifier when trained on the whole gene set, consisting of the genes expressed in at least 5% of the total cells, is also reported. Boxes indicate the interquartile range (IQR) with the median as a center line; whiskers extend to 1.5× IQR; and outliers are shown as individual points. c, Scatter plot of max-normalized gene expression against d-SHAP values computed for CYBA gene on monocyte population (Level 1) and considering the output of disease-XGBoost for a given disease (UC, CD, PS and PSA, from left to right). d, Scatter plot of max-normalized gene expression against d-SHAP values computed for IFITM1 gene on T non-naive CD4 and ILC populations (annotation Level 1) considering the output of the disease-XGBoost for a given disease (asthma and COPD, left and right). In c and d, we limited the visualization to up to 60,000 cells, sampling an equal percentage from each patient corresponding to 5% and 7.5% of monocytes and T non-naive CD4 cells, respectively. 
Cells belonging to samples with or without the given condition (disease) are marked in orange or blue, respectively. CD, Crohnʼs disease; MS, multiple sclerosis; PS, psoriasis; PSA, psoriatic arthritis; RA, rheumatoid arthritis; UC, ulcerative colitis.
As GBDTs require post hoc interpretability tools, we computed SHapley Additive exPlanation (SHAP)35 values. By combining the two approaches, we obtained a rich resource of gene rankings based on their ability to discriminate inflammatory conditions across different cell types (Methods and Supplementary Table 9). To evaluate the effectiveness of disease-discriminative SHAP (d-SHAP) values, we assessed the classification performance compared to an equal number of randomly selected genes. On unseen studies, d-SHAP genes consistently yielded more accurate predictions (Fig. 3b). Due to the possible collinearity of diseases and studies, d-SHAP values might be affected by batch effects. To disentangle disease-specific from study-specific signals, we trained separate classifiers to predict the study identity (BAS: 0.97 and WF1: 0.99; Supplementary Fig. 4a) and to identify study-associated genes via SHAP values (s-SHAP; Methods). The correlation and overlap between the d-SHAP and s-SHAP values (Supplementary Fig. 4b,c) allowed us to prioritize bona fide disease-discriminative genes for further analysis (Supplementary Table 9).
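The idea behind SHAP values can be illustrated by computing exact Shapley values for a very small model. This sketch is not the tree-specific algorithm typically applied to GBDTs; it enumerates all feature subsets directly and, as a simplifying assumption, replaces absent features with a fixed baseline value. The toy model, its three inputs and the function name shapley_values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value
    (an assumption of this sketch). Cost grows as 2**n, so this is
    only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

def toy_model(genes):
    # One additive effect plus one interaction between two genes
    return 2.0 * genes[0] + genes[1] * genes[2]

cell = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, cell, reference)
```

Here the additive gene receives its full effect (2.0), the two interacting genes share their joint effect equally (0.5 each), and the values sum to the difference between the model output for the cell and for the reference.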
Ordering genes based on d-SHAP values identified previously described biomarkers, such as STAT3 in CD4 T cells for RA samples36 and IFN genes for SLE samples37 (Extended Data Fig. 6a). The d-SHAP values of CYBA stood out as a strong candidate marker to classify diseases affecting barrier tissue: PSA, PS, UC and CD (Fig. 3c and Extended Data Fig. 6b,c). CYBA encodes the primary component of the microbicidal oxidase system of phagocytes. In line with this, the importance was seen mainly in monocytes (Extended Data Fig. 6b). Interestingly, high expression of CYBA drove the model to classify intestinal inflammatory diseases (UC and CD), whereas reduced levels were relevant to classify skin-related diseases (PS and PSA) (Fig. 3c). Mutations in CYBA cause chronic granulomatous disease, with patients showing an impaired phagocyte activation and failing to generate superoxide. Consequently, patients show recurrent bacterial and fungal infections in barrier tissues, including the skin38. Thus, we hypothesize that reduction of CYBA in skin-related IMIDs leads to an impaired immune barrier function, causing localized, symptomatic flares of PS and PSA. On the other hand, reactive oxygen species (ROS) produced by mucosa-resident cells or by newly recruited innate immune cells are essential for antimicrobial mucosal immune responses39. In IBDs, an upregulation of CYBA may result in the accumulation of superoxide and ROS through its oxidase function, a hallmark of these diseases40.
Further exploring d-SHAP value ranks highlighted the importance of IFITM1 across chronic diseases, including COPD and asthma (Extended Data Fig. 6d,e). IFITM1 encodes a protein that inhibits viral entry into host cells by preventing the fusion of the virus with the host cell membrane41. The importance of IFITM1 was mainly observed in lymphoid cells, specifically CD4 non-naive T cells and ILCs (Extended Data Fig. 6d and Supplementary Fig. 3). In both cell types, higher IFITM1 expression drives the model toward classifying COPD, whereas lower expression shifts the classification toward asthma (Fig. 3d). In line with this, T cell and ILC accumulation is associated with the decline of lung function and severity in patients with COPD42. We hypothesize that chronic inflammation triggers higher expression of IFITM1 in lymphoid cells, thereby facilitating their accumulation43, although further mechanistic validation is needed.
Classifying patients by reference mapping
The ability to accurately classify cells according to their respective diseases prompted us to classify patients based on their disease of origin, creating the basis for a universal classifier as a precision medicine tool for inflammatory diseases. By considering each patient as an ensemble of expression profiles across all circulating immune cells, we learned a generative model while integrating the single-cell reference as a basis to project new patients from a query dataset into the same embedding space. This strategy allowed us to map unseen and unlabeled query patient data into our reference embedding space, providing a common ground for classification.
Projecting expression data into a lower-dimensional space is a common strategy to reduce noise44 and to map query data into a reference atlas45. Here we introduce a novel computational framework to exploit the cell embeddings for classification, thus turning the single-cell reference into a diagnostic tool (Fig. 4a and Extended Data Fig. 7). To this end, we first generated the embeddings with scANVI (30 latent embeddings) of both the reference and the unseen query datasets while also transferring the cell labels to the latter (Supplementary Table 7). Then, we defined a cell type pseudobulk profile per patient by averaging the embedded features of the corresponding cells (Level 1; Methods). Next, we trained an independent classifier to assign correct disease labels, considering one cell type at a time. We handled uncertainty at the cell type level via a majority voting system that selects the most frequently predicted condition. To assess the performance of our framework, we proposed three scenarios: (1) a five-fold cross-validation splitting the full reference atlas into five balanced sets, (2) a dataset with unseen patients and (3) a dataset with unseen studies (Fig. 4b). We consider these scenarios a representation of the data integration challenges with an increasing degree of complexity.
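Two steps of this pipeline, the per-patient pseudobulk of embedded cells and the majority vote across cell-type classifiers, can be sketched as follows. This is a simplified illustration under stated assumptions: embeddings are short toy vectors rather than 30-dimensional scANVI latents, the per-cell-type predictions are supplied by hand instead of coming from trained classifiers, and ties in the vote are resolved in favor of the label encountered first, a choice made for this sketch only.

```python
from collections import Counter, defaultdict

def pseudobulk(cells):
    """Average embeddings per (patient, cell type).

    cells: iterable of (patient, cell_type, embedding) tuples.
    Returns {(patient, cell_type): mean embedding as a list}.
    """
    groups = defaultdict(list)
    for patient, cell_type, embedding in cells:
        groups[(patient, cell_type)].append(embedding)
    return {
        key: [sum(column) / len(column) for column in zip(*embeddings)]
        for key, embeddings in groups.items()
    }

def majority_vote(predictions):
    """Pick the most frequent label among per-cell-type predictions.

    predictions: dict cell_type -> predicted disease label for one patient.
    Ties go to the label seen first (an assumption of this sketch).
    """
    return Counter(predictions.values()).most_common(1)[0][0]

# Toy embedded cells for one hypothetical patient
toy_cells = [
    ("P1", "Mono", [1.0, 2.0]),
    ("P1", "Mono", [3.0, 4.0]),
    ("P1", "B", [0.0, 1.0]),
]
profiles = pseudobulk(toy_cells)

# Hand-written predictions standing in for the per-cell-type classifiers
toy_predictions = {"Mono": "SLE", "B": "SLE", "NK": "RA"}
patient_label = majority_vote(toy_predictions)
```

Voting across cell types means that a single weak cell-type classifier cannot decide the patient label on its own, at the cost of ignoring how confident each individual prediction is.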
Fig. 4: Schematic representation of the patient classifier pipeline and performance evaluation.
a, Schematic representation of the patient classifier pipeline. Icons were created with Inkscape. b, Description of the three performance evaluation scenarios. In our datasets, we always have only one sample for each patient. c−e, Performance evaluation in Scenario 1 (five-fold cross-validation, from 817 samples), showing: c, distribution of WF1 scores for each left-out split (boxes indicate the interquartile range (IQR) with the median as a center line, whiskers extend to 1.5× IQR and outliers are shown as individual points (each box includes n = 5 points)); d, F1 score for each combination of cell type and disease, after aggregating all the predictions of the left-out folds; and e, normalized confusion matrices displaying proportion of predictions belonging to each true condition after aggregating all the predictions of the left-out folds. Main diagonal values correspond to the Recall metric. f,g, Performance evaluation in Scenario 2, showing WF1 scores for unseen patients’ observation (f) and F1 score for each combination of cell type and disease (g). h, Normalized confusion matrices displaying proportion of predictions belonging to each true condition. Main diagonal values correspond to the Recall metric. i−k, Performance evaluation in Scenario 3, showing: i, WF1 scores for unseen studies’ observation; j, F1 score for each combination of cell type and disease; and k, normalized confusion matrices displaying proportion of predictions belonging to each true condition. Main diagonal values correspond to the Recall metric. CD, Crohn’s disease; DC, dendritic cell; MS, multiple sclerosis; P, patient; pDC, plasmacytoid dendritic cell; PS, psoriasis; PSA, psoriatic arthritis; QC, quality control; RA, rheumatoid arthritis; UC, ulcerative colitis.
Our classification strategy achieved high performance in the cross-validation scenario (Scenario 1; Supplementary Table 8), resulting in 0.90 ± 0.03 WF1 and 0.85 ± 0.07 BAS (Fig. 4c). Consistent with results obtained from the cell-wise classifier pipeline, Flu was the only disease that failed to be classified (Recall: 0.18) (Extended Data Fig. 8a,b). Training a classifier for each cell type separately allowed us to assess their relevance in distinguishing inflammatory diseases (Fig. 4d,e). Here, plasma and UTC showed the lowest BAS (0.53 and 0.67) and WF1 (0.64 and 0.78), highlighting the strength of our majority voting approach as a robust ensemble (Extended Data Fig. 8b). Although certain diseases (COVID, COPD and asthma) were particularly well classified by lymphoid and myeloid cell types, HIV was best classified by naive lymphoid cells (that is, naive CD4 and CD8 T cells and B cells with F1 of 0.83) in line with the tropism of the virus infecting mainly CD4 T cells46,47 (Fig. 4d and Extended Data Fig. 8c). Increasing the complexity by classifying unseen patient samples (Scenario 2), the performance remained very high, with a BAS of 0.95 and a WF1 of 0.98 (Fig. 4f−h and Supplementary Table 8). However, the classification of samples from unseen studies (Scenario 3) resulted in a strongly decreased BAS of 0.12 and a WF1 of 0.23 (Fig. 4i−k and Supplementary Table 8).
The largest performance drop was observed between Scenario 2 and Scenario 3, the latter classifying patients from unseen studies. We hypothesized that confounding factors, such as variations in assay chemistry or research centers, hindered the classifier’s ability to generalize. To validate our hypothesis and to provide a path toward a generalizable patient classifier, we next considered a Centralized Dataset that includes only data from diseases generated in the same center with a single assay chemistry (SCGT00 data; Supplementary Table 1 and Extended Data Fig. 7). In contrast to Scenario 2, we stratified the samples by sequencing pool and disease, ensuring that reference and query patients belong to distinct cohorts. This new centralized approach included an independent annotation of the reference patients’ cells (Methods and Supplementary Table 3) and new scANVI integration of the reference data, before projecting cells of the query patients. Notably, in this context, WF1 and BAS increased to 0.56 and 0.53, respectively, pointing to a highly improved generalization performance when classifying query patients as compared to Scenario 3 (Fig. 5a−c, Extended Data Fig. 9a,b and Supplementary Table 8). Finally, we evaluated the classifier performance considering male and female patients separately. In Scenario 1, no statistically significant differences were observed between WF1 distributions (Extended Data Fig. 5b), and the majority vote approach also yielded consistent results in the other scenarios (Extended Data Fig. 5c–e).
Fig. 5: Evaluating patient classifier performance on a Centralized Dataset and comparison with the state-of-the-art data integration approaches.