Main
Interpreting the impact of genome sequence variation remains a central biological challenge. Non-coding variants, which reside outside of protein-coding regions, are particularly challenging to interpret because of the diverse molecular consequences they can elicit. For example, non-coding variants can modulate genome properties such as chromatin accessibility, epigenetic modifications and three-dimensional chromatin conformation. Variants can further influence messenger RNA (mRNA) availability by altering expression levels or modifying sequence composition through splicing changes. Additionally, variants can exhibit cell-type-specific or tissue-specific effects. Given that more than 98% of observed genetic variation in humans is non-coding7, global characterization of the complex effects of this vast majority of variants remains intractable without computational predictions.
Computational methods can learn patterns from experimental data to predict and explain variant effects. One class of methods, sequence-to-function models1,2,3,4,5, takes a DNA sequence as input and predicts genome tracks, a data format associating each DNA base pair with a value (representing read coverage, count or signal) derived from experimental assays performed in cell lines or tissues. Genome tracks span various data modalities measuring gene expression (with output types comprising RNA sequencing (RNA-seq), cap analysis of gene expression (CAGE) sequencing, and precision nuclear run-on analysis of capped RNA (PRO-cap)), splicing (splice sites, splice site usage and splice junctions), DNA accessibility (DNase I hypersensitive site sequencing (DNase-seq) and assay for transposase-accessible chromatin sequencing (ATAC-seq)), histone modification (chromatin immunoprecipitation sequencing (ChIP-seq)), transcription factor binding or chromatin conformation (high-throughput chromosome (Hi-C) or micrococcal nuclease-based (Micro-C) conformation capture). Successfully trained sequence-to-function models accurately predict experimental measurements from input sequences. Furthermore, by comparing genome track predictions from an alternative sequence versus a reference sequence, these models can predict the molecular effects of variants.
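The comparison of alternative and reference predictions can be illustrated with a minimal sketch. The names `one_hot`, `toy_model`, `apply_snv` and `variant_effect` are hypothetical, and `toy_model` is only a stand-in (a local GC-content smoother) for a trained sequence-to-function model; it is not the AlphaGenome interface.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot matrix; other symbols map to zeros."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def toy_model(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained model: one track value per base pair.
    The 'signal' here is simply GC content averaged over a 5-bp window."""
    gc = x[:, 1] + x[:, 2]
    return np.convolve(gc, np.ones(5) / 5.0, mode="same")

def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return the alternative sequence for a single-nucleotide variant (0-based pos)."""
    if seq[pos] != ref:
        raise ValueError("reference allele does not match the sequence")
    return seq[:pos] + alt + seq[pos + 1:]

def variant_effect(seq: str, pos: int, ref: str, alt: str, model=toy_model) -> np.ndarray:
    """Per-base difference between ALT and REF track predictions."""
    ref_track = model(one_hot(seq))
    alt_track = model(one_hot(apply_snv(seq, pos, ref, alt)))
    return alt_track - ref_track

reference = "ATATATATATATATAT"
delta = variant_effect(reference, 8, "A", "G")
```

In this toy setting the A>G substitution changes the predicted track only within the 5-bp window around the variant; a real model would propagate the effect according to whatever regulatory grammar it has learned.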
Currently, deep learning-based sequence-to-function models face two fundamental trade-offs constraining their ability to predict how variants affect diverse modes of biological regulation. First, often owing to computational limitations, models must trade off between capturing long-range genomic interactions and achieving nucleotide-level predictive resolution. Although models such as SpliceAI4, BPNet8 and ProCapNet9 provide base-resolution predictions, they are restricted to short input sequences (for example, 10 kb or less), and thus may miss the influence of distal regulatory elements. Models such as Enformer1 and Borzoi2 can process longer sequences (approximately 200–500 kb) to capture broader context but at the cost of reducing output resolution (128-bp or 32-bp bins), which can blur fine-scale regulatory features such as splice sites, transcription factor footprints or polyadenylation sites.
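The loss of fine-scale detail at coarser output resolution can be made concrete with a small sketch. It assumes that binned targets are formed by averaging a base-resolution track over fixed-width bins; `bin_track` is a hypothetical helper, and the aggregation actually used by any given model may differ (for example, summation).

```python
import numpy as np

def bin_track(track: np.ndarray, bin_size: int) -> np.ndarray:
    """Average a 1-bp-resolution track into consecutive bins of bin_size bp.
    Trailing positions that do not fill a whole bin are dropped."""
    n_bins = len(track) // bin_size
    return track[: n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)

# A single sharp feature (for example, a splice site) at position 300.
track = np.zeros(1024)
track[300] = 1.0

binned_128 = bin_track(track, 128)  # 8 bins
binned_32 = bin_track(track, 32)    # 32 bins
```

After binning, the feature can only be localized to a 128-bp or 32-bp window, and its amplitude is diluted by the bin width.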
A second trade-off exists between capturing diverse modalities versus specializing in one or a few. Several state-of-the-art (SOTA) models are highly specialized for single modalities, such as SpliceAI4 for splice site prediction, ChromBPNet10 for local chromatin accessibility and Orca3 for three-dimensional genome architecture. However, specialized models alone are insufficient for capturing the diverse molecular consequences of variants across modalities. Even within a single modality like splicing, specialized models such as SpliceAI4 or Pangolin11 predict certain aspects (such as splice site prediction) while omitting others (such as splice junction prediction or competition between splice sites). Models like DeepSEA, Basenji, Enformer, Sei and Borzoi have demonstrated the utility and practicality of multimodal models. They allow users to use a single model for several modalities, instead of requiring several specialized models. Furthermore, their learned general sequence representation enables them to be readily fine-tuned for new tasks. However, these more generalist models can lag behind their specialized counterparts on certain tasks, such as splicing, or may lack particular modalities, such as contact maps.
Here we present AlphaGenome, a model that unifies multimodal prediction, long-sequence context and base-pair resolution into a single framework. The model takes 1 Mb of DNA sequence as input and predicts a diverse range of genome tracks across numerous cell types. The splicing predictions of AlphaGenome include a new splice junction prediction approach alongside splice site usage prediction. We evaluated the performance of AlphaGenome using a comprehensive set of benchmarks, covering both its ability to accurately predict genome tracks on previously unseen DNA sequences and its effectiveness in variant effect prediction tasks. AlphaGenome achieved SOTA performance on 22 of 24 genome track prediction tasks and 25 of 26 variant effect prediction tasks. We performed extensive ablations of target resolution, sequence length, distillation and modality combinations to explain the performance of AlphaGenome and inform design choices for future sequence-to-function models. We envisage that AlphaGenome will provide a powerful and extensible foundation for analysing the regulatory code within the genome.
We first present key technical details of the AlphaGenome data and training procedure, alongside a high-level summary of our evaluations (Fig. 1). We then demonstrate high-fidelity genome track prediction performance, a prerequisite for variant effect prediction (Fig. 2). Next, we focus on variant effect prediction with modality-specific deep dives into splicing (Fig. 3), gene expression (Fig. 4) and chromatin accessibility (Fig. 5). Finally, we highlight the model’s utility in cross-modality variant interpretation (Fig. 6) and dissect the impact of modelling choices on the performance of AlphaGenome (Fig. 7).
Fig. 1: AlphaGenome model architecture, training regimes and comprehensive evaluation performance.
a, Model overview. AlphaGenome processes 1 Mb of DNA sequences and species identity (human/mouse) to predict 5,930 human or 1,128 mouse genome tracks across diverse cell types and 11 output types at specific resolutions (far right). Computation leverages sequence parallelism, breaking the 1 Mb of DNA sequence into 131-kb chunks processed across devices. The core architecture features a U-Net-style design comprising an encoder (downsampling the sequence), transformers with inter-device communication and a decoder (upsampling), which feed into task-specific output heads at their respective resolutions (detailed in Extended Data Fig. 1). b, The pretraining process, in which 1-Mb DNA intervals are sampled from cross-validation folds, augmented (shifted and reverse complemented) and used to train the model against experimental targets, yields fold-specific and all-fold teacher models. c, The distillation process, in which a student model learns to reproduce predictions from frozen all-fold teacher models using augmented and mutationally perturbed input sequences, yields a single model suitable for variant effect prediction. d, Track prediction: pretrained fold-split model. Relative performance improvement (%) of AlphaGenome over the best competing model for a selection of genome track prediction tasks across modalities and resolutions (Supplementary Table 3). The ‘value’ column represents the absolute performance of AlphaGenome. For all tasks shown, a value of 1.0 indicates perfect performance, with the exception of ‘profile JSD’, for which the ideal value is 0. Both competing models and AlphaGenome pretrained fold-split models were evaluated on held-out genome regions unseen during model training. For classification tasks, we adjusted the relative improvement to account for the performance of a random classifier (Methods). e, Variant effect prediction: distilled all-fold model. 
Relative performance improvement of AlphaGenome over the best competing model for a subset of variant effect prediction tasks (Supplementary Table 4). The distilled student AlphaGenome model is used for these evaluations. The ds/caQTL direction (causality) rows represent the average relative improvement across several similar datasets (Methods). ds, DNase sensitivity; ca, chromatin accessibility; JSD, Jensen–Shannon divergence.
Fig. 2: Example of AlphaGenome track predictions and detailed performance evaluations.
a, Observed and AlphaGenome-predicted genome tracks within a 1-Mb held-out region of human chr. 19 (0-based coordinates: 10587331–11635907) in the HepG2 cell line. The y-axis scales for each assay are defined in the Methods section. Strand-specific tracks are denoted as positive (+) or negative (−), whereas strand-agnostic tracks are shown without a strand symbol. Contact maps are pairwise interaction matrices; therefore, both x and y axes display genome coordinate positions. RNA-seq, ATAC-seq and DNase-seq track predictions are at 1-bp resolution; H3K27ac and CTCF ChIP-seq are at 128-bp resolution; and contact maps are at 2,048-bp resolution. b, Example predictions with splicing. Base-pair-resolution AlphaGenome predictions for a 50-kb region highlighting detailed splicing (donor/acceptor sites, splice site usage and splice junctions) and RNA-seq predictions around the LDLR gene. c, Track prediction performance evaluation across different modalities. Violin plots display the distribution of Pearson correlations between predicted and observed tracks evaluated on held-out test intervals. Each violin plot is grouped by modality and split by organism (human in red; mouse in blue). Filled circles with accompanying numerical values indicate the mean Pearson r per assay group and organism. Splice junction, RNA-seq, PRO-cap, CAGE and ChIP-seq tracks were log(1 + x) transformed, whereas the remainder were untransformed. d, Evaluation of RNA-seq gene log-expression prediction on held-out test intervals. The leftmost panel assesses the Pearson correlation between predicted and observed log-expression values across all genes within individual tracks. The middle and rightmost panels evaluate the prediction of tissue or cell-type specificity using quantile-normalized expression values (detailed in Methods); correlations are computed either across genes per track (middle) or across tracks per gene (right). e, Splice junction count prediction. 
Predicted versus observed splice junction read counts (log(1 + x) transformed; n = 1,344,738) and Pearson r between them in selected human tissues known for having distinct splicing patterns49. Each hexagonal bin is coloured by the density of the data points in that bin, with warmer colours corresponding to higher density. The diagonal dotted line indicates perfect agreement (predicted = observed). More tissues are shown in Extended Data Fig. 2d. Obs., observed; Pred., predicted.
Fig. 3: AlphaGenome is a SOTA splicing variant effect prediction model.
a, Comparison of prediction outputs across deep learning models. All models predict at 1-bp resolution, except Borzoi (32 bp). Borzoi predicts splice sites implicitly through RNA-seq coverage, whereas others produce explicit predictions. b, Variant causing exon skipping in DLG1 (GTEx artery tibial tissue). Predicted splice junction, site usage and RNA-seq coverage are shown alongside observed coverage for reference (REF; blue) and alternative (ALT; red) alleles. c, New splice junction variant in COL6A2 (aorta), creating a new splicing donor and disrupting the extant one. d, ISM of U2SURP exon 9 and flanking introns using the mean splice junction score across tissues. Splicing-related motifs are highlighted. e, Schema of splice variant effect prediction with AlphaGenome. The maximum difference between REF and ALT predictions across splice sites or splice junctions is used to score variants (Methods). f, Comparison of AlphaGenome composite and splice junction scorers versus other methods for classifying fine-mapped sQTL variants. Variants are stratified into two groups by distance to the splice site, as done in Borzoi2. Tissue-specific auPRCs were averaged and weighted by variant count per tissue. g, Prediction of rare variants associated with splicing outliers. AlphaGenome was evaluated in both zero-shot and supervised settings (training an ensemble model similar to AbSplice50). h, Classifying pathogenic versus benign ClinVar variants on the basis of splicing effects for deep intronic (more than 6 bp from splice sites) and synonymous (more than 3 bp from splice sites) variants, variants in the splice site region (within 6 bp intronic or 3 bp exonic) and missense variants predicted as ‘likely_benign’ by AlphaMissense51. i, MFASS splicing variant classification (MPRA-tested variants). auPRC on the classification of experimentally validated splice-disrupting variants (data from Chong et al.22). #Hom/#Het, number of homozygous/heterozygous samples in GTEx.
Fig. 4: AlphaGenome predicts the effect of variants on gene expression.
a, RNA-seq variant scoring. Variant scoring strategy for predicting the effect of a genetic variant on the expression of a target gene (Methods). b, Example predictions for a known eQTL (chr. 22: 36201698: A>C) in GTEx colon (sigmoid) tissue. The observed RNA-seq coverage is the average across GTEx samples homozygous for either allele. Inset, comparative ISM on reference and alternative sequences over a 20-bp window centred on the variant (Methods). c, Comparison of performance (Spearman’s ρ) at predicting the effect size of eQTLs across 49 GTEx tissues (‘coefficient’) for different models and variant sets. d, Comparison of AlphaGenome-predicted variant scores and observed effect sizes (SuSiE β posterior) for 17,675 fine-mapped GTEx eQTLs (SNVs). Each point is a unique variant/gene/tissue combination. Spearman’s ρ (signed) = 0.50; Spearman’s ρ (unsigned; absolute values) = 0.10. Pearson’s r (signed) = 0.39; Pearson’s r (unsigned; absolute values) = 0.20. e, Comparison of performance (auROC) at predicting the direction of effect of eQTLs (‘sign’) for different models and variant sets. f, eQTL sign prediction performance stratified by different variant-to-TSS distance bins (SNVs only). g, Relationship between sign accuracy and eQTL recall. For a series of variant score thresholds, we plotted the fraction of GTEx eQTLs with a score above the threshold (y axis) and sign accuracy achieved (x axis) on those variants. h, Coverage of predictions across GWAS loci. Fraction of GWAS credible sets (from Open Targets52) with a predicted direction of effect for a plausible target gene, comparing AlphaGenome predictions to the eQTL co-localization approach. Top, each bar represents a different strategy for summarizing AlphaGenome scores, and two different score thresholds that yielded a given sign accuracy on eQTL of 80% or 90% (Methods). For COLOC, we counted a credible set as resolved if H4 > 0.95. 
Bottom, using the AlphaGenome strategy of PIP-weighting (80%), credible sets were further stratified by different properties (Methods). i, Comparison of performance (auROC) at distinguishing causal from non-causal eQTLs (‘causality’) using both zero-shot and supervised approaches (Methods). j, Enhancer–gene linking performance (ENCODE–rE2G CRISPRi dataset12). Zero-shot evaluation: performance (auPRC) comparison stratified by enhancer-to-TSS distance. Supervised evaluation: AlphaGenome input gradient score integrated into ENCODE–rE2G extended and ENCODE–rE2G models. k, Performance (auPRC) of paQTL variant effect prediction, thresholded by distance from the polyadenylation site. Each swarm plot represents 100 permutations of randomly matching each positive SNP with one of its distance and expression-matched negatives (Methods). Larger dots are the mean. RF, random forest.
Fig. 5: AlphaGenome accurately predicts variant effects on chromatin accessibility and SPI1 transcription factor binding.
a, Schematic of the centre-mask variant scoring strategy used for accessibility and ChIP-seq predictions (Methods). b,c, Performance comparison of AlphaGenome, Borzoi and ChromBPNet on QTL causality (b; average precision) and QTL effect size (c; Pearson r) across QTL types and ancestries. d, Predicted versus observed effect sizes for causal caQTLs (African ancestry). The scatterplot displays GM12878 cell line DNase predictions. Signed Pearson r = 0.74; unsigned Pearson r = 0.45. Signed Pearson r correlation uses raw values; unsigned Pearson r uses absolute values. Red and blue circles highlight variants in e and f. e, Example ALT–REF differences in predicted DNase (GM12878) for variants in d. f, ISM-derived sequence logos for REF/ALT alleles from e, suggesting variant disruption or modulation of transcription factor binding motifs. Putative binding factors and JASPAR53 matrix IDs (MA0105.1 and MA0105.3) are indicated on the right. g, Predicted versus observed effect sizes for causal SPI1 bQTLs using the GM12878 SPI1 ChIP-seq track. Signed Pearson r = 0.55; unsigned Pearson r = 0.12. Red and blue circles highlight variants in h and i. h, Example AlphaGenome predictions for selected SPI1 bQTLs. Shown are ALT–REF differences in predicted SPI1 ChIP-seq track (GM12878) around the variants highlighted in g. i, ISM-derived sequence logos for REF and ALT alleles of example SPI1 bQTLs from h, suggesting potential impacts such as creation or disruption of SPI1 or related motifs. The putative binding factors and JASPAR matrix IDs (MA0081.2 and MA0080.5) are indicated on the right. j, CAGI5 MPRA challenge performance (average across loci; mean Pearson r). Top, zero-shot using cell-type-matched DNase; middle, LASSO regression using cell-type-matched or agnostic DNase; bottom, LASSO regression using multimodal features (DNase + RNA + histone ChIP-seq output types for AlphaGenome and Borzoi; DNase + CAGE output types for Enformer) and all cell types. 
TF, transcription factor.
Fig. 6: Interpreting variant effects across modalities with AlphaGenome.
a, Non-coding cancer mutations in T-ALL. Overview of groups of mutations affecting TAL1 in patients with T-ALL. b, Detailed ALT–REF predictions for an oncogenic insertion (chr. 1: 47239296: C>ACG) characterized in ref. 6. Shown are differences in AlphaGenome predictions between the ALT and REF sequences of the variant in CD34+ CMP tracks. The ALT sequence increases expression of the TAL1 gene 7.5 kb away. c, Predicted TAL1 expression change (ALT–REF) in CD34+ CMPs. RNA-seq variant scores for TAL1 expression in CD34+ CMPs. Oncogenic mutations (orange) are compared with randomly sampled, length-matched indels (grey). d, Multimodal heat map of predicted variant effects. Each column is a distinct variant from c. Each row is a variant effect score associated with a genome track in CD34+ CMPs, except for contact map variant effect scores, which were averaged across tissues (as there is no CD34+ CMP contact map in our data). Background mutations are included alongside oncogenic mutations. Variants were grouped by their insertion length and position (as displayed in Fig. 6c), and scores were min-max scaled. e, ISM results for DNase, H3K27ac and TAL1 RNA-seq expression prediction by AlphaGenome in CD34+ CMPs. Top, ISM on the reference sequence; bottom, ISM on the oncogenic insertion sequence (chr. 1: 47239296: C>ACG). Myb motif from a previous study6, originally from UniPROBE54. f, Multimodality in trait-altering non-coding variants. Fraction of trait-affecting variants55 (‘candidate causal’; 338 for Mendelian and 1,140 for complex traits), as well as matched control variants55 (‘control’; 3,042 and 10,260, respectively), which exceed varying quantile-score thresholds in at least one predicted track. Here, surpassing a quantile-score threshold of 1.0 implies a predicted effect in excess of 99% of common variants (Methods). 
Variants are categorized depending on the tracks where the threshold was passed: ‘local regulation’ (ChIP/DNase/ATAC), ‘expression only’ (RNA/CAGE) and ‘multimodal’ (combination of the above). Numbers above the bars indicate the relative enrichment of detected variants (sum of the three categories) among candidate causal variants compared with the control variants. The enrichment increases with stricter thresholds, with a reduction in recall (x axis).
Fig. 7: Impact of resolution, sequence length, ensembling, distillation and multimodal training on AlphaGenome performance.
Ablation studies evaluating key model design choices across various performance metrics (y axis). For all panels, lines represent the mean over replicate training runs with different random seeds (n = 4 unless otherwise stated), and shaded contours denote the uncertainty interval (two standard deviations). a, Impact of target resolution. Performance comparison across models trained to predict targets (DNA accessibility, gene expression and splicing) at varying resolutions (x axis; 1–128 bp). b, Impact of sequence length during training and inference. Blue dots represent a single set of models trained with 1-Mb input, evaluated using varying input sequence lengths (x axis). Purple crosses represent models trained at the sequence length indicated on the x axis but evaluated at a fixed 1-Mb input length. Green triangles represent models trained and evaluated using the same matched sequence length (x axis). c, Impact of the number of sub-models in ensembling and distillation. Performance comparison for mean ensembles of pretrained models (blue dots/contours; x axis indicates ensemble size) versus single models produced by distillation using 1, 4 or 64 teacher models (orange crosses/contours; x axis indicates number of teachers). d, Impact of multimodal learning. Performance comparison evaluating models trained only on specific modality groups (blue dots; n = 8 seeds per group, highlighted in green if the modality matches the evaluation metric) against the full multimodal model (black dashed line; n = 4 seeds average). During training for these models, we ensured that only the target modality group’s prediction heads contributed updates to the shared representations, allowing assessment of that modality group’s contribution to overall model performance. Groups shown (x axis) include models trained using gradients only from accessibility (ATAC, DNase and contact maps), expression (RNA-seq, CAGE and PRO-cap), splicing (sites, usage and junctions) or histone ChIP-seq.
Unifying DNA sequence-to-function model
AlphaGenome is a deep learning model designed to learn the sequence basis of diverse molecular phenotypes from human and mouse DNA (Fig. 1a). It simultaneously predicts 5,930 human or 1,128 mouse genome tracks across 11 modalities covering gene expression (RNA-seq, CAGE and PRO-cap), detailed splicing patterns (splice sites, splice site usage and splice junctions), chromatin state (DNase, ATAC-seq, histone modifications and transcription factor binding) and chromatin contact maps. These span a variety of biological contexts, such as different tissue types, cell types and cell lines (see Supplementary Table 1 for the summary and Supplementary Table 2 for the complete metadata). These predictions are made on the basis of 1 Mb of DNA sequence, a context length designed to encompass a substantial portion of the relevant distal regulatory landscape. For instance, 99% (465 of 471) of validated enhancer–gene pairs fall within 1 Mb (ref. 12).
AlphaGenome uses a U-Net-inspired2,13 backbone architecture (Fig. 1a and Extended Data Fig. 1a) to efficiently process input sequences into two types of sequence representations: one-dimensional embeddings (at 1-bp and 128-bp resolutions), which correspond to representations of the linear genome, and two-dimensional embeddings (2,048-bp resolution), which correspond to representations of spatial interactions between genomic segments. The one-dimensional embeddings serve as the basis for genomic track predictions, whereas the two-dimensional embeddings are the basis for predicting pairwise interactions (contact maps). Within the architecture, convolutional layers model local sequence patterns necessary for fine-grained predictions, whereas transformer blocks model coarser but longer-range dependencies in the sequence, such as enhancer–promoter interactions. Base-pair-resolution training on the full 1-Mb sequence is enabled through sequence parallelism across eight interconnected tensor processing unit (v3) devices. Genomic track predictions are linear transformations of these sequence embeddings, aside from splice junction count prediction, which uses a separate mechanism that captures interactions between one-dimensional embeddings of donor–acceptor pairs (Extended Data Fig. 1).
We trained the model using a two-stage process: pretraining and distillation. The pretraining phase (Fig. 1b) used the observed experimental data to produce two types of models. Fold-specific models were trained using a 4-fold cross-validation scheme (Methods), with three fourths of the reference genome used for training and the remaining one fourth held out for validation and testing. These models were then used to evaluate the generalization of AlphaGenome by predicting genomic tracks on unseen (test) reference genome intervals (Fig. 1b). Additionally, all-fold models were trained on all available intervals of the reference genome and served as teachers in the second stage (distillation; Fig. 1c). In the distillation phase, a single student model, sharing the pretrained architecture, was trained to predict the output of an ensemble of all-fold teachers using randomly augmented input sequences (Methods). This distilled student model, as shown previously14, achieved improved robustness and variant effect prediction accuracy in a single model instance, making predictions across all modelled modalities and cell types with a single device call per variant. Taking less than 1 s on an NVIDIA H100 GPU, the student model is highly efficient for large-scale variant effect prediction relative to the alternative approach of ensembling several independently trained models.
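The distillation step can be sketched as follows, under two loudly stated assumptions: that the student's regression target is the mean of the teachers' predictions, and that the input perturbation is a uniform random substitution at a fixed per-base rate. Neither detail is specified in this section (see Methods), and `mutate`, `teacher_target` and the toy teachers are hypothetical.

```python
import numpy as np

def mutate(seq: str, rate: float, rng: np.random.Generator) -> str:
    """Substitute each base with a different base with probability `rate` (assumed scheme)."""
    bases = np.array(list(seq))
    for i in np.flatnonzero(rng.random(len(seq)) < rate):
        choices = [b for b in "ACGT" if b != bases[i]]
        bases[i] = rng.choice(choices)
    return "".join(bases)

def teacher_target(seq: str, teachers) -> np.ndarray:
    """Assumed distillation target: the mean prediction of the frozen teachers."""
    return np.mean([teacher(seq) for teacher in teachers], axis=0)

# Toy teachers standing in for frozen all-fold models (one value per base).
teachers = [
    lambda s: np.full(len(s), 1.0),
    lambda s: np.full(len(s), 3.0),
]

rng = np.random.default_rng(0)
perturbed = mutate("ACGTACGT", rate=0.25, rng=rng)
target = teacher_target(perturbed, teachers)  # the student would be trained to match this
```

Training on perturbed inputs exposes the student to sequences that differ from the reference genome, which is the regime in which variant effect predictions are made.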
Performance overview
To characterize the model performance of AlphaGenome, we first assessed its generalization to unseen genome intervals, a prerequisite for high-quality variant effect prediction. We conducted 24 genome track evaluations, encompassing all 11 predicted modalities (Methods and Supplementary Table 3). For out-of-fold evaluations, pretrained, fold-specific AlphaGenome models were used and compared with the strongest available external model for each respective task. AlphaGenome outperformed these external models on 22 of 24 evaluations (Fig. 1d, Extended Data Fig. 3 and Supplementary Table 3). Notably, AlphaGenome exhibited a +14.7% relative improvement in cell-type-specific gene-level expression log-fold change prediction compared with Borzoi2, another multimodal sequence model (Fig. 1e and stratified metrics in Extended Data Fig. 3e). AlphaGenome also outperformed specialized single-modality models on their respective tasks, such as Orca3 on contact maps (contact map Pearson r +6.3%; cell-type-specific differences +42.3%; Fig. 1d and Extended Data Fig. 4), ProCapNet9 on transcription initiation tracks (+15% total counts Pearson r; Extended Data Fig. 3f) and ChromBPNet10 on accessibility (+1.6% for ATAC; +9.5% for DNase profile Jensen–Shannon divergence; Extended Data Fig. 3g).
We next evaluated the model’s performance on predicting variant effects. We assembled a second set of 26 variant effect prediction benchmarks across gene expression, splicing, polyadenylation, enhancer–gene linking, DNA accessibility and transcription factor binding. Again, we compared with the strongest externally available model on each task (Methods and Supplementary Table 4). For variant effect prediction, we used the distilled student model. AlphaGenome matched or outperformed the external models on 25 of 26 evaluations (Fig. 1e and Supplementary Table 4). This included strong performance in quantitative trait locus (QTL) evaluations, such as sign prediction for expression QTLs (eQTLs; +25.5% versus Borzoi2) and accessibility QTLs (+8.0% versus ChromBPNet10, averaged across five datasets; Methods), demonstrating its strength against both multimodal and specialized single-modality baselines. Collectively, these results demonstrate that AlphaGenome more accurately models both genome tracks and variant effects.
Improved track prediction performance
Given the strong performance of AlphaGenome on genome track evaluations, we investigated its track predictions in more detail. Fold-specific, pretrained AlphaGenome models demonstrated high concordance between predicted and observed read coverage on unseen genome intervals (Fig. 2a). As an example, predicted HepG2 genome tracks over the LDLR gene showcased strand-specific, base-pair-resolution RNA-seq coverage over exons, along with predicted splice sites, splice site usage and splice junction read coverage (Fig. 2b). More examples illustrating splicing, gene expression and chromatin track predictions are provided in Supplementary Figs. 1–3, and finer delineation of genomic features such as exon boundaries is highlighted in Supplementary Fig. 4.
Quantitatively, we observed strong Pearson correlations (r) between predicted and observed signals for functional genomics tracks in both human and mouse genomes (Fig. 2c), both across all tracks and when subsetting by biosample types or data sources (Supplementary Fig. 5). Although overall expression levels are predicted well, accurately capturing cell-type-specific expression deviations remains a challenging task (Fig. 2d and Supplementary Fig. 2j).
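For reference, the track-level metric can be sketched as a Pearson correlation with an optional log(1 + x) transform applied to both predicted and observed values. The function name `pearson_r` and the `log1p` switch are illustrative choices, not part of any published code.

```python
import numpy as np

def pearson_r(pred, obs, log1p: bool = False) -> float:
    """Pearson correlation between predicted and observed values.
    If log1p is True, both inputs are log(1 + x) transformed first."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if log1p:
        pred, obs = np.log1p(pred), np.log1p(obs)
    pred_c = pred - pred.mean()
    obs_c = obs - obs.mean()
    denom = np.sqrt((pred_c ** 2).sum() * (obs_c ** 2).sum())
    # A constant track has zero variance, so the correlation is undefined.
    return float((pred_c * obs_c).sum() / denom) if denom > 0 else float("nan")

r_raw = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_log = pearson_r(np.expm1([0.0, 1.0, 2.0]), np.expm1([0.0, 2.0, 4.0]), log1p=True)
```

The log transform reduces the influence of a few very high-coverage positions, which would otherwise dominate the correlation for count-like tracks.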
On splicing (Extended Data Fig. 2a), AlphaGenome accurately predicts splice sites (Extended Data Fig. 2b) and splice site usage (Extended Data Fig. 2b,c). It also accurately predicts quantitative splice junction read coverage and PSI5 and PSI3 within various tissues, achieving high correlation with experimental measurements (Fig. 2e, Extended Data Fig. 2b,d,e and Methods). Although AlphaGenome demonstrates the ability to predict tissue-specific alternative splicing in some instances (Supplementary Fig. 1), further improvements are needed to precisely predict intermediate splicing efficiencies and to capture tissue-specific nuances (Extended Data Fig. 2c,e).
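As an illustration of the PSI5 and PSI3 quantities, the sketch below assumes one common convention: PSI5 of a junction is its read count divided by the total count of all junctions sharing its donor, and PSI3 is its count divided by the total for all junctions sharing its acceptor. The exact definition used in the evaluation is given in the Methods; `psi5_psi3` and the coordinates are hypothetical.

```python
from collections import defaultdict

def psi5_psi3(junction_counts):
    """Compute PSI5 and PSI3 per junction from {(donor, acceptor): read_count}.

    Assumed convention: PSI5 normalizes over junctions sharing the donor,
    PSI3 over junctions sharing the acceptor.
    """
    donor_total = defaultdict(float)
    acceptor_total = defaultdict(float)
    for (donor, acceptor), n in junction_counts.items():
        donor_total[donor] += n
        acceptor_total[acceptor] += n

    psi5, psi3 = {}, {}
    for (donor, acceptor), n in junction_counts.items():
        d_tot, a_tot = donor_total[donor], acceptor_total[acceptor]
        psi5[(donor, acceptor)] = n / d_tot if d_tot > 0 else float("nan")
        psi3[(donor, acceptor)] = n / a_tot if a_tot > 0 else float("nan")
    return psi5, psi3

# Toy example: donor 100 splices to acceptors 200 and 300; acceptor 300 is
# also reached from donor 250.
counts = {(100, 200): 30, (100, 300): 10, (250, 300): 30}
psi5, psi3 = psi5_psi3(counts)
```

Under this convention the two values for a junction differ whenever its donor and its acceptor participate in different sets of competing junctions, as for the (100, 200) junction in the toy data.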
Improved splicing variant predictions
One of the main ways genetic variants cause disease is by disrupting splicing15, a process that produces mature RNA sequences by excising introns and ligating exons at splice junctions. Splicing outcomes can be modelled at three levels: the probability that any given nucleotide acts as a splice donor or acceptor (splice site prediction)4,11,16, competitive selection among potential splice sites (splice site usage prediction)11,16 and prediction of specific introns (splice junction prediction). AlphaGenome predicts all three of these quantities alongside direct RNA-seq coverage prediction, thereby providing a more comprehensive view of the splicing-related molecular consequences of variants (Fig. 3a).
To illustrate the capacity of AlphaGenome to simultaneously predict several relevant splicing variant effects, we first probed its ability to recapitulate known biological outcomes. We interrogated a 4-bp deletion (chr. 3: 197081044: TACTC>T), a variant empirically observed to cause exon skipping in tibial artery tissue in a sample from the Genotype–Tissue Expression (GTEx)17 project (Fig. 3b). AlphaGenome accurately predicted this established consequence across all levels: a substantial reduction in the predicted usage of the affected exon’s splice site, loss of predicted junctions linking the skipped exon edges, emergence of a putative junction bypassing the exon and strong decrease in predicted RNA-seq coverage of the exon. Similarly, the predictions of AlphaGenome accurately captured the new splice junction and extended exon induced by the chr. 21: 46126238: G>C variant, an effect observed in a heterozygous GTEx RNA-seq sample (Fig. 3c). Finally, in silico mutagenesis (ISM), which systematically predicts effects of all possible single nucleotide variations in a sequence region (Methods), revealed the sequence determinants of the splicing predictions. For example, ISM analysis of exon 9 of the U2SURP gene and its flanking introns highlighted recognizable splicing-related sequence motifs18,19 (Fig. 3d). Further examples of experimentally validated splice-disrupting variants identified in individuals with autism spectrum disorder4 are shown in Supplementary Fig. 6.
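The ISM procedure can be sketched in a few lines. Here `score_fn` is any function mapping a sequence to a scalar prediction; `toy_score` is a hypothetical stand-in that rewards a GT dinucleotide at a fixed position and is not a model of splicing. How the per-substitution scores are summarized into the displayed logos is described in the Methods and is not reproduced here.

```python
import numpy as np

BASES = "ACGT"

def ism_scores(seq: str, score_fn, start: int, end: int) -> np.ndarray:
    """In silico mutagenesis over seq[start:end].

    Returns an (end - start, 4) matrix whose entry [i, j] is
    score(mutant with base BASES[j] at position start + i) - score(reference).
    Entries for the reference base are left at zero.
    """
    ref_score = score_fn(seq)
    out = np.zeros((end - start, 4))
    for i, pos in enumerate(range(start, end)):
        for j, base in enumerate(BASES):
            if base == seq[pos]:
                continue
            mutant = seq[:pos] + base + seq[pos + 1:]
            out[i, j] = score_fn(mutant) - ref_score
    return out

def toy_score(seq: str) -> float:
    """Hypothetical scalar output: 1.0 if a GT dinucleotide sits at positions 4-5."""
    return 1.0 if seq[4:6] == "GT" else 0.0

seq = "AAAAGTAAAA"
scores = ism_scores(seq, toy_score, 0, len(seq))
```

Positions at which every substitution changes the score (here, the G and T of the dinucleotide) stand out as the sequence determinants of the prediction, whereas substitutions elsewhere have no effect.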
Building on the multifaceted splicing predictions of AlphaGenome, we developed a unified splicing variant scorer to systematically detect splice-disrupting variants. Specifically, we designed a custom variant scoring strategy for each prediction modality (Fig. 3e and Methods) and summed the individual scores to provide a composite measure of a variant’s predicted effect. We benchmarked this composite scorer against existing methods on a wide range of splicing-related variant effect prediction tasks. AlphaGenome performed best on fine-ma