Main
Mammalian genomes are extensive repositories of DNA-encoded instructions that control cellular functions through complex regulatory mechanisms. Central to this regulation are cis-regulatory elements (CREs)—non-coding DNA sequences that control the transcription of nearby genes1. Usually associated with open chromatin and specific histone modifications, CREs contain binding sites for transcription factors and other chromatin-associated proteins that interact with one another and the transcriptional machinery to regulate gene expression. Understanding the biological contexts and functions of CREs is essential for deciphering genome function and its impact on human health and disease8.
The ENCODE project has made major contributions towards our understanding of gene regulation by systematically identifying and annotating functional elements in human and mouse genomes2. Across tens of thousands of high-throughput functional genomics assays, the ENCODE consortium has comprehensively mapped biochemical signatures and used them to annotate CREs, including the ENCODE Phase III (ENCODE3) registry of candidate CREs (cCREs)9.
Here we have expanded the registry of cCREs to include 2.37 million human and 967,000 mouse elements. This expansion leverages new datasets produced during ENCODE Phase IV (ENCODE4) and improved computational methods, making the registry one of the most extensive repositories of CREs available. The updated registry spans hundreds of unique cell and tissue types, and enhances our understanding of gene regulation across a broad range of biological contexts.
In addition to expanding the number of annotated elements, the updated registry integrates functional characterization data for more than 97% of human cCREs, revealing how sequence features influence regulatory activity and uncovering new functional subclasses of cCREs. For example, we identified thousands of silencer cCREs (many of which act as enhancers in alternative cellular contexts) and defined MAFF- and MAFK-bound cCREs as dynamic enhancers poised for activation under stress-responsive conditions. Together, these analyses highlight the diversity and context-dependent versatility of CRE function.
Integrating the expanded registry with other ENCODE Encyclopedia annotations further enables the systematic interpretation of genetic variation and the identification of trait-associated genes. As an example, we used the registry to nominate KLF1 as a probable causal gene for red blood cell (RBC) traits. Collectively, this work establishes an expanded and functionally characterized registry that provides a comprehensive framework for studying CREs and their roles in gene regulation, cellular identity and disease.
Expansion of the ENCODE registry
The ENCODE4 registry encompasses 2,373,014 human cCREs and 926,843 mouse cCREs, covering 21% of the human and 9% of the mouse genomes (Fig. 1 and Supplementary Note 1). This is a threefold increase over ENCODE3 and establishes the ENCODE4 registry as one of the most extensive repositories of CREs currently available (Supplementary Note 1 and Supplementary Data 1 and 2).
Fig. 1: The updated registry of cCREs.
a,b, Schematic of the pipeline used to make version 4 of the registry of cCREs (a). We defined element anchors by generating rDHSs and representative transcription factor clusters (TF rClusters). Element anchors are scored and classified with H3K4me3, H3K27ac and CTCF ChIP–seq and ATAC-seq signals (yellow box) according to the scheme in b. This results in 2.37 million cCREs in the human genome and 967,000 in the mouse genome. We supplement the registry with additional ENCODE Encyclopedia annotations including transcription quantifications, 3D chromatin contacts, functional characterization measurements, sequence features and genetic variation (blue box). The registry of cCREs and all layered annotations are housed in our web portal SCREEN. New components of the pipeline are denoted by stars. b, Overview of our cCRE classification scheme. cCREs are classified on the basis of their patterns of biochemical signals (chromatin accessibility in green, H3K4me3 in red, H3K27ac in yellow, CTCF in blue and transcription factor in purple) and distance from annotated TSSs. High signals are denoted by peaks. ‘+/−’ indicates that the corresponding signal may or may not be present and its presence does not affect classification. New categories of elements are denoted by stars. c, Bar graphs depicting the number of cCREs annotated in each class for human and mouse.
This expansion reflects both incorporation of new ENCODE4 data and technical improvements to the cCRE pipeline (Supplementary Note 1 and Supplementary Table 1). We continued to anchor cCREs on representative DNase hypersensitivity sites (rDHSs) and annotated their functions using H3K4me3, H3K27ac and CTCF chromatin immunoprecipitation with sequencing (ChIP–seq) data. In this expansion, we also incorporated thousands of transcription factor ChIP–seq and assay for transposase-accessible chromatin using sequencing (ATAC-seq) datasets, enabling annotation of elements in low-accessibility regions and biosamples lacking DNase-seq (Supplementary Fig. 1 and Supplementary Tables 2 and 3). Finally, pipeline updates improved recovery of cCREs in previously difficult-to-map regions, including duplicated loci and repetitive elements such as Alu elements (Supplementary Note 1.2 and Supplementary Fig. 2).
The number of human chromatin accessibility and ChIP–seq experiments increased by 2.3-fold between ENCODE3 and ENCODE4. This doubled the number of biosamples—unique tissues, cell types and cellular states—represented in the registry from 839 to 1,679, spanning 42 human organs and tissues (Extended Data Fig. 1a). Most biosamples are primary cells and tissues, but they also include in vitro-differentiated cells, organoids and cell lines (Extended Data Fig. 1b), supporting both mechanistic studies and translational models. Coordinated ENCODE4 data production also expanded the core collection of biosamples—those with all four core assays DNase, H3K4me3, H3K27ac and CTCF—nearly sevenfold, from 25 in ENCODE3 to 170 in ENCODE4 (Extended Data Fig. 1c), enabling thorough annotation across cell types (Supplementary Note 1.3).
Classification of cCREs
We classified cCREs into eight categories based on distance to annotated transcription start sites (TSSs) and combinations of biochemical signals (Fig. 1b,c, Supplementary Note 1.4 and Supplementary Fig. 3). Five of these classes—promoter, proximal enhancer, distal enhancer, CA-H3K4me3 and CA-CTCF—were defined in our previous work9. Here we add three new categories: CA-TF cCREs, which have high accessibility but lack H3K4me3, H3K27ac or CTCF and are bound by a transcription factor; CA cCREs, which are accessible but lack enrichment for H3K4me3, H3K27ac and CTCF; and TF cCREs, which show little accessibility or histone marks yet are bound by a transcription factor. These new categories do not by themselves define discrete functional classes; however, subsequent analyses show that they include silencers and dynamic enhancers, extending the registry to capture a wider spectrum of cis-regulatory activity.
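The classification scheme above can be read as a priority-ordered decision procedure over binarized signals. The sketch below is a simplified illustration only: it takes "high/low" signal calls as given, and the distance cut-offs (200 bp for promoters, 2 kb for proximal enhancers), the rule order and the `unclassified` label are assumptions for illustration, not the exact rules in Fig. 1b and Supplementary Note 1.4. The `Element` type and `classify` function are hypothetical names.

```python
from dataclasses import dataclass

# Assumed distance cut-offs (bp) to the nearest annotated TSS, for illustration.
PROMOTER_MAX_DIST = 200
PROXIMAL_MAX_DIST = 2_000


@dataclass
class Element:
    accessible: bool   # high DNase-seq or ATAC-seq signal
    h3k4me3: bool      # high H3K4me3 signal
    h3k27ac: bool      # high H3K27ac signal
    ctcf: bool         # high CTCF signal
    tf_bound: bool     # overlaps a transcription factor ChIP-seq cluster
    tss_distance: int  # distance (bp) to the nearest annotated TSS


def classify(e: Element) -> str:
    """Assign one class by checking rules in a fixed (assumed) priority order."""
    if not e.accessible:
        # Low-accessibility elements are kept only if a TF binds them.
        return "TF" if e.tf_bound else "unclassified"
    if e.h3k4me3 and e.tss_distance <= PROMOTER_MAX_DIST:
        return "promoter"
    if e.h3k27ac:
        if e.tss_distance <= PROXIMAL_MAX_DIST:
            return "proximal enhancer"
        return "distal enhancer"
    if e.h3k4me3:
        return "CA-H3K4me3"
    if e.ctcf:
        return "CA-CTCF"
    if e.tf_bound:
        return "CA-TF"
    return "CA"


# Accessible, H3K27ac-marked element 15 kb from a TSS: prints "distal enhancer".
print(classify(Element(True, False, True, False, False, 15_000)))
```

Checking the most specific signal combinations first mirrors the "+/−" convention in Fig. 1b: once an element qualifies as a promoter, for example, additional CTCF or transcription factor binding does not change its class.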
As in previous versions of the registry9, we performed cCRE classification across all biosamples (cell type-agnostic) and in specific biosamples (Supplementary Note 1.4). To facilitate comparisons at the organ and tissue level, we also generated aggregate annotations by combining biosamples from the same organ or tissue (Supplementary Note 1.5 and Supplementary Table 4a).
This expanded classification scheme and inclusion of additional biosamples added 1.4 million human and 587,000 mouse cCREs, which were enriched for evolutionary conservation and regulatory activity (Supplementary Notes 1.6 and 1.7, Supplementary Figs. 4 and 5 and Supplementary Tables 4b and 5a,b). To evaluate the effect of this expansion, we compared the ENCODE4 cCREs to four datasets that recently annotated CREs using independent approaches across diverse tissues (Supplementary Note 1.7). For all four datasets, the fraction of overlapping ENCODE4 cCREs increased relative to ENCODE3 cCREs, driven by improved brain, embryonic tissue and immune coverage (Supplementary Fig. 6 and Supplementary Table 5c–j). These comparisons demonstrate that the ENCODE4 registry achieves both broad coverage and cell-type-specific resolution, enabling more comprehensive representation of regulatory activity across the human genome.
To explore sequence features underlying cCRE classes, we trained cell type-specific variational autoencoders on cCRE sequences (Extended Data Fig. 2a and Supplementary Note 1.8). Promoter and distal enhancer cCREs segregated along the first dimension (Extended Data Fig. 2b and Supplementary Fig. 7), which strongly correlated with the percentage of guanine and cytosine (GC) nucleotides of the sequences (R = 0.92; Extended Data Fig. 2c). GC content also distinguished subsets of transcribed elements (Extended Data Fig. 2d) and correlated with transcription factor motif preferences (Supplementary Note 1.8, Supplementary Fig. 8 and Supplementary Table 6a). These results indicate that GC content is a key feature distinguishing cCRE classes and that our biochemical classification scheme effectively captures these sequence-driven differences.
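The relationship between a latent dimension and GC content can be checked with a plain Pearson correlation. In this minimal sketch, the per-sequence latent coordinates are assumed to come from an encoder trained elsewhere (the variational autoencoder itself is not shown), and the sequences and coordinates in the usage lines are made-up values, not registry data.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among called bases; N and other symbols are ignored."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return float("nan")
    return sum(base in "GC" for base in called) / len(called)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up sequences and first-dimension latent coordinates, for illustration.
sequences = ["GCGCGGCCGA", "ATATTAAGCA", "GCATGCATNN", "CCGGATATGC"]
latent_dim1 = [1.8, -1.5, 0.1, 0.9]
gc = [gc_fraction(s) for s in sequences]
print(round(pearson_r(gc, latent_dim1), 2))
```

`pearson_r` raises `ZeroDivisionError` if either input has zero variance; a production version would guard against that case.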
While the ENCODE4 registry of cCREs provides extensive coverage across diverse tissues, its annotations are inherently influenced by the availability and quality of the underlying genomic data (Supplementary Note 1.9, Supplementary Fig. 9 and Supplementary Table 6b,c). For example, the smaller number of mouse cCREs reflects the ENCODE emphasis on human samples and results in differences in class composition between the two registries. Future work will expand both the human and mouse registries by incorporating publicly available datasets from additional tissues, developmental stages and perturbation conditions.
Testing cCRE activity with functional assays
Having defined and annotated the putative functions of cCREs using chromatin signatures, we next evaluated their functional activities. ENCODE4 tested the activities of millions of genomic regions using four types of functional assays: genome-wide STARR-seq assays3, massively parallel reporter assays (MPRAs)4, CRISPR perturbation assays5,6 and transgenic mouse enhancer assays7. Nearly all human cCREs (97%) were tested by at least one assay in at least one cellular context, and 28% had significant activity in at least one assay (Table 1, Supplementary Note 2.1 and Supplementary Table 7). This overall rate probably underestimates the true fraction of functional cCREs, because most assays were performed in a limited number of cell types and are biased toward detecting enhancer activity. For example, in K562, the erythroleukaemia cell line with the most extensive data available, 91% of promoter cCREs and 65% of enhancer cCREs showed significant activity. Functional activity was strongly enriched in cCREs relative to non-cCRE regions, and in each tested cell type, cCREs bearing active chromatin signatures were more likely to show activity (Supplementary Fig. 10). This cell-type specificity was especially pronounced in CRISPR perturbation experiments, which capture activity in the native chromatin context (Supplementary Note 2.1).
Among the four assays, STARR-seq had the highest throughput, testing 2.2 million cCREs; we therefore focused subsequent analyses on this assay. Because a single STARR-seq fragment can contain multiple cCREs, we developed a method, CRE-centric analysis and prediction of reporter assays (CAPRA), to calculate cCRE-specific STARR scores from RNA:DNA ratios and resolve enhancer and silencer activities (Supplementary Note 2.2 and Supplementary Fig. 11). CAPRA characterized 75–87% of cCREs in each experiment and supported downstream analyses such as identifying sequence features, comparing cCRE activity across cell types and quantifying combinatorial cCRE interactions (Supplementary Notes 2.3–2.5, Supplementary Fig. 12, Supplementary Tables 8 and 9 and Supplementary Data 3 and 4).
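To make the idea of an RNA:DNA-based score concrete, the sketch below computes a simple per-cCRE log2 ratio from "solo" fragments that overlap exactly one cCRE. This is not CAPRA: CAPRA's statistical model, which also handles fragments spanning several cCREs, is described in Supplementary Note 2.2. The input format, the counts-per-million normalization, the pseudocount and the function name `solo_starr_scores` are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def solo_starr_scores(fragments, rna_total, dna_total, pseudocount=1.0):
    """Per-cCRE log2(RNA/DNA) from fragments overlapping exactly one cCRE.

    fragments: iterable of (ccre_ids, rna_count, dna_count), where ccre_ids
    lists every cCRE the fragment overlaps (hypothetical input format).
    rna_total, dna_total: total reads in the RNA and DNA libraries.
    """
    rna = defaultdict(float)
    dna = defaultdict(float)
    for ccre_ids, rna_count, dna_count in fragments:
        if len(ccre_ids) != 1:
            continue  # multi-cCRE fragments need a joint model; skipped here
        ccre = ccre_ids[0]
        rna[ccre] += rna_count
        dna[ccre] += dna_count

    scores = {}
    for ccre in dna:
        # Counts per million, so differences in library depth cancel out.
        rna_cpm = rna[ccre] / rna_total * 1e6
        dna_cpm = dna[ccre] / dna_total * 1e6
        # > 0 suggests enhancer-like activity, < 0 silencer-like activity.
        scores[ccre] = math.log2((rna_cpm + pseudocount) / (dna_cpm + pseudocount))
    return scores
```

A positive score means the element drove more reporter RNA than expected from its DNA representation, and a negative score means less, which is the sense in which negative STARR scores are later used to flag candidate silencers.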
We first studied cell-type-specific functional activity of cCREs by comparing their STARR scores in K562 versus HepG2 (a hepatocellular carcinoma cell line). Promoter cCREs were more likely to have consistent STARR scores across cell types than distal enhancer cCREs (Supplementary Fig. 12a), concordant with our previous results9 and the general understanding that enhancers tend to be more cell-type specific than promoters10.
To determine what sequence features were responsible for this cell-type-specific activity, we investigated which transcription factor motifs were enriched in distal enhancer cCREs with differentially high STARR scores in K562 (K562 STARR+) versus HepG2 (HepG2 STARR+) cells (Fig. 2a). Both groups of cCREs were enriched for cell type-relevant transcription factor motifs—haematopoietic factors in K562 and hepatocyte nuclear factors in HepG2—and were more likely to exhibit active biochemical signatures in the respective cell type (Fig. 2b, Supplementary Fig. 13 and Supplementary Table 9a,b). When we analysed STARR scores in two other cell lines, HCT116 (colon carcinoma) and MCF-7 (breast cancer), the STARR+ enhancer cCREs enriched for the respective haematopoietic (GATA1) or hepatocyte (HNF4A) transcription factor motifs did not show STARR activity (Extended Data Fig. 3a), consistent with the cell type-specific expression of these transcription factors across the four lines (Extended Data Fig. 3b).
Fig. 2: Functional characterization of the registry of cCREs.
a, CAPRA quantifications for distal enhancer cCREs in HepG2 versus K562 cell lines. The colour of points indicates cCREs with enriched activity (STARR+) in HepG2 (green, n = 5,246) or K562 (pink, n = 1,892) cells. b, Bar plots of motif enrichment for HepG2 (green) or K562 (pink) STARR+ distal enhancers (as defined in a). Top five motifs are shown for each group of cCREs along with their corresponding logo. c, Genome browser view of three distal enhancer cCREs (denoted 1–3) in the MTNR1A intron with DNase (green) and H3K27ac (yellow) signals in K562 cells. A STARR-seq peak is shown in black. d, CAPRA quantifications for the three enhancers shown in c: EH38E3620077 (1), EH38E3620078 (2) and EH38E3620079 (3) using solo fragments (top) and double fragments (bottom). High values are denoted in purple (CAPRA, P = 0.03). e, Top, overlap of common K562 transcription factor motifs at the three enhancers in c,d. Bottom, representative motif logos for EH38E3620078 and EH38E3620079 are shown.
We observed unexpected enrichment for two motifs in HepG2 STARR+ enhancers: p53 and GFI1B. HepG2 STARR+ enhancers with p53 motif sites had high activity in HepG2 cells but low activity in K562 cells; most of these enhancers were also active in HCT116 and MCF-7 (Extended Data Fig. 3a). These results are consistent with the status of p53, a tumour suppressor, in these cell lines: inactive in K562 and active in the other three lines (Supplementary Note 2.6). Our results underscore the importance of biosample selection, as the disruption of regulatory mechanisms in a cancer cell line can affect the interpretation of data obtained using that line. Meanwhile, HepG2 STARR+ enhancers with GFI1B motif sites had moderate-to-high STARR scores in HCT116 and MCF-7 cells but lower-than-baseline activity in K562 cells (Extended Data Fig. 3a). GFI1B is a transcriptional repressor that is expressed in erythrocyte progenitors11 and K562 cells (Extended Data Fig. 3b). These results suggest that GFI1B-containing cCREs repress transcription in K562 cells and that our method can identify elements with repressive or silencing activities.
In addition to characterizing the activity of individual cCREs, the CAPRA method quantified the activity of 335,909 pairs of cCREs (Supplementary Data 4). Combined activity levels were generally correlated with the averaged activity of the individual cCREs, and this correlation increased as more stringent filters were applied to the data (Supplementary Fig. 14a–d). Nevertheless, there were notable exceptions, including cCRE pairs with lower or higher than expected effects, suggesting repressive and cooperative interactions, respectively (Supplementary Note 2.5 and Supplementary Fig. 14e,f). For example, three enhancer cCREs within an intron of MTNR1A showed low-to-moderate activity when tested separately in K562 cells (STARR scores of 0.01, 0.39 and 0.34 for EH38E3620077, EH38E3620078 and EH38E3620079, respectively; Fig. 2c,d). When assayed together, the first two maintained moderate activity (0.38), whereas the last two displayed unexpectedly strong cooperativity (1.62, P = 0.03). EH38E3620078 and EH38E3620079 contained more transcription factor motif sites than the weaker partner EH38E3620077 (46–49 versus 27), consistent with their higher individual activities. In combination, their cooperative effect exceeded expectations, which we hypothesize is due to the diversity of their motif profiles (AT-rich STAT, FOX and BACH motifs versus GC-rich KLF and SP motifs), a pattern that is also supported by ChIP–seq (Fig. 2e and Supplementary Table 9f,g). Although our power to detect such events is currently limited by the STARR-seq library construction, future experiments designed to include a wider range of fragment lengths could extend the study of combinatorial effects to more cCREs.
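As a worked check of the MTNR1A example, the sketch below compares each observed pair score with a simple additive baseline, the mean of the two solo STARR scores, which the text reports as generally correlated with combined activity. The function name `pair_excess` is hypothetical, and the significance test behind P = 0.03 is not reproduced here.

```python
def pair_excess(solo_a: float, solo_b: float, combined: float) -> float:
    """Observed combined score minus an additive baseline (mean of solo scores).

    Positive values suggest cooperativity; negative values suggest repression.
    """
    baseline = (solo_a + solo_b) / 2
    return combined - baseline


# Solo and combined STARR scores reported in the text for the MTNR1A enhancers.
solo = {"EH38E3620077": 0.01, "EH38E3620078": 0.39, "EH38E3620079": 0.34}
pairs = {
    ("EH38E3620077", "EH38E3620078"): 0.38,
    ("EH38E3620078", "EH38E3620079"): 1.62,
}
for (a, b), combined in pairs.items():
    print(a, b, round(pair_excess(solo[a], solo[b], combined), 3))
```

The first pair exceeds its baseline (0.20) by only 0.18, whereas the second pair exceeds its baseline (0.365) by 1.255, which is the kind of outlier the text describes as cooperative.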
Identifying REST-bound silencers
Silencers are CREs that repress transcription12. Several studies have identified putative human silencers using chromatin and functional data13,14,15,16; however, the overlap among these datasets is limited, probably owing to methodological differences (Supplementary Table 10a,b). To systematically identify silencers within a unified framework, we started with a well-characterized subclass of silencers: neuron-restrictive silencer elements (NRSEs). Bound by the transcriptional repressor REST, NRSEs repress neuronal gene expression in non-neuronal cells17. By characterizing the biochemical and functional activity profiles of candidate NRSEs, we derived generalizable principles to guide the broader identification of silencer elements.
Using the 29 ENCODE REST ChIP–seq experiments conducted across biosamples, we defined candidate NRSEs (hereafter referred to as REST+ cCREs) by selecting all non-promoter cCREs that contained a REST motif site and overlapped the summits of at least five ChIP–seq peaks (Fig. 3a, Supplementary Note 3, Supplementary Fig. 15a,b and Supplementary Table 10c,d). We then evaluated these cCREs using transgenic mouse enhancer assays (Supplementary Table 10e,f), CAPRA STARR scores, histone ChIP–seq (Supplementary Table 10g–i) and gene expression. Stratifying analyses by cCRE class, we identified two major categories of REST+ cCREs (Fig. 3b and Supplementary Note 3): (1) REST+ enhancer/silencer cCREs (n = 2,534), which function as enhancers in neurons (where REST is not expressed) but act as silencers in non-neuronal cells (where REST is expressed); and (2) REST+ silencer cCREs (n = 2,253), which lack enhancer activity and function exclusively as silencers outside the neuronal context.
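The selection criteria for REST+ cCREs translate into a short filter. In this sketch, the record layout (`CandidateCRE`), the function name `select_rest_plus`, the pooling of summits across all REST experiments and the half-open (BED-style) coordinates are assumptions; a genome-scale implementation would use an interval index rather than the linear scan shown here.

```python
from dataclasses import dataclass


@dataclass
class CandidateCRE:
    ccre_id: str
    chrom: str
    start: int            # 0-based, inclusive (BED-style)
    end: int              # exclusive
    ccre_class: str       # e.g. "promoter", "distal enhancer", "CA-TF"
    has_rest_motif: bool  # element contains an annotated REST motif site


def select_rest_plus(candidates, summits, min_summits=5):
    """Return IDs of non-promoter cCREs with a REST motif site that contain
    at least `min_summits` REST ChIP-seq peak summits.

    `summits` is a list of (chrom, position) pairs pooled across all REST
    experiments (an assumption about how summits are counted).
    """
    selected = []
    for c in candidates:
        if c.ccre_class == "promoter" or not c.has_rest_motif:
            continue
        n_summits = sum(
            1 for chrom, pos in summits
            if chrom == c.chrom and c.start <= pos < c.end
        )
        if n_summits >= min_summits:
            selected.append(c.ccre_id)
    return selected
```

The three conditions are applied in order of cost: the class and motif checks are cheap lookups, so the summit count is computed only for elements that pass them.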
Fig. 3: Identification of distinct functional categories of REST-bound cCREs.
a, Computational pipeline for identifying REST+ cCREs. We overlapped cCREs with REST ChIP–seq peaks and selected all cCREs that overlap at least five peak summits and an annotated REST motif site. b, Schematic for characterizing REST+ cCREs into two categories: REST+ enhancer/silencer cCREs (n = 2,534) and REST+ silencer cCREs (n = 2,253). c, Representative result from the mouse transgenic enhancer assay showing activity of REST+ enhancer cCRE EH38E1910506 in mouse brain regions (hindbrain activity in 4 out of 6 embryos; midbrain activity in 6 out of 6 embryos). d, The percentage of cCREs tested in transgenic mouse enhancer assays with positive activity. cCREs are stratified into four groups on the basis of REST+ category (enhancer/silencer (yellow): n = 22; silencer (purple): n = 15) and REST binding (light bars are REST+; light bars are REST−). Statistical significance is calculated by a two-sided Fisher’s exact test between REST+ categories (enhancer/silencer: n = 22; silencer: n = 15) and class-matched controls (enhancer/silencer: n = 5,428; silencer: n = 448). REST+ silencer cCREs are depleted for activity compared to REST+ enhancer/silencers (P = 0.001) and matched control cCREs (P = 0.02). e, Bar graphs showing the percentage of REST+ enhancer/silencer cCREs with transgenic mouse enhancer assay activity in specific tissues versus other tested enhancers (as defined in d). Statistical significance is calculated by a two-sided Fisher’s exact test. REST+ enhancer/silencer cCREs are enriched for activity in hindbrain (P = 0.02) and midbrain (P = 0.05). f, Density plot of the distributions of CAPRA STARR scores for all cCREs, REST+ enhancer/silencer cCREs and REST+ silencer cCREs. Both groups of REST+ cCREs have median STARR scores less than zero, suggesting silencer activity for both groups. Statistical significance is calculated using a two-sided Wilcoxon test comparing each REST+ category to the background set of all cCREs.
REST+ enhancer/silencer cCREs exhibit context-dependent regulatory activity: they act as enhancers in neurons and as silencers in non-neuronal cells. In transgenic mouse enhancer assays, these cCREs displayed comparable enhancer activity to other enhancer cCREs lacking REST binding (REST–), with 59% and 61% validation rates, respectively (Fig. 3c,d). However, REST+ enhancer/silencer cCREs were preferentially active in hindbrain and midbrain tissues (2.3-fold and 2.1-fold enrichments, two-sided Fisher’s exact test, P = 0.01 and P = 0.04, respectively; Fig. 3e, Supplementary Fig. 15c and Supplementary Table 10f), consistent with enrichment for H3K27ac in neuron-related biosamples (Supplementary Note 3 and Supplementary Table 10g,h). In K562 cells, REST+ enhancer/silencer cCREs showed reduced STARR-seq activity (median of –0.10 versus –0.02 for all cCREs, two-sided Wilcoxon test, P < 2.2 × 10⁻¹⁶; Fig. 3f and Supplementary Fig. 15d), suggesting silencer activity. Genes near these cCREs also exhibited reduced expression relative to genes near cCREs lacking open chromatin (median transcripts per million (TPM) 0.7 versus 2.4, two-sided Wilcoxon test, P = 3.0 × 10⁻⁶; Supplementary Fig. 15e). Together, these results support a dual-function model in which REST+ cCREs act as neuronal enhancers in the absence of REST binding and as silencers in REST-expressing cell types. An example REST+ enhancer/silencer is shown in Extended Data Fig. 4a–c.
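Several comparisons in this section use two-sided Wilcoxon rank-sum tests on score or expression distributions. The following stdlib-only sketch implements the rank-sum (Mann–Whitney U) test with a normal approximation; it averages tied ranks but omits the tie and continuity corrections that some statistics packages apply, so its P values can differ slightly from those software defaults. The function name `rank_sum_test` is hypothetical, and any input values are placeholders rather than STARR scores from the study.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U for sample x, approximate two-sided P value). Tied values get
    their average rank; the normal approximation omits tie and continuity
    corrections.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    return u_x, math.erfc(abs(z) / math.sqrt(2))
```

A rank-based test suits these comparisons because STARR scores and TPM values are skewed, and the test compares whole distributions rather than assuming normality.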
REST+ silencer cCREs lack enhancer activity and function only as silencers in REST-expressing cell types. In transgenic mouse enhancer assays, only one REST+ silencer cCRE was active, a validation rate that was significantly lower than that of REST+ enhancer/silencer cCREs (8% versus 61%, P = 0.04) and class-matched REST– cCREs (40%, two-sided Fisher’s exact test, P = 0.03; Fig. 3d, Supplementary Fig. 15c and Supplementary Note 3). In K562 cells, these elements exhibited strongly negative STARR scores (median = –0.40), significantly lower than those of REST+ enhancer/silencer cCREs (two-sided Wilcoxon test, P < 2.2 × 10⁻¹⁶; Fig. 3f and Supplementary Fig. 15d). Nearby genes also showed minimal expression (median TPM = 0.1, two-sided Wilcoxon test, P < 2.2 × 10⁻¹⁶; Supplementary Fig. 15e), suggesting repressive activity in the native chromatin context. An example REST+ silencer is shown in Extended Data Fig. 4d–f.
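Validation rates such as these are compared with two-sided Fisher's exact tests on 2×2 tables of active versus inactive elements. A minimal stdlib implementation is sketched below; it uses the common convention of summing the hypergeometric probabilities of all tables (with the same margins) that are no more likely than the observed one. The function name `fisher_exact_two_sided` is hypothetical, and any counts passed to it are placeholders, not the study's tables.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be two groups of elements; columns active / inactive.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(table_prob(x) for x in range(lo, hi + 1)
            if table_prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)
```

An exact test is the natural choice here because the REST+ groups tested in transgenic mice are small (n = 22 and n = 15), where large-sample approximations such as the chi-square test are unreliable.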
We explored the features that distinguished the two categories of REST+ cCREs (Supplementary Note 3). Although we did not detect differences in gene ontology enrichment (both classes were near genes involved in neuronal and synaptic processes; Supplementary Table 10j), we observed clear distinctions in sequence composition. Compared with REST+ silencer cCREs, REST+ enhancer/silencer cCREs were modestly enriched for enhancer-associated transcription factor motifs (Supplementary Table 10k) and were more evolutionarily conserved in flanking regions (Extended Data Fig. 5a–e). Previous work showed that REST-associated CREs are enriched within genomic repeats18, and our analysis reveals that this enrichment varies between REST+ subclasses. REST+ enhancer/silencer cCREs were depleted for long interspersed nuclear elements (LINEs), particularly L1 elements, which are among the most active and evolutionarily dynamic transposable elements in the human genome19 (3% versus 11%, two-sided Fisher’s exact test, P < 2.2 × 10⁻¹⁶; Supplementary Table 10l). The lower L1 content and greater conservation of REST+ enhancer/silencers suggest that these elements are evolutionarily older, whereas the ‘younger’ REST+ silencers may have originated from L1-associated insertions and retained only the silencing component of regulatory activity.
Expanded annotation of silencers
Our analysis of REST+ cCREs showed that silencer activity can be identified by negative STARR scores. Therefore, we utilized the genome-wide STARR-seq data to identify silencers beyond NRSEs. Using CAPRA, we identified 545 stringent (P < 0.01) and 5,468 robust (P < 0.05) STARR silencer cCREs in K562 cells, 9% and 5% of which were also REST+ cCREs, respectively (Extended Data Fig. 5f and Supplementary Table 11a). STARR silencer cCREs showed consistent negative STARR scores across datasets from multiple laboratories and cell types, with the strongest, most reproducible repression observed for the stringent calls (Supplementary Note 4.1 and Supplementary Fig. 16a). STARR silencer cCREs were enriched for non-promoter and non-enhancer cCREs (Supplementary Fig. 16b,c), highlighting the importance of including CA-TF and TF classes in our expanded cCRE classification scheme.
Similar to genes near REST+ cCREs, genes near STARR silencer cCREs had lower than expected expression in K562 cells (median TPMs of 0.5 and 1.4 for stringent and robust STARR silencer cCREs, respectively; pairwise two-sided Wilcoxon tests with false-discovery rate (FDR) correction, P < 6.6 × 10⁻⁴; Supplementary Fig. 16d). These genes were enriched for nervous system and renal development terms (Extended Data Fig. 5g and Supplementary Table 11b), suggesting that silencers help repress tissue-specific programmes in non-expressing contexts.
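The FDR correction is not specified further in this section; a common choice is the Benjamini–Hochberg procedure, sketched below as an assumed method rather than the one used in the study. The function name `benjamini_hochberg` is hypothetical; it returns adjusted P values in the original input order.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity:
    # q(rank) = min over ranks >= rank of p * m / rank, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Taking the running minimum from the largest P value downwards guarantees that adjusted values never decrease with rank, matching the standard step-up formulation.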
STARR silencers also had distinct sequence features and evolutionary conservation patterns. They were enriched for motifs of the repressor GFI1B (>2.6-fold enrichment, two-sided Fisher’s exact test with FDR correction, P < 2.2 × 10⁻¹⁶; Supplementary Table 11c) and, in their native chromatin context, overlapped ChIP–seq peaks of several transcription factors and chromatin remodellers (Supplementary Table 11d), which further defined subclasses (Supplementary Fig. 16e). Despite these associations, STARR silencer cCREs were not enriched for any particular repressive chromatin state in K562 cells, but were consistently depleted for active histone marks (Supplementary Table 11e). STARR silencer cCREs were also more conserved than non-cCRE regions (27–30% versus 19% in conservation group 1 (G1), which indicates mammalian conservation20; two-sided chi-square tests, P < 2.7 × 10⁻¹¹) but less conserved than REST+ silencers (G1: 38%, two-sided chi-square tests, P < 1.3 × 10⁻¹⁴; Supplementary Table 11f). They were also more likely to overlap LINEs (29%) than enhancer (16%) and low-chromatin-accessibility (21%) cCREs in K562 cells (two-sided Fisher’s exact tests with FDR correction, P < 4.6 × 10⁻⁶; Supplementary Table 11g).
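The conservation comparisons use chi-square tests on counts of conserved versus non-conserved elements. For a 2×2 table the Pearson statistic has one degree of freedom, so its P value equals the two-sided normal tail at the square root of the statistic, which the complementary error function gives directly. This sketch, with the hypothetical name `chi_square_2x2`, omits Yates' continuity correction (some packages apply it by default to 2×2 tables) and assumes no row or column total is zero.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no Yates correction) for [[a, b], [c, d]].

    Rows could be two element sets; columns conserved / not conserved.
    Returns (chi2 statistic, P value with one degree of freedom).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

With tens of thousands of elements per group, the large-sample chi-square approximation is accurate, which is why it is used here instead of the exact test sketched for the small transgenic-mouse comparisons.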
To assess silencer activity in the native chromatin context, we integrated our silencer cCREs with CRISPR interference (CRISPRi)-Flow-fluorescence in situ hybridization (FISH) data in K562 cells. Of the 734 CRISPRi-perturbed cCREs, two were silencers, including EH38E4193243, which is both a STARR silencer and a REST+ enhancer/silencer (Supplementary Note 4.2 and Supplementary Table 11h). In retinal cells, this cCRE acts as an enhancer of RTBDN, which encodes a riboflavin-binding protein critical for photoreceptor function, but in K562 cells, where it is bound by REST, it silences RTBDN. CRISPRi of this cCRE in K562 cells also increased expression of the upstream gene PRDX2, mediated by long-range chromatin interactions (Supplementary Fig. 17 and Supplementary Note 4.2). These results suggest that silencers can affect the expression of multiple genes, including distal targets.
Silencers comprise distinct subclasses
In total, we annotated 9,972 silencer cCREs, the union of REST+ cCREs (n = 4,787) and STARR silencer cCREs (n = 5,468), which share 283 elements. Of these, 9,182 are predicted to be active in K562 cells (that is, they show REST binding or have a significantly negative STARR score in K562 cells). For comparison, in K562 cells we annotate 20,041 promoters and 35,488 distal enhancers. Thus, silencers make up a substantial fraction of regulatory elements in this well-characterized cell line. Expanding functional assays across additional biosamples and assay sources is likely to reveal more silencers and improve coverage across diverse cellular contexts.
Our silencer cCREs overlap only a subset of previously annotated silencers13,14,15,16 (Supplementary Fig. 18a). The highest concordance was with silencers identified by Jayavelu et al.14, who also used STARR-seq (Supplementary Note 4.3 and Supplementary Table 10a). For regions tested in both studies, 93% of our silencer cCREs were classified as silencers by Jayavelu et al. (2.3-fold enrichment, two-sided Fisher’s exact test, P < 2.2 × 10⁻¹⁶; Supplementary Fig. 18b). However, Jayavelu et al. tested only 3% of our silencer cCREs, suggesting that we identified many additional silencers (Supplementary Fig. 18c).
A major distinction of our silencer annotation from earlier studies lies in our cCRE registry-based approach. Previous annotations typically centred on open chromatin regions, either in the cell type in which silencer activity was assayed or in a limited number of biosamples. Only 14% of our silencer cCREs show high chromatin accessibility in K562 cells (Supplementary Note 4.4 and Supplementary Fig. 18d), consistent with recent findings from Drosophila silencer-sequencing screens showing that most silencers do not overlap DNase-seq or ATAC-seq peaks21. Instead, many of our silencer cCREs exhibit open chromatin in early-stage cell types and embryonic tissues (Extended Data Fig. 6a–c, Supplementary Note 4.4 and Supplementary Table 12a,b). Accessibility in embryonic tissue or progenitor cell types does not necessarily indicate that these cCREs function as silencers during development. Rather, we interpret this pattern as evidence of transient accessibility during development, potentially enabling transcriptional repressors such as REST to bind, close the chromatin and thus establish long-term silencing programmes in differentiated cell types. Therefore, by using a comprehensive collection of biosamples to define the registry of cCREs, we are able to identify silencers that would otherwise be missed.
Further distinguishing our approach, most REST+ and STARR silencer cCREs lack consistent enrichment for repressive chromatin signatures, such as polycomb-associated or bivalent histone modifications. This contrasts with previous silencer annotations based on H3K27me3 enrichment, suggesting that our silencer cCREs represent distinct subclasses of silencers with different epigenetic properties (Supplementary Note 4.5, Supplementary Fig. 18e and Supplementary Table 12c). This is further supported by differences in overlap with repetitive elements and evolutionary conservation. As described above, REST+ and STARR silencer cCREs are enriched for LINEs, particularly L1 elements (Extended Data Fig. 6d, Supplementary Note 4.6 and Supplementary Table 12d). By contrast, other published silencers are enriched for different repeat classes (short interspersed nuclear elements (SINEs) in Pang & Snyder15 and long terminal repeats (LTRs) in Jayavelu et al.14; Extended Data Fig. 6d), which also corresponded to differences in conservation profiles (Supplementary Fig. 18f–i and Supplementary Table 12e,f). These patterns suggest that the different approaches capture functionally and evolutionarily distinct subsets of silencers, supporting the view that silencers represent a heterogeneous class of regulatory elements with diverse mechanisms of action12. Further studies are needed to more comprehensively define and characterize silencers, including those that may be missed by the current registry of cCREs (Supplementary Note 4.7 and Supplementary Table 12g,h).
MAFF and MAFK mark dynamic enhancers
Our analysis revealed that silencer cCREs were enriched within the newly defined CA-TF and TF cCRE classes (Supplementary Figs. 15b and 16c). However, silencers accounted for only a subset of these groups, prompting further investigation into the broader regulatory potential of CA-TF and TF cCREs. One subgroup of particular interest was the MAFF- and MAFK-binding TF cCREs (MAFF/MAFK+ cCREs), which inspired the formal inclusion of TF cCREs into the registry, as 27% of MAFF and MAFK ChIP–seq peaks do not overlap rDHSs (Supplementary Note [1.1](https://