Main
Our understanding of human metabolism is mostly based on dedicated hypothesis testing in experimental settings, informed by model organisms or observations in patients with rare diseases. Only recently has high-throughput profiling of small molecules in large-scale studies enabled systematic testing of genetic variation across the genome and provided an agnostic approach for discovering genes that encode key metabolic regulators1,2,3,4,5,6,7,8,9,10,11. These efforts have provided important new insights into how genetic variation shapes human chemical and metabolic individuality1 and have corroborated a large body of biochemical knowledge1,2,10,12.
The importance of such genome–metabolome-wide association studies (mGWAS) extends beyond the mapping of biochemical pathways, sometimes demonstrating almost immediate clinical value. They provided examples of how readily available supplementation strategies may prevent disease or delay onset in high-risk individuals, such as serine for macular telangiectasia type 2, a rare eye disorder2. They further identified unknown variants that affect the absorption, distribution, metabolism and excretion of exogenous compounds, most importantly drugs1,13, thereby providing pathways to mitigate adverse drug effects. However, there are several challenges that currently limit the potential of mGWAS analyses, particularly for causal inference. These include (1) the still rather small number of, at most, a dozen genetic variants linked to single molecules, (2) the inability to distinguish whether pleiotropic variants act on different molecules or pathways independently (horizontal pleiotropy), or whether they serve as ‘root causes’ of successive downstream changes (vertical pleiotropy), (3) the difficulty in distinguishing between locus-specific and metabolite abundance effects when colocalization at disease-risk loci is observed1 and (4) the challenge of confidently assigning effector genes at newly identified loci.
Here, we integrated rare (based on whole exome sequencing) and common genetic variation with measures of 249 metabolic phenotypes, including small molecules and detailed lipoprotein characteristics, among >450,000 UK Biobank (UKB) participants representing three distinct ancestries. We demonstrate largely consistent genetic regulation across ancestries and sexes for almost 30,000 locus–metabolite associations and systematically categorize abundant genetic pleiotropy. By integrating machine-learning-derived effector gene assignments with rare exonic variation, we identify previously unknown regulators of metabolism and observe heterogeneity in association profiles for variants mapping to the same gene. Finally, we demonstrate how systematic integration of statistical colocalization and Mendelian randomization can identify pathways with the potential to mitigate cardiovascular disease (CVD) risk beyond current approaches focused primarily on lowering low-density lipoprotein (LDL) cholesterol.
Results
We integrated genome-wide association studies (GWAS; population-specific minor allele frequency (MAF) ≥0.5%) with rare exome-wide association studies (ExWAS; MAF ≤0.05%) on plasma concentrations of 249 metabolite phenotypes, quantified using 1H nuclear magnetic resonance (NMR) spectroscopy. We included up to 450,000 UKB participants across three major ancestries (British White European, EUR (n = 434,646); British African, BA (n = 6,573); British Central/South Asian, BSA (n = 8,796)) (Extended Data Fig. 1). The NMR measures comprised 14 lipoprotein subclasses and associated characteristics (that is, extra-large very-low-density lipoprotein (VLDL) to small high-density lipoprotein (HDL) particles), along with small molecules such as amino acids and ketone bodies quantified in molar concentration units (Supplementary Table 1).
Common genetic variation underlying circulating metabolites
We identified 29,824 regional sentinel–NMR measure associations in trans-ancestral meta-analyses, representing 753 nonoverlapping genomic regions (Fig. 1a and Supplementary Table 2). Nearly half of these regions (n = 359, 47%) were associated with more than ten NMR measures, demonstrating considerable pleiotropy. Characteristics of large HDL particles, such as particle size and lipid composition, were associated with the largest number of regions (median 166, interquartile range 126–195), compared with all NMR measures (median 105, interquartile range 68–142), findings that considerably extended previous work3 and replicated parallel efforts using UKB9 (Extended Data Fig. 2). Genes with well-characterized roles in human metabolism were significantly enriched across different significance bins (adjusted P values <4.24 × 10−9; Supplementary Fig. 1), suggesting that ever-larger studies of omnigenic traits, such as metabolites, still yield biologically plausible findings.
Fig. 1: Common genetic regulation of circulating metabolites.
a, A top-down Manhattan plot showing trans-ancestral sentinel variants for 249 metabolic phenotypes at a metabolome-adjusted genome-wide significance threshold of P < 2.0 × 10−10. Each row represents an NMR measure, colored for biochemical class. Chromosomal positions are shown on the x axis. P values are raw −log10(P value) from a two-sided Z test across effect estimates derived within three ancestral groups. b, Weighted average allele frequency compared with estimated effect size for trans-ancestral sentinel variants. Points are colored for biochemical classification. c, A comparison of effect sizes between British White European samples (x axis) and British African samples (y axis). We considered variants that were significant in either population. d, Similar to c but comparing British Central/South Asian samples. Dots are colored according to their absolute Z score in British White European samples.
We observed significant evidence of heterogeneity (P < 1 × 10−4) across ancestries for very few loci (n = 342; 1.14%), and ancestry-wise comparison demonstrated largely concordant effect estimates (Fig. 1c,d, Extended Data Fig. 3 and Supplementary Table 3). All sentinels seen in individuals of British African and British Central/South Asian ancestry were replicated in individuals of European ancestry, except for one locus that was specific to British Africans. The previously reported14 missense variant rs3211938 within CD36, which is common among individuals of African ancestry (MAFBA = 0.12) but absent among individuals of European ancestry (MAFEUR = 0.0), was significantly associated (P values <1.49 × 10−10) with lower plasma concentrations of omega 3 fatty acids and 15 other NMR measures, including lipoprotein particle characteristics. This is in line with CD36 encoding a fatty acid translocase that facilitates the recognition and uptake of long-chain fatty acids. We note that the sample sizes in the smaller ancestral groups did not permit comprehensive replication.
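To make the heterogeneity criterion concrete, the following is a minimal sketch of how per-ancestry effect estimates could be combined and tested for heterogeneity. It assumes a fixed-effect inverse-variance-weighted model and Cochran's Q, neither of which is specified in this section; the function names and example inputs are illustrative only.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis (assumed model)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p

def cochran_q(betas, ses):
    """Cochran's Q and its P value for exactly three groups.

    With three groups Q has 2 degrees of freedom, for which the
    chi-square tail probability has the closed form exp(-Q / 2).
    """
    if len(betas) != 3 or len(ses) != 3:
        raise ValueError("closed-form P value implemented for three groups only")
    beta, _, _, _ = ivw_meta(betas, ses)
    q = sum(((b - beta) / se) ** 2 for b, se in zip(betas, ses))
    return q, math.exp(-q / 2.0)

# Hypothetical estimates for one variant in EUR, BA and BSA.
q, p_het = cochran_q([0.10, 0.10, 0.40], [0.02, 0.05, 0.05])
heterogeneous = p_het < 1e-4  # threshold quoted in the text
```

Under this assumed model, a locus would count towards the 342 heterogeneous associations when `p_het` falls below 1 × 10−4.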
Sex-differential effects at loci encoding metabolic genes
While we observed highly correlated effect sizes across female and male participants (median r = 0.98, range 0.90–0.99), we also identified 360 putative sex-differential loci for 239 NMR measures, representing 1,800 heterogeneous associations in sex-stratified meta-analyses (heterogeneity P value <5 × 10−8), most of which (65.3%; n = 1,175 loci) could not be explained by confounding factors (Supplementary Note, Supplementary Fig. 2 and Supplementary Table 4). Putative sex-differential loci were generally directionally concordant between the sexes (Fig. 2a), in line with previous proteomics analyses and suggesting that significant sex interactions do not reflect sex-discordant effects15.
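The distinction between sex-differential and sex-discordant effects can be illustrated with a short sketch. It assumes that the heterogeneity test reduces to a two-sided Z test on the difference between independent female and male estimates; the exact procedure is described in the Supplementary Note, and the names and numbers below are hypothetical.

```python
import math

def sex_difference_test(beta_f, se_f, beta_m, se_m):
    """Two-sided Z test for a difference between two independent estimates."""
    z = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

def is_sex_differential(beta_f, se_f, beta_m, se_m, threshold=5e-8):
    """Flag a locus whose heterogeneity P value passes the quoted threshold."""
    _, p = sex_difference_test(beta_f, se_f, beta_m, se_m)
    return p < threshold

def directionally_concordant(beta_f, beta_m):
    """True when both sexes show an effect in the same direction."""
    return beta_f * beta_m > 0

# Hypothetical locus: a stronger effect in women, same direction in men.
flagged = is_sex_differential(0.20, 0.01, 0.10, 0.01)
concordant = directionally_concordant(0.20, 0.10)
```

A locus like the hypothetical one above is flagged as sex-differential yet remains directionally concordant, which is the pattern reported for most loci in Fig. 2a.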
Fig. 2: Putative sex-differential loci and reclassification of established lipid loci.
a, Comparison of effect sizes of putatively sex-differential loci (defined as loci with heterogeneity P < 5 × 10−8 in a two-sided Z-score meta-analysis across the sexes). b, Rank distributions for each of the five matching NMR traits compared with the Lipids Genetics traits across genetic loci. Per locus–trait combination, 205 lipid-related NMR traits were ranked based on their absolute effect size and compared with the NMR trait that corresponds to the Lipids Genetics consortium trait. Pie charts show the percentage of loci where the corresponding NMR trait is ranked among the top 10% of associated traits. TC, total cholesterol; TG, triglycerides.
Refinement of regional associations through multi-ancestry fine-mapping
We next used a two-stage strategy to refine regional associations to a smaller number of candidate causal variants. We first identified 3,007 statistically independent metabolite quantitative trait loci (mQTLs) associated with one or more NMR measures, representing a total of 43,322 credible set–NMR measurement pairs (Supplementary Table 5). Lead fine-mapped mQTLs per NMR trait explained on average 6.9% (range 0.57–13.42%) of variance in plasma metabolite concentrations (Extended Data Fig. 4). Second, we leveraged the different linkage disequilibrium (LD) structure in British African and British Central/South Asian individuals to further refine 3,386 credible sets that contained >1 variant and had suggestive evidence in either ancestry, leading to an increase in the number of credible sets with high-confidence variants and a decrease in mean credible set size from 9 to 4 variants (Supplementary Note and Supplementary Fig. 3). Trans-ancestral fine-mapping improved resolution in loci that did not resolve in individuals of European ancestry alone, but we note that the overall improvement was marginal. Instead of refining already tight credible sets, future studies should therefore focus on scaling discovery in non-European ancestries to identify unknown causal variants.
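For readers less familiar with credible sets, the sketch below shows how a set is conventionally read off from per-variant posterior inclusion probabilities (PIPs). The fine-mapping method itself is not described in this section, so the 95% coverage level, the function name and the PIP values are all assumptions for illustration.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of top-ranked variants whose PIPs sum to at least `coverage`.

    `pips` maps a variant identifier to its posterior inclusion probability.
    """
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        total += pip
        if total >= coverage:
            break
    return chosen

# Hypothetical PIPs for one signal before and after adding ancestries
# with a different LD structure.
before = {"v1": 0.40, "v2": 0.30, "v3": 0.20, "v4": 0.07, "v5": 0.03}
after = {"v1": 0.90, "v2": 0.07, "v3": 0.02, "v4": 0.01, "v5": 0.00}
sizes = (len(credible_set(before)), len(credible_set(after)))
```

A smaller set after refinement corresponds to the reported drop in mean credible set size; sets that already contain a single variant cannot shrink further, which is one reason the overall gain can be marginal.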
Biological reclassification of established ‘lipid’ loci
To assess the value of metabogenomic studies of 1H NMR-spectrometry-based lipoprotein profiling over standard clinical markers, we classified NMR metabolome association profiles for 1,657 genetic variants reported for commonly measured clinical markers (LDL cholesterol, HDL cholesterol, total cholesterol and triglycerides) obtained in 1.6 million people16. Around 25% of associated variants had the corresponding NMR measure among the top 10% of the most strongly associated NMR measures, with 22.5% of genetic variants showing significantly stronger association with refined lipoprotein measures compared with their matching measure on the NMR platform, an observation most pronounced for non-HDL and LDL cholesterol concentrations (Fig. 2b). Relevant loci for lipoprotein metabolism can thus be discovered using readily available clinical measurements; however, refined lipoprotein profiles are necessary for better understanding the relevant biological pathways, including any inference about druggability or use for genetic causal inference methods. One such example was the PNPLA3 locus (tagged by rs3747207, associated with LDL cholesterol by the Global Lipids Genetics Consortium; β = −0.014, P = 2.3 × 10−21), where we observed no association with LDL cholesterol (β = −0.001, P = 0.49) but with LDL particle size (β = 0.045, P = 1.04 × 10−73), and multiple characteristics of extra-large VLDL particles (Extended Data Fig. 5). The intronic rs3747207 variant is in strong LD (r2 = 0.98) with the well-known missense variant rs738409 (p.Ile148Met) that has been demonstrated to confer hepatic lipid accumulation by altering ubiquitination of patatin-like phospholipase domain-containing protein 3 (PNPLA3)17. Our results provide human genetic support for a recently proposed role of PNPLA3 in the secretion of large VLDL particles18.
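The ranking used for this comparison can be sketched as follows, assuming each variant has one effect estimate per NMR trait and that traits are ranked by absolute effect size, as stated in the legend of Fig. 2b. The trait labels are hypothetical; only the two PNPLA3 effect sizes are taken from the text, and the third value is made up.

```python
def matching_trait_rank(effects, matching_trait):
    """Rank (1 = strongest) of one trait among all traits by absolute effect."""
    ordered = sorted(effects, key=lambda trait: abs(effects[trait]), reverse=True)
    return ordered.index(matching_trait) + 1

def in_top_fraction(effects, matching_trait, fraction=0.10):
    """True when the matching trait ranks within the top `fraction` of traits."""
    return matching_trait_rank(effects, matching_trait) <= fraction * len(effects)

# rs3747207 (PNPLA3): reported betas for LDL cholesterol and LDL particle size,
# plus one hypothetical value for an extra-large VLDL characteristic.
pnpla3 = {"LDL_cholesterol": -0.001, "LDL_particle_size": 0.045, "XL_VLDL_example": 0.030}
rank_ldl_c = matching_trait_rank(pnpla3, "LDL_cholesterol")
```

In the full analysis the ranking would run over 205 lipid-related NMR traits per locus, so the top 10% corresponds to roughly the 20 most strongly associated traits.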
Machine-learning-guided effector gene assignment
We successfully assigned effector genes for almost three-quarters of European ancestry fine-mapped mQTLs (73.6%; n = 2,213) with at least moderate confidence (candidate gene score ≥1.5, range 0–3), including 28.2% with high-confidence assignments (score ≥2; n = 848), by training a machine learning model that integrates functional genomic resources with pathway information inspired by the ProGeM framework19 (Supplementary Table 6). For example, we prioritized the fatty acid elongase gene ELOVL6 for 16 different VLDL/HDL characteristics (tagged by rs3813829). The gene product, ELOVL fatty acid elongase 6, catalyzes the rate-limiting step in the elongation of long-chain fatty acids, which are subsequently incorporated into lipoprotein particles. We also prioritized genes with upstream roles in metabolism, including a locus on 17q25.3 where we prioritized cytohesin-1 (CYTH1) as the putative effector gene for 5 independent genetic variants linked to 11 distinct NMR measures mostly comprising characteristics of VLDL particles. CYTH1, previously associated with type 2 diabetes20, promotes activation of ADP-ribosylation factors (ARF)1, ARF5 and ARF6, regulators of lipid vesicle transport, membrane lipid composition and modification21, demonstrating a relevant but indirect link to lipoprotein metabolism.
We observed considerable overlap of machine-learning-guided effector gene predictions (top three genes) with those reported based on manually curated biological plausibility (191 out of 283 loci)3 or based on colocalization with protein quantitative trait loci (pQTLs) that have not been used to train the algorithm22 (81 out of 143; Supplementary Table 6). While missing overlap indicates room for improvement, 24 high-confidence assignments strongly disagreed with either external source (gene score > 2 but no match among pQTLs prioritized or manually curated ones). For example, we prioritized PEPD (score 2.42) as opposed to CEBPA3 for rs62102718. PEPD encodes peptidase D, which has been shown to promote adipose tissue fibrosis in mouse knock-out models promoting insulin resistance23. Insulin resistance, in turn, provides a very plausible explanation for the pleiotropic effect of the variant on diverse lipoprotein characteristics (n = 31).
Tissue distribution of effector genes
Assigned effector genes were significantly enriched in different tissues, reflecting known and lesser-established organ contributions (Extended Data Fig. 6a and Supplementary Table 7). Genes characteristic of the liver, adipose tissue, adrenal gland and female breast tissue (probably reflecting its high adipose tissue content) were significantly enriched among effector gene sets across the metabolic measures captured by NMR. This included significant enrichment of all amino acids in liver tissue (for example, phenylalanine: odds ratio (OR) 14.8, P < 1.3 × 10−8, histidine: OR 7.9, P < 2.9 × 10−11) but also for skeletal muscle in alanine metabolism (OR 3.82; P < 7.9 × 10−9). Similar enrichments were observed when using the closest gene instead of our annotated effector genes for mQTLs (Extended Data Fig. 6b).
Metabolic versus systemic pleiotropy
Pleiotropy is widespread but poorly understood. We developed a framework to characterize four different modes of metabolic pleiotropy (Fig. 3a–d, Extended Data Fig. 7, Supplementary Table 6 and Methods). About half of the pleiotropic mQTLs (n = 880; ≥2 NMR measures) showed evidence for one of two modes of vertical pleiotropy: either within confined pathways (n = 218; ‘pathway pleiotropy’; Fig. 3a) or as a function of the correlation with the ‘lead’ NMR measure (n = 662; ‘proportional pleiotropy’; Fig. 3b). A prototypical example for proportional pleiotropy was an mQTL tagged by rs624698 for which we prioritized ANGPTL3 as the likely effector gene (Fig. 3b). Angiopoietin-like 3, encoded by ANGPTL3, inhibits lipoprotein lipase activity but also endothelial lipase, resulting in increased triglycerides, HDL cholesterol and phospholipid concentrations, consistent with HDL-particle characteristics being the most strongly associated NMR measure (P < 1.0 × 10−546). Other associations reflected downstream effects on lipoprotein metabolism rather than acting on independent pathways (Fig. 3b), considerably expanding previous genetic observations24.
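The intuition behind proportional pleiotropy can be sketched in a few lines: if a variant acts on one lead measure and all other associations follow from trait correlation, then the association strength of each measure should scale with its squared correlation to the lead measure (the two axes of Fig. 3a–d). The classification rules actually used are given in the Methods; the Pearson criterion and the cut-off of 0.8 below are assumptions made purely for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def looks_proportional(r2_with_lead, abs_z, min_corr=0.8):
    """Hypothetical rule: |Z| tracks squared correlation with the lead measure."""
    return pearson(r2_with_lead, abs_z) >= min_corr

# Hypothetical variant: association strength falls as correlation with the lead drops.
r2 = [1.00, 0.80, 0.55, 0.30, 0.10]
z_scores = [40.0, 33.0, 21.0, 12.0, 6.0]
proportional = looks_proportional(r2, z_scores)
```

A variant for which `abs_z` shows no such relation despite highly correlated measures would instead be a candidate for one of the horizontal modes.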
Fig. 3: Modes of pleiotropy.
a–d, Representative scatterplots opposing the squared trait correlation of the lead NMR measure for the listed variant against the absolute Z score from linear regression models for all associated NMR measures. The colors indicate different modes of pleiotropy and correspond to the legend in e. For each plot, a linear regression fit (lines) with 95% confidence interval (bands) is given. Scatterplots in a–d represent examples of mQTLs classified as pathway pleiotropy (a), proportional pleiotropy (b), disproportional pleiotropy (c) and nonspecific pleiotropy (d). e, The number of associated NMR measures for each of 3,007 mQTL groups opposed to associations reported in the GWAS Catalog after pruning the GWAS Catalog for metabolic phenotypes (Methods). Coloring is according to modes of pleiotropy. f, A scatterplot opposing the number of associated NMR measures (x axis) of each mQTL group with the number of reported EFO parent categories in the GWAS Catalog. g, ORs (rectangle) and 95% confidence intervals (CIs; lines) from logistic regression models testing whether EFO categories (x axis) are more frequently reported for pleiotropic mQTL groups compared with specific ones. Darker colors indicated estimates passing corrected statistical significance. n = 3,007 mQTL groups have been used for enrichment testing.
The remaining half of pleiotropic mQTLs showed evidence for two modes of horizontal pleiotropy: those with evidence for ‘disproportional pleiotropy’ (n = 68) and a larger group with evidence for ‘nonspecific pleiotropy’ (n = 720). For example, a small deletion on chromosome 1 (chr1:92982441:CA>C) was associated with a highly correlated cluster of NMR measures, including characteristics of intermediate density lipoprotein (IDL), LDL and VLDL particles (Fig. 3c), but for which we detected no correlation of association strengths according to the lead NMR measure, the concentration of esterified cholesterol in medium-sized VLDL particles (P < 6.8 × 10−14). We prioritized EVI5 as the most likely effector gene, supported by previous studies on rare functional variants25. The gene product of EVI5, ecotropic viral integration site 5, has no apparent link to (lipoprotein) metabolism, in line with most of the gene assignments for mQTLs with a similar nonspecific pleiotropy pattern. An example of nonspecific pleiotropy was the APOB missense variant rs676210 (p.Pro2739Leu) associated with 126 NMR measures across the entire lipoprotein density range, but also creatinine and glycoprotein acetyl concentrations (Fig. 3d). The differential effects of the same genetic variation on distinct lipoprotein subgroups align with changes in lipid profiles seen with mipomersen, an antisense oligonucleotide against APOB, that demonstrated reductions in LDL cholesterol but also subsequent increases in the triglyceride content of VLDL particles as hepatic adaptation occurs26.
Modes of molecular pleiotropy only partially translated into phenotypic pleiotropy (Fig. 3e,f). We observed a twofold enrichment of ‘proportional pleiotropic’ (OR 2.11; P < 2.0 × 10−14) and to a lesser extent an enrichment of ‘nonspecific pleiotropic’ (OR 1.52; P < 1.1 × 10−5) variants among variants reported in the GWAS Catalog for ≥5 nonmetabolomic trait categories (Methods). By contrast, the set of pleiotropic GWAS Catalog variants was significantly depleted for ‘specific’ mQTLs (OR 0.42; P < 1.6 × 10−21). Systemic mechanisms explaining effects of ‘proportional’ and ‘nonspecific’ pleiotropic mQTLs were further indicated by a more than 20-fold significant enrichment of associated trait categories such as ‘metabolic disease’, ‘fatty liver disease’ and ‘arterial disorders’ (Fig. 3g).
Convergence of common and rare genetic variation shaping metabolism
We next sought to understand convergence of rare and common genetic findings to systematically identify allelic series that increase confidence in causal gene assignment. We identified rare variation (MAF ≤0.05%) in 209 genes to be significantly (P < 1.1 × 10−8) linked to one or more of 249 NMR measures combining ultrarare gene burden analysis (3,709 significant associations; Supplementary Table 8) and rare exonic variant analysis (4,131 significant associations; Supplementary Table 9). Effect sizes were significantly larger compared with more frequent variant effects (Fig. 4a). For example, participants carrying rare predicted loss-of-function (LoF) variants in SLC13A5 had more than 1.4 s.d. units higher plasma citrate concentrations per copy of the possibly damaging allele (β = 1.41; P < 2.6 × 10−20).
Fig. 4: Rare coding variation associated with NMR measures and convergence with common variant associations.
a, Effect estimates against MAF of significantly associated gene burden (diamonds; two-sided P < 1.2 × 10−8) and rare exonic variants (MAF <0.05%; circles; two-sided P < 2.0 × 10−10). b, Effect estimates and two-sided raw −log10(P values) for associations of the rare intronic variant chr11:117186662:C>T within SIDT2 across all 249 NMR measures. The dotted horizontal line indicates the multiple testing threshold (P < 2.0 × 10−10). c, Genomic distance between gene burden (blue) or rare exonic variants (orange) toward the next common credible set variant. d, Evidence for allelic series based on (i) gene burden analysis (bottom), (ii) rare exonic variants (middle) and (iii) common variants with prioritized effector gene matching to the evidence from exonic analysis. For each gene, only the NMR measure most significantly associated with the strongest common variant is shown in cases where multiple NMR measures were associated. Some bars for the number of associated rare exonic variants have been capped to fit into the plotting margin, but the number is given in the plot. e, Effect estimates (dots) and 95% CIs (lines) from our European-based ExWAS for 7 variants mapping to APOA1 as well as a cumulative burden of high-confidence pLOF variants within APOA1 and bespoke circulating measures of ApoA1 (clinical indicates measurements by immunoturbidimetric analysis on a Beckman Coulter AU5800) and HDL particles (color gradient). f, Top: a heatmap of standardized effect estimates (per variant) across 87 NMR measures for each associated variant and a cumulative burden within APOA1. Variants mapping into the region encoding the protein are surrounded by a rectangle. Variant effects have been aligned to the minor allele. Middle: the corresponding variants mapped to their respective transcripts encoding different forms of APOA1. Bottom: missense variants mapped onto the amino acid sequence of the protein. Variant names colored similarly had highly correlated association profiles.
We also observed considerable pleiotropy, including 47 genes associated with 20 or more NMR measures. Many of these genes encode well-known enzymes and transporters, with nearly half (n = 23/51 genes) being involved in (peripheral) cholesterol metabolism (Extended Data Fig. 8). Some rare pleiotropic variants with large effect sizes (MAF <0.02% and β > 0.6 s.d. units) pointed toward less-established regulators of metabolism, including SIDT2 (chr11:117186662:C>T, n = 124 associated NMR traits), JAK2 (chr9:5073770:G>T (p.Val617Phe), n = 73 associated NMR traits) or CEP164 (chr11:117356670:C>G, n = 49 associated NMR traits). Experimental work already suggested a role for the gene product of SIDT2 (SID1 transmembrane family member 2) in hepatic lipid metabolism and in the secretion of apolipoprotein A1 (ApoA1), the main protein component of HDL particles, which constituted the majority of associated NMR measures27,28 (Fig. 4b). Variation in JAK2 predisposes to somatic mutations inducing clonal hematopoiesis of indeterminate potential (CHIP)29, but other studies linked the gene product Janus kinase 2 (JAK2) to metabolism in liver30, adipocytes31 or macrophages32. The strong inverse association with parameters of HDL particles thereby best aligned with a role of JAK2 in promoting the interaction with ATP-binding cassette transporter A1 (ABCA1) and subsequent HDL-mediated lipid removal from cells, including atherogenic macrophages32. These findings considerably expanded an earlier hypothesis that attributed effects of the same JAK2 variant on LDL cholesterol primarily to myeloid cells in a mouse model33. This hypothesis only partially aligns with, and in some respects contrasts with, our human genetic findings across the lipoprotein-density gradient.
We observed strong overlap between gene burden and common variant findings, with 85.4% of rare variant (n = 3,528) and 75.5% of gene burden (n = 2,802) associations being <100 kb away from the nearest statistically independent lead credible set variant (Fig. 4c). By contrast, most common variant findings (92.3%) were not within 500 kb of matching rare variant/burden evidence. Notably, 12.1% of gene burden results were more than 1 Mb away from the next common credible set variant for the respective NMR measure, aligning with recent observations that both approaches prioritize partly different genes34.
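The distance comparison in Fig. 4c amounts to a nearest-neighbor lookup along the chromosome. The sketch below shows one way to compute it, assuming base-pair positions on a single chromosome for a single NMR measure; the function names and coordinates are hypothetical.

```python
from bisect import bisect_left

def distance_to_nearest(position, sorted_positions):
    """Distance in bp from `position` to the closest entry of a sorted, non-empty list."""
    i = bisect_left(sorted_positions, position)
    neighbors = sorted_positions[max(i - 1, 0): i + 1]  # entries left and right of `position`
    return min(abs(position - other) for other in neighbors)

def share_within(rare_positions, common_positions, window=100_000):
    """Fraction of rare-variant signals closer than `window` bp to a common lead variant."""
    common = sorted(common_positions)
    hits = sum(distance_to_nearest(p, common) < window for p in rare_positions)
    return hits / len(rare_positions)

# Hypothetical coordinates on one chromosome.
rare = [100_050, 2_000_000, 5_430_000]
common_leads = [100_000, 500_000, 5_400_000]
fraction_close = share_within(rare, common_leads)
```

Running the same lookup in the opposite direction, from common lead variants to rare-variant signals, would give the asymmetry described above, because common signals far outnumber rare ones.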
At 116 genes (55.5%), rare variant and/or burden evidence overlapped with effector gene predictions for nearby common credible set variants (≤200 kb) for one or more associated NMR measures, providing independent support for allelic series (Fig. 4d and Supplementary Table 10). For example, we identified an allelic series composed of seven rare LoF, one gain-of-function and four common variants for serum citrate levels at SLC13A5 encoding a sodium-dependent citrate co-transporter. Another allelic series at ANKH comprised four common variants (rs185448606, MAF 1.3%; rs17250977, MAF 4.0%; rs826351, MAF 44.3%; rs2921604, MAF 45.9%) and a rare missense variant chr5:14745916:T>C (MAF 0.0069%) that was also associated with lower serum concentrations of citrate (β = −2.18 s.d. units, P < 5.2 × 10−11) (Fig. 4d). ANKH encodes a multipass transporter, recently shown to transport citrate35, with an important role in bone health35.
Phenotypic heterogeneity within allelic series
We observed evidence that genetic variants within 17 genes associated with >10 NMR measures had differential metabolic consequences within an allelic series (Supplementary Table 10). The most outstanding example included seven variants (five rare; two common) and a cumulative burden of rare predicted LoF variants at APOA1. They distinctively associated with one or more of 87 NMR measures, most strongly with diverse characteristics of HDL particles of which the gene product, apolipoprotein A1 (ApoA1), is the major component (Fig. 4e,f). This included four rare missense variants (MAF ≤0.03%) encoded in exon 4 that partly differentially associated with the number, size and cholesterol content of HDL particles (Fig. 4e), only one of which (p.Leu158Pro) primarily associated with serum ApoA1 concentrations and HDL particle number, mimicking the cumulative burden of high-confidence predicted LoF variants in APOA1 and suggesting a potentially dysfunctional protein that lacks interaction with lecithin cholesterol acyl transferase to facilitate cholesterol uptake36. By contrast, p.Lys131del and p.Arg201Ser seemed to rather predispose to a shift in cholesterol content from large towards small HDL particles, a pattern opposed by p.Asp113Glu (Fig. 4e). Consistently, amyloid formation by ApoA1 has been observed in early case reports of p.Lys131del (ApoA-I Helsinki37) in which HDL-cholesterol or ApoA1 concentrations are only mildly changed but aggregation of misfolded ApoA1 protein can confer organ damage later in life38. Because p.Asp113Glu and p.Arg201Ser have not yet been identified to cause amyloidosis, we cannot rule out the possibility that each variant maps to distinctive parts of ApoA1 with subsequently different consequences on function and/or stability (Supplementary Fig. 4).
While results for serum ApoA1 concentrations were largely confirmed using an alternative assay, we observed some discrepancies that may imply that, in the presence of rare missense variants, the procedure to quantify ApoA1 concentrations from 1H NMR spectra may need recalibration.
Phenotypic consequences of rare variation in metabolic genes
We observed a >3-fold enrichment of genes previously linked to Mendelian diseases39 (‘OMIM genes’) among those associated with NMR measures in gene burden and rare exonic variant analyses (OR 3.30, P < 6.5 × 10−17; Supplementary Table 11), in line with previous mGWAS1,2,7,8. For 15 out of 106 genes, we found evidence of significantly associated disease risk (P < 7.5 × 10−7), largely replicating signs and symptoms of corresponding rare disorders (Supplementary Note and Supplementary Table 12). When we tested more generally whether a rare variant burden in metabolic genes was associated with disease susceptibility, we observed a significant enrichment among susceptibility genes for endocrine and metabolic disorders, such as type 2 diabetes and different lipidemias but not among other disease categories (Supplementary Fig. 5).
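An enrichment of this kind is typically summarized as an odds ratio from a 2 × 2 table of gene counts. The sketch below computes the odds ratio with a Wald confidence interval; the test actually used for the reported estimate is not stated in this section, and the counts are invented for illustration.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: associated genes in the reference set   b: associated genes outside it
    c: other genes in the reference set        d: other genes outside it
    All four counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: genes split by NMR association and by OMIM membership.
estimate, ci_low, ci_high = odds_ratio_wald(100, 109, 4_000, 14_000)
```

With sparse cells, an exact test or a logistic regression model would usually be preferred over the Wald approximation shown here.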
Risk mitigation of atherosclerotic CVD beyond LDL cholesterol
Genetic predisposition to high LDL cholesterol is strongly associated with increased atherosclerotic CVD (ACVD) risk (‘level effect’), and genetic variations that mimic potent drug targets, such as at PCSK9, show strong evidence of shared effects on both LDL cholesterol and ACVD (‘locus effect’)40. To identify potential pathways to mitigate the residual risk not addressed by lowering of LDL cholesterol41, we systematically integrated outcome data across 25 CVD phenotypes42,43,44,45,46,47,48,49,50,51,52,53,54,55,56 with NMR phenotypes (Suppl