Abstract
While genetic influences on general intelligence have been well documented, less is known about the genetics underlying narrower abilities (“group factors”). By applying structural equation modeling to results from several genome-wide association studies (GWAS), most critically of self-reported math ability (N = 564 698) and highest math class taken (N = 430 445), we identified 53 single-nucleotide polymorphisms (SNPs) associated with a latent trait, orthogonal by design with general intelligence, approximating the group factor of quantitative ability. The genes near these SNPs implicated the biological process of neuron projection development, and the genome-wide pattern of gene-set enrichment affirmed the involvement of brain development and synaptic function…
Abstract
While genetic influences on general intelligence have been well documented, less is known about the genetics underlying narrower abilities (“group factors”). By applying structural equation modeling to results from several genome-wide association studies (GWAS), most critically of self-reported math ability (N = 564 698) and highest math class taken (N = 430 445), we identified 53 single-nucleotide polymorphisms (SNPs) associated with a latent trait, orthogonal by design with general intelligence, approximating the group factor of quantitative ability. The genes near these SNPs implicated the biological process of neuron projection development, and the genome-wide pattern of gene-set enrichment affirmed the involvement of brain development and synaptic function. We calculated a number of genetic correlations with this quantitative factor, finding negative associations with both internalizing and externalizing disorders and positive associations with STEM occupations such as computer programming. These results provide further evidence for genetic influences on traits other than general factors in human behavioral variation, point to the mechanisms mediating these genetic influences on quantitative ability and interests, and affirm the relationships of the latter traits with a number of real-world outcomes.
Scores on different types of intelligence tests are always positively correlated, supporting their treatment as indicators of “general intelligence” (g) [1, 2]. A number of genome-wide association studies (GWAS) have focused on measures of overall intelligence, and because g is associated with the great bulk of the reliable variance in most such measures, these studies can be taken as providing molecular elucidation of mostly g regardless of how explicitly they have targeted this trait [3,4,5,6].
Broad factors other than g are often called “group factors” because each one is measured by a limited group or cluster of tests rather than all tests in general. Several taxonomies of group factors have been proposed [7,8,9,10,11,12]. An important criterion for judging the validity of a posited group factor is whether it provides incremental predictive value with respect to distinctive and important outcomes, and by this benchmark the two factors of verbal comprehension and spatial visualization—recognized in some form by all taxonomists—have proven their utility [13,14,15,16,17,18,19].
Quantitative ability is the capacity to draw logically necessary conclusions regarding number, change, and structure [20]. Some taxonomies classify quantitative ability as a facet of a more general “fluid reasoning” factor or crystallized knowledge. There are many reasons, however, to elevate quantitative ability to a group factor at the same level of prominence as verbal and spatial. A test of quantitative ability is often the best measure of g in its particular battery and furthermore might share little non-g variance with other reasoning tests [21, 22]. Regardless of whether a hierarchical or bifactor method is used in the factor analysis, this pattern embracing a test’s loadings on g and the quantitative factor renders the latter trait of considerable interest. It is also noteworthy that while other reasoning tests may not predict external criteria such as grades particularly well [23,24,25], tests of quantitative ability certainly do exhibit distinctive and important correlates, including success in STEM fields [16, 19, 26,27,28]. The quantitative and spatial factors should not be elided into a generic non-verbal ability, as each factor is associated with a distinct profile of interests, attitudes toward schooling, and vocational aspirations [29].
Twin studies suggest that at least some of the genetic influences on tests of quantitative reasoning or mathematical achievement are not shared with g [30,31,32,33]. In this study we attempted to identify specific sites in the human genome contributing to this unique variation. Such DNA-level analyses can uncover aspects of a trait that must remain obscure in traditional twin and family designs, such as clues to its underlying biological basis and the magnitudes of its associations with other traits that are rarely measured together with it in the same large sample. The observed-level phenotypes in our genome-wide association studies (GWAS) included self-reported math ability and highest math class taken [5]. Users of GWAS summary statistics are often limited to whatever can be practically measured in large-scale studies that are not dedicated to a specific research purpose, but nevertheless we believe that these two unconventional phenotypes should serve our purposes adequately as indicators of quantitative ability. We recognize that these two observed indicators may also reflect an interest in mathematics, distinct from an ability to do mathematics. Self-reported abilities have sometimes been reported to be more highly correlated with matching vocational interests than objectively measured abilities [34, 35], and interest may determine further education in mathematics as much as ability [36,37,38]. But related abilities, personality traits, and interests have often been considered together in previous work, as constellations of correlated and jointly developing attributes [39, 40], and our study might be regarded as a continuation of this approach.
To conduct the latent-level GWAS of the common factor underlying self-reported math ability and highest math class taken (Fig. 1), we turned to Genomic SEM, a software tool for applying factor and path analyses to genetic data [41], and applied a number of downstream analyses to the results.
Fig. 1: Path diagram depicting the standardized factor-analytic model used in this study. Each directed edge from factor to indicator is labeled with its associated factor loading. The residual variances of the indicators and covariances between the factors were fixed rather than estimated. Supplementary Table S1 provides the unstandardized solution. g, general intelligence; Gq, a symbol sometimes used in the literature for “quantitative knowledge” and adopted here to represent our attempt to capture the quantitative factor; NonCog, non-cognitive skills useful for attaining education.
Materials and methods
A more complete description of Materials and Methods can be found in the Supplementary Information.
Observed phenotypes
The GWAS of self-reported math ability (N = 564 698) and highest math class taken (N = 430 445) were conducted exclusively among research participants of 23andMe, Inc. who answered survey questions about their mathematical background. Our structural equation modeling also employed GWAS summary statistics of cognitive performance (CP) and educational attainment (EA) [5].
Self-reported math ability was assessed with the item: How would you rate your mathematical ability? Subjects responded to a 5-point scale ranging from very poor (1) to excellent (5). The mean of this item was 4.71, indicating a ceiling effect in the 23andMe sample. The standard deviation was 0.99. A study employing an extended twin design found that self-reported math ability is at least moderately heritable and correlated with objective ability measurements and grades in math (r**g > 0.7) [42]. Another study found a substantial phenotypic correlation (r > 0.6) between self-reported math ability and an objective achievement test [43]. Self-reported quantitative ability is considerably more correlated with measured quantitative ability than are other self-reported abilities with their corresponding measured abilities [35, 44]. These latter studies reported phenotypic correlations between self-reported numerical ability and scores on objective tests of number-series completion ranging from 0.41 to 0.53. The verbal and spatial correlations between self-report and objective scores ranged from 0.08 to 0.31, and the reported 95% confidence intervals did not overlap that of the numerical correlation.
Confirmatory factor analysis and latent-level GWAS
We used the software tool Genomic SEM [41] to calculate the genetic covariance matrix of the phenotypes studied in the GWAS mentioned above. For this purpose Genomic SEM calls bivariate LDSC [45, 46]. A factor-analytic model was based on the genetic covariance matrix. Since the model in Fig. 1 contains four observed indicators and three common factors, it is not identified without further constraints. We chose as constraints the setting of all indicator residual variances to zero.
A GWAS of the quantitative factor in Fig. 1 was conducted with Genomic SEM. We used GCTA-COJO [47] to identify lead SNPs at a P-value threshold of 5 × 10−8. The COJO SNPs were then tested for a significant QSNP statistic. The threshold P < 5 × 10−8 was chosen as the criterion for classifying a SNP as being associated with the math indicators in a manner inconsistent with acting through the quantitative factor.
We searched the 53 lead SNPs in the NHGRI-EBI GWAS Catalog to check whether they have been associated with other traits by previous GWAS.
Genetic correlations
We used bivariate LDSC to calculate genetic correlations between our quantitative factor and several behavioral, cognitive, psychiatric, and anthropometric traits. In order to establish the discriminant validity of the quantitative factor, we also calculated the genetic correlations with UKB job codes assumed to be related to general and group factors in the cognitive domain.
Polygenic prediction
To convert our GWAS summary statistics of the quantitative factor into weights for polygenic scores (PGS), we first reran Genomic SEM with summary statistics of g [6] in the place of the summary statistics of CP [5]. We did this because our validation sample consisted of the Minnesota Twin Family Study and the Sibling Interaction and Behavior Study, both of which are being conducted by the Minnesota Center for Twin and Family Research (MCTFR) [48], and the MCTFR was a contributor to the meta-analysis of CP. To the resulting new summary statistics of the quantitative factor, we applied the software tool PRS-CS to generate the PGS weights [49].
Each complete unit in the validation sample was made up of two siblings (usually twins) and their parents. The total sample consisted of 9 067 individuals, belonging to 2 497 family units. Details about genotyping and quality control were provided in an earlier paper [50].
Our outcome measure was the third edition of the Wide Range Achievement Test (WRAT). The test has three components: Reading, Spelling, and Arithmetic. The offspring were tested on the WRAT during adolescence, between ages 13 and 21. There were 2 641 genotyped individuals in the offspring generation of European ancestry with available WRAT scores.
We repeated all of our PGS predictions except restricting observations to individuals with genotyped parents and adding the parental PGS as covariates. For a fixed value of the parental PGS, the PGS of the offspring vary randomly as a result of Mendelian segregation and thus provide a strong degree of causal inference [51,52,53,54].
Biological annotation
We used stratified LD Score regression (S-LDSC) [55] to identify the tissues mediating the genetic effects of the SNPs affecting the quantitative factor. We used the precomputed stratified LD Scores for the Genotype-Tissue Expression (GTEx) [56] data supplied by the developers [57].
To prioritize likely causal genes mediating SNP effects, we turned to Polygenic Priority Score (PoPS) [58]. After considering the gene features used in the PoPS paper, we decided to use the same expression data from mice, the human GTEx data, and the reconstituted gene sets employed by the bioinformatic tool DEPICT [59]. Once in hand, the PoPS marginal and partial regression coefficients can be used as a form of gene-set enrichment analysis.
To confirm the robustness of our inferences based on the DEPICT reconstituted gene sets with quantitative membership scores, we turned to two methods relying on the current discrete versions of these gene sets. The first was the PANTHER overrepresentation test, which has been implemented as a web-based tool (https://www.geneontology.org). Our second method was the application of S-LDSC to calculate heritability enrichment.
Results
Genomic factor analysis
Table 1 presents the heritabilities of our indicators and their genetic correlations, as estimated by LD Score regression (LDSC) [45, 60]. The genetic correlations were all quite high, which suggests that much of the noise in an indicator contributed to the environmental term without substantially affecting the heritable component. The two indicators of the quantitative factor were highly correlated with cognitive performance (CP) (r**g > 0.6), highlighting the need to partial out CP from the indicators in order to carry out a GWAS of the residual quantitative factor. Highest math class taken was strongly correlated with educational attainment (EA) (r**g = 0.78), as expected; few high-school dropouts will successfully complete, say, complex analysis. This dependence points to the need to remove the variance shared with EA from highest math class taken to isolate the quantitative factor. The largest genetic correlation was between self-reported math ability and highest math class taken (r**g = 0.84), suggesting that these two variables share a source of variance measured by neither CP nor EA.
With these considerations in mind, we adopted the factor-analytic model shown in Fig. 1. All four indicators were taken as indicators or effects of general intelligence, g. Highest math class taken and EA were taken as indicators of non-cognitive skills (NonCog), here defined to be a composite of all attributes important for success in education that is orthogonal to CP [61]. All estimated loadings handily exceeded the conventional salience threshold of 0.3. The two math indicators loaded nearly as strongly on their group factor as on g or even more so, a pattern usually not observed in bifactor-modeling studies with typical samples and measures [22, 62]. Our pattern of factor loadings may be owed to several features of our study. First, the genetic covariance between self-reported math ability and highest math class taken may have contributions from multiple sources, including not only abilities but interests as well. Second, because we had few indicators of any given factor, we fixed each indicator’s residual genetic variance to zero in order to achieve model identification. As a consequence, CP loaded on no other factor besides g in the model, effectively fixing its factor loading to unity, and this loading may not have been the only one in the model that was somewhat distorted.
Our factor model closely approximated the observed genetic covariance matrix, as evidenced by the fit indices: χ2(2) = 111.82, CFI = 0.996, SRMR = 0.0195.
Multivariate GWAS of the quantitative factor
By including SNP effects in our structural equation model (Supplementary Fig. S1), we were able to conduct a GWAS of the latent quantitative factor. Figure 2 displays the Manhattan plot. The mean χ2 over HapMap3 SNPs was 1.64, comparable to that obtained in previous GWAS yielding strong signals [63,64,65]. The LDSC intercept was 0.99 (s.e. = 0.01) and the attenuation ratio −0.01 (s.e. = 0.01), suggesting negligible bias from population stratification. Our analysis identified 53 SNPs conditionally and jointly associated with the quantitative factor (Supplementary Table S2).
Fig. 2: Genome-wide association study with the latent quantitative factor. Each data point represents −log10(P value) for the regression of the quantitative factor on the SNP whose location is given by the abscissa. Orange circles represent the 53 COJO hits. The one SNP that showed significant QSNP heterogeneity is colored in green. The red dashed line marks the threshold for genome-wide significance (P = 5 × 10−8).
We additionally used Genomic SEM to perform SNP-level tests of heterogeneity. For each lead SNP, we estimated a QSNP statistic to investigate whether the SNP’s associations with the indicators cannot be explained by the mediation of the quantitative factor. Only one lead SNP (rs13107325, a missense variant in SLC39A8) showed significant QSNP heterogeneity (P < 3 × 10−15). This SNP appears to be highly pleiotropic and has been implicated in multiple GWAS of physical and behavioral traits. Its associations with the two indicators of the quantitative factor were estimated to have opposite signs (highest math class taken, Z = − 3.56, P < 4 × 10−4; self-reported math ability, Z = 6.44, P < 2 × 10−10).
We performed a phenome-wide scan of the 53 lead SNPs—plus any SNPs in LD (r2 > 0.6) with them—in the NHGRI-EBI GWAS Catalog to look for associations with other traits (Supplementary Table S3). Of the 53 SNPs, 23 were significantly associated with the main indicator of the quantitative factor, self-reported math ability. From the same set of SNPs, 13 were associated with highest math class taken, 9 with cognitive performance, and 11 with educational attainment. These variants also showed a recurring pattern of associations with other phenotypes. The associations were mainly with traits of the internalizing spectrum (neuroticism, worry, anxiety, major depressive disorder); sleep (insomnia, chronotype, sleep duration); brain-related phenotypes (e.g., cortical surface area); and substance use (e.g., smoking initiation, alcohol use). Our analysis identified 16 novel SNPs, in the sense of not having been previously associated with any of the four indicators or any other cognitive traits. Of the 16 novel SNPs, four were significantly associated with other behavioral traits: rs10863150 with major depressive disorder, rs4459682 with insomnia, rs2572379 with neuroticism and smoking cessation, and rs10515368 with smoking initiation. Another novel SNP, rs57394143, was associated with thalamus volume.
Genetic correlations
Figure 3 shows genetic correlations between the quantitative factor and a wide range of phenotypes. For comparison, we also present genetic correlations between those phenotypes and the two indicators (CP, EA) of other common factors in our model (Fig. 1).
Fig. 3: Genetic correlations of selected phenotypes with the quantitative factor, cognitive performance (CP), and educational attainment (EA). Error bars represent ±1.96-s.e. intervals. The results in numerical form are reported in Supplementary Table S4.
The quantitative factor is uncorrelated with the first principal component of school grades (r**g = −0.02; s.e. = 0.03), which captures a general scholastic ability [66]. This result is reassuring, given that the factor is defined as orthogonal to general cognitive ability and non-cognitive educational skills. It is also correlated negatively with the second principal component of school grades (r**g = −0.80; s.e. = 0.05), an axis that reflects differences between language and math performance (with higher values indicating better language performance and lower values indicating better mathematical performance). There was a small but significantly positive correlation with dyslexia (r**g = 0.08; s.e. = 0.02).
A similar pattern was found when examining associations with job codes from the UK Biobank (UKB). The quantitative factor is correlated with employment as a mathematician (r**g = 0.31; s.e. = 0.11) and software engineer or programmer (r**g = 0.38; s.e. = 0.03). We did not observe any significant associations with being a university professor, medical doctor, life scientist, or psychologist, providing evidence for the discriminant validity of the quantitative factor. We did observe significantly negative correlations with intuitively verbal vocational categories, such as writer/poet (r**g = −0.32; s.e. = 0.09).
There were no significant associations between the quantitative factor and any anthropometric traits, except a small correlation with body mass index (r**g = 0.09; s.e. = 0.02). The quantitative factor is correlated negatively with attention deficit hyperactivity disorder (r**g = −0.10; s.e. = 0.02) and with the general factor of externalizing (r**g = −0.14; s.e. = 0.03). It is also correlated negatively with traits defining internalizing behavior, such as the general factor of neuroticism (r**g = −0.14; s.e. = 0.02) and major depressive disorder (r**g = −0.22; s.e. = 0.02). Like CP and EA, the quantitative factor was correlated negatively with neighborhood deprivation (r**g = −0.14; s.e. = 0.04). However, unlike CP and EA, it was also correlated negatively with autism spectrum disorder (r**g = −0.20; s.e. = 0.05). Although perhaps surprising, this finding was in accord with a recent meta-analysis reporting a negative association between autism spectrum disorder and quantitative reasoning that is not moderated by full-scale IQ [67]. Finally, there was a small but significantly positive association with schizophrenia (r**g = 0.07; s.e. = 0.02).
Polygenic prediction
Polygenic scores (PGS) for the quantitative factor were used to predict student achievement in a holdout sample [48]. The PGS was a positive predictor of arithmetic achievement (β = 0.063; s.e. = 0.025; ΔR2 = 0.39%). In contrast, the PGS happened to have near-zero coefficients in the prediction of reading (β = −0.005; s.e. = 0.025; ΔR2 ≈ 0) and spelling achievement (β = 0.003; s.e. = 0.03; ΔR2 ≈ 0). This pattern of results persisted in the within-family analysis, consistent with an absence of strong confounding bias in the PGS weights. The full results are presented in Supplementary Table S5.
One study found several tests of quantitative ability to range in their loadings on their group factor from 0.29 to 0.46 [22]. It is reasonable to consider the lower end of this range because the elementary problems making up most of WRAT Arithmetic may have a different factorial composition than more advanced tests of quantitative reasoning. If we suppose that 9 percent of the variance in WRAT Arithmetic is attributable to our quantitative factor, that half of this variance is heritable, that half of the heritable variance is attributable to common SNPs, and that our GWAS was sufficiently powered to produce a PGS accounting for half of its potential variance in the limit of infinite sample size, then the PGS should account for something like 1 percent of the variance. If we assume that our estimate of the PGS coefficient is perhaps a standard error too low, then our predictive power does not seem too far off from what we can reasonably expect.
Biological annotation
In many of our analyses, we employed stratified LD Score regression (S-LDSC) to estimate heritability enrichment. It is recommended that S-LDSC be used with a standard collection of control variables. The estimates associated with these variables can be interesting in their own right, and we give them in Supplementary Table S6. The most statistically significant enrichments were shown by annotations referring to evolutionary conservation, a pattern typical of traits that have been studied in GWAS [55, 65]. What the pattern means is that mutations affecting the quantitative factor (and other traits) tend to arise in functional regions of the genome, as evidenced by selection to maintain sequence similarity in distinct lineages, and once arisen may be deleterious.
Figure 4 shows that the top 13 GTEx tissues by heritability enrichment were, without exception, those of the central nervous system. None of the others showed a statistically significant positive enrichment. The two tissues clearing our benchmark effect size of 1.3-fold enrichment were cerebellar hemisphere (P < 10−6) and amygdala (P < 5 × 10−8). The estimated effect sizes of the top tissues were rather close, and not much should be read into their precise ranking. It is perhaps well to regard amygdala skeptically because multiple regression of MAGMA gene-prioritization score [68] on numerous predictors, including expression in other brain regions, left this annotation with a negative weight (Supplementary Table S8). In this latter analysis, frontal cortex, anterior cingulate cortex, cortex, cerebellar hemisphere, cerebellum, and hippocampus all retained positive weights. Overall, there can be no doubt that genetic variation affects the quantitative factor disproportionately through expression in regions of the brain subserving cognition.
Fig. 4: Heritability enrichment of Genotype-Tissue Expression (GTEx) tissues and cell types, as estimated by stratified LD Score regression (S-LDSC) applied to the GWAS summary statistics of the quantitative factor. The error bars are ±1.96-s.e. intervals. The height of the dashed horizontal line corresponds to 1.3-fold enrichment, which we consider to be a “large” effect size. Complete numerical results are given Supplementary Table S7.
To gain further insight into the neural underpinnings of the quantitative factor, we turned to the genes prioritized by the software tool PoPS (Supplementary Table S9] [58]. These genes were significantly overrepresented in the Gene Ontology biological process regulation of neuron projection development (8.4-fold enrichment, P < 10−6; Supplementary Table S10), defined as “any process that modulates the rate, frequency, or extent of … the progression of [axons or dendrites] over time” (https://amigo.geneontology.org/amigo/term/GO:0010975). The members of this set prioritized by PoPS are indicated in Supplementary Table S2. SEMA6D encodes a member of a well-known ligand family that binds to receptors in the growth cones of extending axons, typically repelling the axons toward regions of lower concentration. EFNA5 encodes a member of another well-known ligand family implicated in axon guidance. Curiously, SEMA6D and the gene encoding the receptor of EFNA5 were also among the first genes prioritized for the phenotype of EA [64].
Because the quantitative factor shows no genetic correlation with brain volume (Fig. 3), it is reasonable to suspect that earlier stages of brain development (progenitor proliferation, neurogenesis) play a lesser role in the quantitative factor than they do in CP and EA. This inference fits well with the above findings regarding the development of axons and dendrites.
The above enrichment analysis implicating the growth of axons and dendrites during brain development drew only upon genes near the 53 significant lead SNPs. The most informative features in the PoPS procedure for prioritizing genes can also provide insight and happen to be identified by data from most of the protein-coding genes in the human genome without regard to a threshold of statistical significance in the GWAS. The features with the largest positive weights in the PoPS gene-prioritiziation run tended to be loadings on principal or independent components of gene expression in the mouse brain (Supplementary Table S8). We regard gene sets from databases such as Gene Ontology to be more biologically informative than expression data, and the top features of this type include dendrite (marginal P < 10−15), presynaptic membrane (marginal P < 10−14), and neurotransmitter receptor binding and downstream transmission in the postsynaptic cell (marginal P < 10−13). Interestingly, these results implicated synaptic function in the behaving organism rather than early brain development.
The very top-ranked features may have been dominated by the expression data because of its completeness; the database categories were missing more entries. We reran PoPS with a subset of biologically informative features and clustered the top results. Using the sum of the effect sizes of a cluster’s highly correlated members as the overall effect size, we found synapse part to be the top cluster (Fig. 5), confirming the importance of synaptic communication. The second-ranked cluster was mRNA splicing, perhaps a surprising result. When we used S-LDSC to estimate the heritability enrichment of the related gene sets RNA splicing (1.59-fold enrichment, P < 0.01) and mRNA processing (1.56-fold enrichment, P < 0.01), we obtained support for the PoPS result (Supplementary Table S12). We also note that these gene sets were found to be enriched in an earlier GWAS of EA [5]. It is tempting to speculate that different proteins resulting from transcript variants of the same gene may have distinct effects on a given group factor or specific ability.
Fig. 5: Feature clusters ranked by contribution to the PoP scores of genes. In this analysis