Comparative Study
Nat Genet. 2017 Jul;49(7):986-992.
doi: 10.1038/ng.3865. Epub 2017 May 22.

Reevaluation of SNP heritability in complex human traits


Doug Speed et al. Nat Genet. 2017 Jul.

Abstract

SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but current assumptions have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (s.d. 3%) higher than those obtained from the widely used software GCTA and 25% (s.d. 2%) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNase I hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model, their estimated contribution is only 24%.


Conflict of interest statement

Competing Financial Interests

The authors declare no competing financial interests.

Figures

Figure 1. Comparison of the GCTA and LDAK Models.
Region 1 contains five SNPs in low LD (lighter colors indicate weaker pairwise correlations). Each SNP contributes unique genetic variation, reflected by SNP weights close to one. Region 2 contains five SNPs in high LD (strong correlations). The total genetic variation tagged by the region is effectively captured by two of the SNPs, and so the others receive zero weight. Under the GCTA Model, the regions are expected to contribute heritability proportional to their numbers of SNPs, here equal. Under the LDAK Model, they are expected to contribute proportional to their sums of SNP weights, here in the ratio 4.6:1.9. Note that the expected heritability can also depend on the allele frequencies and genotype certainty of the SNPs, but for simplicity, these factors are ignored here.
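The contrast in Figure 1 comes down to what each model sums over a region. The sketch below is only an illustration: the per-SNP weights are hypothetical values chosen so that the regional sums reproduce the 4.6:1.9 ratio in the legend, and, as in the figure, MAF and genotype-certainty terms are ignored.

```python
# Hypothetical SNP weights for the two regions of Figure 1 (illustrative values
# chosen so that the sums match the 4.6:1.9 ratio quoted in the legend).
region1_weights = [0.9, 0.95, 0.85, 1.0, 0.9]  # low LD: weights close to one
region2_weights = [1.0, 0.9, 0.0, 0.0, 0.0]    # high LD: two SNPs carry the signal


def expected_shares(regions, model):
    """Return each region's expected share of heritability.

    model="GCTA": share proportional to the number of SNPs in the region.
    model="LDAK": share proportional to the sum of SNP weights in the region.
    MAF and genotype-certainty terms are ignored, as in Figure 1.
    """
    if model == "GCTA":
        totals = [len(w) for w in regions]
    elif model == "LDAK":
        totals = [sum(w) for w in regions]
    else:
        raise ValueError(f"unknown model: {model}")
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


regions = [region1_weights, region2_weights]
print(expected_shares(regions, "GCTA"))  # [0.5, 0.5]
print(expected_shares(regions, "LDAK"))  # roughly [0.708, 0.292]
```

Under the GCTA Model the two regions tie because both hold five SNPs. Under the LDAK Model, the redundant SNPs in Region 2 add nothing, so Region 1 is expected to explain about 71% of the heritability shared between the two regions.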
Figure 2
(a) Relationship between heritability and MAF. The parameter α specifies the assumed relationship between heritability and MAF: in human genetics, α = –1 is typically used (solid blue line), while in animal and plant genetics, α = 0 is more common (green); we instead found that α = –0.25 (red) provides a better fit to real data. The gray bars report (relative) estimates of the per-SNP heritability for MAF < 0.1 and MAF > 0.1 SNPs, averaged across the 19 GWAS traits (vertical lines provide 95% confidence intervals); the dashed lines indicate the per-SNP heritability predicted by each α. (b) Determining the best-fitting α for the GWAS traits. We compare values of α by likelihood; a higher likelihood indicates a better-fitting α. Lines report log likelihoods from LDAK for seven values of α, relative to the highest observed. Line colors indicate the seven trait categories, while the black line reports averages.
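To make the role of α concrete, the sketch below assumes (our reading of the parameterization, not code from the study) that a SNP with MAF f has expected heritability proportional to [f(1 − f)]^(1 + α). With α = -1 this is flat in MAF; with α = 0 it is proportional to the genotype variance 2f(1 − f). The MAF values 0.05 and 0.3 are arbitrary examples.

```python
def relative_per_snp_h2(maf, alpha):
    """Expected per-SNP heritability, up to a constant, for a SNP with minor
    allele frequency `maf`, ASSUMING E[h2_j] is proportional to [f(1-f)]^(1+alpha)."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must lie in (0, 0.5]")
    return (maf * (1.0 - maf)) ** (1.0 + alpha)


# Compare a low-MAF SNP (f = 0.05) with a common SNP (f = 0.3) for the three
# values of alpha discussed in the legend.
for alpha in (-1.0, -0.25, 0.0):
    ratio = relative_per_snp_h2(0.05, alpha) / relative_per_snp_h2(0.3, alpha)
    print(f"alpha = {alpha:+.2f}: low-MAF / common-MAF per-SNP h2 = {ratio:.3f}")
```

Under this assumption, α = -1 predicts equal per-SNP contributions (ratio 1), α = 0 predicts the low-MAF SNP contributes about 0.23 times as much, and α = -0.25 sits in between at roughly 0.33, which is the kind of gap the dashed lines in panel (a) display for the two MAF bins.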
Figure 3
(a) Relative estimates of $h^2_{\mathrm{SNP}}$ for the GWAS traits. Estimates of $h^2_{\mathrm{SNP}}$ from LDSC, GCTA-MS (SNPs partitioned by MAF), GCTA-LDMS (SNPs partitioned by LD and MAF) and LDAK are reported relative to those from GCTA. For all versions of GCTA and LDAK, we use α = –0.25 (see main text for explanation of α). Line colors indicate the seven trait categories; the black line reports the (inverse variance weighted) averages, with gray boxes providing 95% confidence intervals for these averages. Numerical values are provided in Supplementary Table 3. (b) Simulation studies can be misleading. Phenotypes are simulated with 1000 causal SNPs and $h^2_{\mathrm{SNP}} = 0.8$ (black horizontal line), then analyzed using GCTA, GCTA-MS, GCTA-LDMS, LDAK and LDAK-MS (LDAK with SNPs partitioned by MAF). Bars report average estimates of $h^2_{\mathrm{SNP}}$ across 200 simulated phenotypes (vertical lines provide 95% confidence intervals). Left: following the simulation design of Yang et al., causal SNP effect sizes are sampled from $\mathcal{N}(0, 1)$, similar to the GCTA Model. Right: causal SNP effect sizes are sampled from $\mathcal{N}(0, w_j)$, similar to the LDAK Model.
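The two simulation schemes in panel (b) differ only in the variance of the causal effects. The sketch below is a simplified stand-in, not the study's pipeline: it assumes unlinked SNPs with random MAFs, standardizes genotypes to unit variance, picks causal SNPs uniformly at random, and uses hypothetical random weights in place of real LDAK weights. The genetic component is rescaled so its sample variance equals the target SNP heritability.

```python
import numpy as np


def simulate_phenotype(genotypes, n_causal, h2, weights=None, rng=None):
    """Simulate a quantitative phenotype whose genetic component explains h2.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2).
    weights:   optional per-SNP weights w_j. If given, causal effects are drawn
               from N(0, w_j) (LDAK-like); otherwise from N(0, 1) (GCTA-like).
    Returns (phenotype, genetic_component).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, m = genotypes.shape
    # Standardize each SNP to mean 0 and variance 1 (an assumed scaling).
    x = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    causal = rng.choice(m, size=n_causal, replace=False)
    sd = np.ones(n_causal) if weights is None else np.sqrt(weights[causal])
    beta = rng.normal(0.0, sd)  # one effect per causal SNP
    g = x[:, causal] @ beta
    # Rescale so the genetic component has sample variance exactly h2,
    # then add independent noise with variance 1 - h2.
    g = g / g.std() * np.sqrt(h2)
    e = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return g + e, g


# Hypothetical inputs: unlinked SNPs with random MAFs and random weights in (0, 1).
rng = np.random.default_rng(1)
n, m = 1000, 2000
maf = rng.uniform(0.05, 0.5, size=m)
geno = rng.binomial(2, maf, size=(n, m)).astype(float)
w = rng.uniform(0.0, 1.0, size=m)

y_gcta, g_gcta = simulate_phenotype(geno, 1000, 0.8, rng=rng)
y_ldak, g_ldak = simulate_phenotype(geno, 1000, 0.8, weights=w, rng=rng)
print(round(g_gcta.var(), 3), round(g_ldak.var(), 3))  # 0.8 0.8 by construction
```

Because the true heritability is fixed by construction in both arms, any difference between estimators on such data reflects how well each estimator's assumed model matches the generating one, which is the point panel (b) makes.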
Figure 4. Comparing the GCTA and LDAK Models for the GWAS traits:
We partition SNPs into low-LD and high-LD tranches, with the low-LD tranche containing either 50% (left) or 25% (right) of SNPs. For each partition, the horizontal red and black lines indicate the predicted contribution of the low-LD tranche to $h^2_{\mathrm{SNP}}$ under the GCTA and LDAK Models, respectively. Vertical lines provide point estimates and 95% confidence intervals for the contribution of the low-LD tranche to $h^2_{\mathrm{SNP}}$, estimated assuming the GCTA Model. Line colors indicate the seven trait categories, while the black lines provide the (inverse variance weighted) averages.
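The black "average" lines in this and the other figures are inverse variance weighted means. A minimal sketch of that calculation follows; it assumes the per-trait estimates are independent and uses a normal approximation for the 95% interval, which the legend does not spell out, and the input numbers are hypothetical.

```python
import math


def ivw_average(estimates, standard_errors, z=1.96):
    """Inverse variance weighted mean of independent estimates, with its
    standard error and a normal-approximation confidence interval."""
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical per-trait estimates of the low-LD tranche's share of SNP heritability.
mean, se, ci = ivw_average([0.40, 0.60], [0.10, 0.10])
print(f"{mean:.3f} (s.e. {se:.3f}), 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

Traits with tighter confidence intervals dominate the average; if the trait estimates were correlated (for example, through shared samples), this simple formula would understate the uncertainty.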
Figure 5. Enrichment of SNP Classes.
Block 1 reports the contributions to $h^2_{\mathrm{SNP}}$ of DNase I hypersensitivity sites (DHS), estimated under the GCTA Model with α = –1 (see main text for explanation of α). The vertical lines provide point estimates and 95% confidence intervals for each trait, and for the (inverse variance weighted) average; for 3 of the traits, the point estimate is above 100%, as was also the case in Gusev et al. Block 2 repeats this analysis, but now assumes the LDAK Model with α = –0.25. Blocks 3 & 4 estimate the contributions of “genic SNPs” (those inside or within 2 kb of an exon) and “inter-genic SNPs” (those further than 125 kb from an exon), again assuming the LDAK Model with α = –0.25. To assess enrichment, estimated contributions are compared with those expected under the GCTA or LDAK Model, as appropriate (horizontal lines).
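The enrichment comparison can be sketched as follows. The class thresholds (2 kb and 125 kb) come from the legend; the per-SNP expectation assumes, as we read the main text, that the LDAK Model gives SNP j an expected heritability proportional to w_j [f_j(1 − f_j)]^(1 + α) r_j, where r_j is the information score. All function names and SNP annotations below are hypothetical illustrations, not LDAK's interface.

```python
def classify_snp(distance_to_exon_bp):
    """Label a SNP by distance (bp) to the nearest exon; 0 means inside an exon.

    Thresholds follow the Figure 5 legend: "genic" = inside or within 2 kb,
    "inter-genic" = further than 125 kb; SNPs in between fall in neither class.
    """
    if distance_to_exon_bp <= 2_000:
        return "genic"
    if distance_to_exon_bp > 125_000:
        return "inter-genic"
    return "other"


def ldak_expected_h2(weight, maf, info, alpha=-0.25):
    """Expected per-SNP heritability up to a constant, ASSUMING the form
    w_j * [f_j(1 - f_j)]^(1 + alpha) * r_j for the LDAK Model."""
    return weight * (maf * (1.0 - maf)) ** (1.0 + alpha) * info


def expected_share(labels, per_snp_expected, target):
    """Fraction of total expected heritability that falls in class `target`."""
    in_class = sum(e for lab, e in zip(labels, per_snp_expected) if lab == target)
    return in_class / sum(per_snp_expected)


def enrichment(estimated_share, expected):
    """Ratio > 1 means the class explains more heritability than the model predicts."""
    return estimated_share / expected


# Hypothetical annotation for five SNPs: (distance to nearest exon, LDAK weight,
# MAF, information score). These are not values from the study.
snps = [
    (0, 1.0, 0.30, 1.0),
    (1_500, 0.5, 0.30, 1.0),
    (40_000, 0.8, 0.30, 1.0),
    (130_000, 0.7, 0.30, 1.0),
    (500_000, 1.0, 0.30, 1.0),
]
labels = [classify_snp(d) for d, _, _, _ in snps]
expected = [ldak_expected_h2(w, f, r) for _, w, f, r in snps]
exp_genic = expected_share(labels, expected, "genic")
print(labels)         # ['genic', 'genic', 'other', 'inter-genic', 'inter-genic']
print(round(exp_genic, 3))  # 0.375 (equal MAFs and scores, so only weights matter)
print(round(enrichment(0.45, exp_genic), 2))  # 1.2 for a hypothetical 45% estimate
```

The same expected share, computed under the GCTA Model instead (one unit per SNP, times the MAF term), would generally differ, which is why the horizontal reference lines change between Blocks 1 and 2.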
Figure 6. Varying quality control for the UCLEB traits.
We consider three SNP filterings: 353K high-quality common SNPs (information score > 0.99, MAF > 0.01), 8.8M common SNPs (MAF > 0.01) and all 17.3M SNPs (MAF > 0.0005). (a) Blocks indicate SNP filtering; bars report (inverse variance weighted) average estimates of $h^2_{\mathrm{SNP}}$ using LDAK (vertical lines provide 95% confidence intervals). Bar color indicates the value of α used. For Blocks 1, 2 & 3, $h^2_{\mathrm{SNP}}$ is estimated using the non-partitioned model. For Block 4, SNPs are partitioned by MAF; we find this is necessary when rare SNPs are included, and it also allows estimation of the contribution of SNPs with MAF < 0.01 (hatched areas). (b) Bars report our final estimates of $h^2_{\mathrm{SNP}}$ for height, body mass index and QT interval, the three traits for which common SNP heritability has previously been estimated with reasonable precision (orange lines mark the 95% confidence intervals from these previous studies). Bar colors now indicate SNP filtering; all estimates are based on α = –0.25, using either a non-partitioned model (red and blue bars) or a model with SNPs partitioned by MAF (purple bars).
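The three quality-control filterings reduce to simple threshold tests on MAF and information score. The sketch below encodes the legend's thresholds; the `Snp` record, the SNP names and values, and the single two-way MAF split (rare below 0.01, common otherwise) are hypothetical simplifications, since the legend does not give the MAF bins used for Block 4.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str
    maf: float   # minor allele frequency
    info: float  # imputation information score (genotype certainty)


# The three filterings from the Figure 6 legend (strict ">" thresholds).
FILTERS = {
    "high-quality common": {"min_maf": 0.01, "min_info": 0.99},
    "common": {"min_maf": 0.01, "min_info": None},
    "all": {"min_maf": 0.0005, "min_info": None},
}


def filter_snps(snps, min_maf, min_info=None):
    """Keep SNPs with MAF > min_maf and, if min_info is given, info > min_info."""
    return [s for s in snps
            if s.maf > min_maf and (min_info is None or s.info > min_info)]


def partition_by_maf(snps, cutoff=0.01):
    """Split SNPs into rare (MAF < cutoff) and common (MAF >= cutoff) tranches.
    A two-way split is a simplifying assumption; finer MAF bins are possible."""
    rare = [s for s in snps if s.maf < cutoff]
    common = [s for s in snps if s.maf >= cutoff]
    return rare, common


# Hypothetical SNPs, for illustration only.
snps = [
    Snp("snp_a", 0.30, 0.995),
    Snp("snp_b", 0.20, 0.90),
    Snp("snp_c", 0.005, 0.95),
    Snp("snp_d", 0.0001, 0.99),
]
for label, params in FILTERS.items():
    kept = filter_snps(snps, **params)
    print(label, [s.name for s in kept])

rare, common = partition_by_maf(filter_snps(snps, **FILTERS["all"]))
print([s.name for s in rare], [s.name for s in common])
```

Each filtering is a superset of the previous one in MAF but the first also drops poorly imputed SNPs, so moving from Block 1 to Block 3 adds both lower-certainty common SNPs and rare SNPs; the MAF partition then lets the rare tranche receive its own variance component.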


References

    1. Yang J, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569.
    2. Maher B. Personal genomes: the case of the missing heritability. Nature. 2008;456:18–21.
    3. Speed D, et al. Describing the genetic architecture of epilepsy through heritability analysis. Brain. 2014;137:2680–2689.
    4. Henderson C, Kempthorne O, Searle S, von Krosigk C. The estimation of environmental and genetic trends from records subject to culling. Biometrics. 1959;15:192–218.
    5. Falconer D, Mackay T. Introduction to Quantitative Genetics. 4th ed. Longman; 1996.
