PLoS Genet. 2013;9(8):e1003671.
doi: 10.1371/journal.pgen.1003671. Epub 2013 Aug 15.

Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes

Xin He et al. PLoS Genet. 2013.

Abstract

De novo mutations affect risk for many diseases and disorders, especially those with early onset. An example is autism spectrum disorders (ASD). Four recent whole-exome sequencing (WES) studies of ASD families revealed a handful of novel risk genes, based on independent de novo loss-of-function (LoF) mutations falling in the same gene, and found that de novo LoF mutations occurred at a twofold higher rate than expected by chance. However successful these studies were, they used only a small fraction of the data, excluding other types of de novo mutations and inherited rare variants. Moreover, such analyses cannot readily incorporate data from case-control studies. An important research challenge in gene discovery, therefore, is to develop statistical methods that accommodate a broader class of rare variation. We develop methods that can incorporate WES data regarding de novo mutations, inherited variants present, and variants identified within cases and controls. TADA, for Transmission And De novo Association, integrates these data by a gene-based likelihood model involving parameters for allele frequencies and gene-specific penetrances. Inference is based on a Hierarchical Bayes strategy that borrows information across all genes to infer parameters that would be difficult to estimate for individual genes. In addition to theoretical development, we validated TADA using realistic simulations mimicking rare, large-effect mutations affecting risk for ASD and showed that it has dramatically better power than other common methods of analysis. Thus TADA's integration of various kinds of WES data can be a highly effective means of identifying novel risk genes. Indeed, application of TADA to WES data from subjects with ASD and their families, as well as from a study of ASD subjects and controls, revealed several novel and promising ASD candidate genes with strong statistical support.
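The way a Bayesian gene-level model can pool several kinds of evidence can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that the de novo, inherited, and case-control data are conditionally independent given whether the gene is a risk gene, so that per-data-type Bayes factors multiply. The function name `posterior_risk_probability` and its arguments are hypothetical and are not taken from the TADA software.

```python
import math

def posterior_risk_probability(bayes_factors, prior_fraction):
    """Posterior probability that a gene is a risk gene.

    bayes_factors:  evidence ratios (risk model vs. null model), one per
                    data type; assumed independent given the gene's status.
    prior_fraction: assumed prior fraction of genes that are risk genes.
    """
    if not 0.0 < prior_fraction < 1.0:
        raise ValueError("prior_fraction must lie strictly between 0 and 1")
    # Independence assumption: the combined Bayes factor is the product.
    combined_bf = math.prod(bayes_factors)
    # Posterior odds = prior odds * combined Bayes factor.
    posterior_odds = prior_fraction / (1.0 - prior_fraction) * combined_bf
    return posterior_odds / (1.0 + posterior_odds)
```

For example, `posterior_risk_probability([12.0, 3.5], 1000 / 18000)` combines two made-up Bayes factors under a prior of 1000 risk genes among 18,000, the setting used in the Figure 1 simulation. Weak evidence from each data type can accumulate into a strong posterior, which is the intuition behind integrating the data sources.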

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Properties of the Multiplicity Test.
(A) The probability that a risk gene has two or more de novo LoF mutations in formula image families (i.e., the power) depends on the mutation rate formula image. Power per gene of the Multiplicity Test as a function of formula image is shown for 4 mutation rates, which were chosen based on percentiles (25th, 50th, 75th, 90th) of the distribution of formula image obtained from the full gene set. (B) The expected number of risk genes discovered by the Multiplicity Test at formula image (red, solid) or 3 (blue, dashed) as a function of the sample size formula image. The barplot shows the FDR at formula image. The simulation assumes 1000 disease genes out of 18,000, each with relative risk formula image; these parameters were estimated in the section on Genetic Architecture of ASD.
Figure 2. A probabilistic model for a family trio with an affected child.
Genotype probabilities are computed as the marginal probability of parental genotypes times the conditional probability of the child, given the parents. The parameters formula image and formula image represent the mutation rate, and the population frequency of the formula image genotype, respectively. Phenotype probabilities for the child, given genotype, are a function of formula image, the penetrance of the formula image genotype, and formula image the relative risk of the mutation formula image. Rate is the (approximate) rate of observing counts formula image, formula image and formula image from the latter 3 types of trios, respectively.
Figure 3. The genetic parameters of ASD.
(A) The relationship between the number of ASD risk genes (formula image) and the average relative risk (formula image). formula image stands for the total number of genes in the human genome, and formula image for the fold enrichment of the de novo LoF mutations in probands vs. siblings (about 2 in our data). (B) The expected number of multi-hit genes (formula image) in formula image families, as a function of the number of ASD risk genes (formula image). The observed formula image is 5, and we define the plausible range of formula image as the values corresponding to formula image to 6. The model assumes the relative risks of ASD risk genes follow a gamma distribution with the scale parameter formula image. The variance of the relative risk (formula image) across genes equals formula image (formula image is the average of formula image of all ASD risk genes), which limits the range of plausible values for the model. The estimated value of the average formula image is approximately 20. (C) For each gene, we compute the empirical allele frequency (formula image) of LoFs as the number of LoF variants divided by the sample size. The histogram of the LoF frequencies of all genes is shown. Also shown are the estimated distributions of formula image under the null (red, solid line) and the alternative (blue, dashed line) models, respectively.
Figure 4. The power per gene of competing tests.
The results of three tests are shown: novo (red), meta (blue), and TADA (purple). Results are shown for various values of formula image, formula image and formula image with type I error fixed at 0.001. Parameter values are chosen to cover plausible parameter values according to our model estimation: (A) formula image; (B) formula image; and (C) formula image.
Figure 5. Application of TADA to the genetic data of ASD.
(A) De novo LoF and “probably damaging” missense mutations are enriched in ASD probands (red) compared with unaffected siblings (blue), based on a comparison including all trio and quad families. The other types of missense mutations are not enriched. To make the numbers comparable, the number of mutations in siblings is scaled by a constant multiplier (214/124) so that the number of silent mutations is equal in probands and in siblings. The annotations of missense mutations are based on PolyPhen. (B) Q-Q plot (log scale) of the formula image values for all genes in the ASD dataset based on a combined analysis of LoF and severe missense mutations.
Figure 6. Bayesian hierarchical model of TADA.
A fraction formula image of the genes are associated with the phenotype under investigation and follow model formula image, and the remainder follow model formula image. The prior distribution of gene-specific parameters, relative risk (formula image) and allele frequency (formula image), can vary under the competing models, formula image or formula image. Priors are specified by the hyperparameters, formula image and formula image, respectively, which are estimated from the data. Counts of events for the i-th gene follow a Poisson distribution, parameterized by formula image and formula image under formula image, and formula image under formula image.
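
The Poisson count model in the Figure 6 caption and the power curve described for Figure 1 can be connected with a short sketch. It assumes that the number of de novo LoF mutations in a risk gene across N trios is Poisson with rate 2·N·μ·γ, where μ is the per-chromosome mutation rate and γ the relative risk. These symbols and the factor of 2 are assumptions of this sketch, since the formulas themselves are not preserved in the captions above. The function names are hypothetical.

```python
import math

def poisson_tail_ge(k, rate):
    """P(X >= k) for X ~ Poisson(rate), computed as 1 - P(X <= k - 1)."""
    cdf = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def multiplicity_power(n_families, mu, gamma, min_hits=2):
    """Probability that a risk gene shows at least `min_hits` de novo LoF
    mutations in `n_families` trios.

    Assumed rate: two transmitted chromosomes per child, each mutating at
    rate `mu`, with mutations enriched `gamma`-fold among affected children.
    """
    rate = 2.0 * n_families * mu * gamma
    return poisson_tail_ge(min_hits, rate)
```

Evaluating `multiplicity_power` over a grid of family counts for a few values of `mu` gives curves of the kind described for Figure 1A: power rises with sample size and is higher for genes with larger mutation rates.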

References

    1. Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, et al. (2012) De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485: 237–241.
    2. Neale BM, Kou Y, Liu L, Ma'ayan A, Samocha KE, et al. (2012) Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485: 242–245.
    3. O'Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, et al. (2012) Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.
    4. Iossifov I, Ronemus M, Levy D, Wang Z, Hakker I, et al. (2012) De novo gene disruptions in children on the autistic spectrum. Neuron 74: 285–299.
    5. Veltman JA, Brunner HG (2012) De novo mutations in human genetic disease. Nat Rev Genet 13: 565–575.