BMC Bioinformatics. 2008 Sep 9;9:365.
doi: 10.1186/1471-2105-9-365.

Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions

E Andres Houseman et al. BMC Bioinformatics.

Abstract

Background: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner.

Results: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations showing that the method is more reliable than competing nonparametric clustering approaches and at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on normal tissue samples and show that the clusters are associated with tissue type as well as age.
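To make the idea of recursively partitioning a beta mixture concrete, the following is a minimal sketch and not the authors' implementation. It assumes each locus within a class follows its own beta distribution, fits a two-class mixture by EM, and keeps a split only when BIC favors two classes over one. Several simplifications are assumptions made here for brevity: beta parameters are estimated by weighted method of moments rather than maximum likelihood, samples are passed to child nodes by hard assignment, and the names `fit_class`, `two_class_em`, `rpmm`, and `min_size` are hypothetical. The published algorithm may differ in all of these respects.

```python
import math
import random

EPS = 1e-3  # keep values strictly inside (0, 1) so the logs stay finite

def beta_logpdf(x, a, b):
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def fit_class(rows, weights):
    """Weighted method-of-moments beta (a, b) per locus (a stand-in for ML)."""
    total = sum(weights)
    params = []
    for j in range(len(rows[0])):
        mean = sum(w * r[j] for r, w in zip(rows, weights)) / total
        var = sum(w * (r[j] - mean) ** 2 for r, w in zip(rows, weights)) / total
        var = min(max(var, 1e-6), 0.999 * mean * (1 - mean))  # keep a, b > 0
        scale = mean * (1 - mean) / var - 1
        params.append((mean * scale, (1 - mean) * scale))
    return params

def class_loglik(row, params):
    # Loci are treated as independent given the class.
    return sum(beta_logpdf(x, a, b) for x, (a, b) in zip(row, params))

def two_class_em(rows, n_iter=25):
    """Fit a two-class beta mixture; return responsibilities and log-likelihood."""
    means = [sum(r) / len(r) for r in rows]
    cut = sorted(means)[len(means) // 2]
    resp = [0.9 if m >= cut else 0.1 for m in means]  # deterministic start
    loglik = 0.0
    for _ in range(n_iter):
        # M-step: mixing proportion and per-class beta parameters.
        pi = min(max(sum(resp) / len(resp), 1e-6), 1 - 1e-6)
        p1 = fit_class(rows, resp)
        p2 = fit_class(rows, [1 - r for r in resp])
        # E-step: posterior probability of class 1, computed stably.
        loglik, new_resp = 0.0, []
        for row in rows:
            l1 = math.log(pi) + class_loglik(row, p1)
            l2 = math.log(1 - pi) + class_loglik(row, p2)
            top = max(l1, l2)
            e1, e2 = math.exp(l1 - top), math.exp(l2 - top)
            loglik += top + math.log(e1 + e2)
            new_resp.append(min(max(e1 / (e1 + e2), 1e-6), 1 - 1e-6))
        resp = new_resp
    return resp, loglik

def rpmm(data, indices=None, min_size=4):
    """Recursively split samples while BIC prefers two classes over one."""
    if indices is None:
        indices = list(range(len(data)))
    rows = [data[i] for i in indices]
    n, n_loci = len(rows), len(rows[0])
    if n < 2 * min_size:
        return [indices]
    one = fit_class(rows, [1.0] * n)
    ll_one = sum(class_loglik(r, one) for r in rows)
    bic_one = -2 * ll_one + 2 * n_loci * math.log(n)
    resp, ll_two = two_class_em(rows)
    bic_two = -2 * ll_two + (4 * n_loci + 1) * math.log(n)
    left = [i for i, r in zip(indices, resp) if r >= 0.5]
    right = [i for i, r in zip(indices, resp) if r < 0.5]
    if bic_two >= bic_one or min(len(left), len(right)) < min_size:
        return [indices]  # keep this node as a terminal class
    return rpmm(data, left, min_size) + rpmm(data, right, min_size)

# Simulated example: 20 samples per class, 10 loci, clearly separated classes.
random.seed(0)

def draw(a, b, n_loci=10):
    return [min(max(random.betavariate(a, b), EPS), 1 - EPS) for _ in range(n_loci)]

data = [draw(2, 8) for _ in range(20)] + [draw(8, 2) for _ in range(20)]
clusters = rpmm(data)
print(clusters)
```

Each returned list holds the sample indices of one terminal class. Because every split only fits a two-class model on the samples at that node, the number of classes does not have to be fixed in advance, which is one plausible reading of why a recursive scheme can be cheaper than fitting a full mixture for every candidate number of classes.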

Conclusion: Our proposed recursively-partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.

Figures

Figure 1
Profiles of latent classes among normal tissue samples. Average value (equation 1) depicted by color: yellow = 1.0, black = 0.5, blue = 0.0. Classes are separated by a yellow dividing line, with height indicating the relative proportion of subjects within each class. Loci are ordered by their position in a dendrogram obtained via hierarchical clustering.
Figure 2
Unadjusted Average Beta values obtained from the Illumina GoldenGate methylation platform for 1413 tumor suppressor loci on 217 normal tissue samples. Yellow = 1.0, black = 0.5, blue = 0.0. Autosomal chromosomes are grouped to aid visualization. For each chromosome group, loci are ordered by their position in a dendrogram produced by hierarchical clustering. Similarly, within tissue sample groups, samples are ordered by their position in a hierarchical clustering dendrogram.
Figure 3
Examples of simulated data. Yellow = 1.0, black = 0.5, blue = 0.0. True classes are indicated and separated by a yellow dividing line. The height of each region indicates the relative number of subjects in each class.
