Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL)

BMC Bioinformatics. 2016 Mar 8;17:120. doi: 10.1186/s12859-016-0943-7.

Devin C Koestler¹, Meaghan J Jones², Joseph Usset³, Brock C Christensen⁴,⁵,⁶, Rondi A Butler⁷, Michael S Kobor⁸, John K Wiencke⁹, Karl T Kelsey¹⁰,¹¹

Affiliations

¹ Department of Biostatistics, University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, 66160, KS, USA.
² Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, The University of British Columbia, 950 West 28th Ave., Vancouver, V5Z 4H4, BC, Canada.
³ Department of Biostatistics, University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, 66160, KS, USA.
⁴ Department of Epidemiology, Geisel School of Medicine, Dartmouth College, 1 Medical Center Dr., Lebanon, 03756, NH, USA.
⁵ Department of Pharmacology and Toxicology, Dartmouth College, 1 Rope Ferry Rd., Hanover, 03755, NH, USA.
⁶ Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, 1 Medical Center Dr., Lebanon, 03756, NH, USA.
⁷ Department of Pathology and Laboratory Medicine, Brown University, 70 Ship St., Providence, 02912, RI, USA.
⁸ Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, The University of British Columbia, 950 West 28th Ave., Vancouver, V5Z 4H4, BC, Canada.
⁹ Department of Neurological Surgery, University of California San Francisco, 505 Parnassus Ave., San Francisco, 94143, CA, USA.
¹⁰ Department of Pathology and Laboratory Medicine, Brown University, 70 Ship St., Providence, 02912, RI, USA.
¹¹ Department of Epidemiology, Brown University, 121 South Main St., Providence, 02912, RI, USA.

PMID: 26956433
PMCID: PMC4782368
DOI: 10.1186/s12859-016-0943-7

Devin C Koestler et al. BMC Bioinformatics. 2016.

Abstract

Background: Confounding due to cellular heterogeneity represents one of the foremost challenges currently facing Epigenome-Wide Association Studies (EWAS). Statistical methods that leverage the tissue specificity of DNA methylation to deconvolute the cellular mixture of heterogeneous biospecimens offer a promising solution; however, the performance of such methods depends entirely on the library of methylation markers used for deconvolution. Here, we introduce a novel algorithm for Identifying Optimal Libraries (IDOL) that dynamically scans a candidate set of cell-specific methylation markers to find libraries that optimize the accuracy of cell fraction estimates obtained from cell mixture deconvolution.
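
Reference-based cell mixture deconvolution, in its commonly used constrained-least-squares form, estimates each sample's cell-type fractions by finding the non-negative mixture of purified-cell reference profiles that best reproduces the sample's methylation at the library CpGs. The sketch below illustrates that idea under stated assumptions: it swaps in projected gradient descent onto the probability simplex (fractions non-negative and summing to one) for the quadratic-programming solvers typically used in practice, and the helper names (`project_to_simplex`, `estimate_fractions`) and the simulated data are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    # last index where the shifted sorted value stays positive
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def estimate_fractions(reference, sample, n_iter=5000):
    """Hypothetical helper: fractions w minimizing ||reference @ w - sample||^2 on the simplex.

    reference: (n_cpgs, n_cell_types) mean beta values of purified cell types
    sample:    (n_cpgs,) beta values of one mixed sample at the same CpGs
    """
    n_types = reference.shape[1]
    w = np.full(n_types, 1.0 / n_types)  # start from equal fractions
    # step 1/L, where L = 2 * (largest singular value)^2 is the gradient's Lipschitz constant
    step = 1.0 / (2.0 * np.linalg.norm(reference, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * reference.T @ (reference @ w - sample)
        w = project_to_simplex(w - step * grad)
    return w


# Toy example with simulated (not real) reference profiles for 3 cell types at 50 CpGs
rng = np.random.default_rng(0)
reference = rng.uniform(0.05, 0.95, size=(50, 3))
true_w = np.array([0.6, 0.3, 0.1])
sample = reference @ true_w  # noiseless mixture
est = estimate_fractions(reference, sample)
print(np.round(est, 3))
```

With real data the reference matrix would be restricted to the library CpGs (for example, the 300 IDOL L-DMRs), which is where the choice of library enters the estimate.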

Results: Application of IDOL to a training set consisting of samples with both whole-blood DNA methylation data (Illumina HumanMethylation450 BeadArray (HM450)) and flow cytometry measurements of cell composition revealed an optimized library composed of 300 CpG sites. When compared with existing libraries, the library identified by IDOL demonstrated significantly better overall discrimination of the entire immune cell landscape (p = 0.038) and improved discrimination of 14 of the 15 pairs of leukocyte subtypes. Estimates of cell composition across the training-set samples obtained using the IDOL library were highly correlated with their respective flow cytometry measurements, with all cell-specific R² > 0.99 and root mean square errors (RMSEs) ranging from 0.97 % to 1.33 % across leukocyte subtypes. Independent validation of the optimized IDOL library using two additional HM450 data sets showed similarly strong prediction performance, with all cell-specific R² > 0.90 and RMSE < 4.00 %. In simulation studies, adjustment for cell composition using the IDOL library resulted in uniformly lower false positive rates than competing libraries, and the library also showed an improved capacity to explain epigenome-wide variation in DNA methylation within two large, publicly available HM450 data sets.
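
The two accuracy metrics quoted above can be computed per cell type by comparing deconvolution estimates with flow cytometry (FACS) fractions. A minimal sketch, assuming R² is the squared Pearson correlation between predicted and measured fractions and that both are expressed in percent; the six sample values are invented for illustration and are not data from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean square error, in the same units as the inputs (here: percent)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed fractions (assumed definition)."""
    n = len(observed)
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    vp = sum((p - mp) ** 2 for p in predicted)
    vo = sum((o - mo) ** 2 for o in observed)
    return cov ** 2 / (vp * vo)


# Hypothetical CD4T percentages for six samples (illustrative values only)
facs = [10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
pred = [11.0, 14.0, 21.0, 24.0, 31.0, 34.0]
print(f"RMSE = {rmse(pred, facs):.2f} %, R^2 = {r_squared(pred, facs):.3f}")
```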

Conclusions: Despite containing half as many CpGs as existing libraries for whole-blood mixture deconvolution, the optimized IDOL library identified herein achieved outstanding prediction performance across all considered data sets and showed potential to improve the operating characteristics of EWAS that adjust for cell distribution. In addition to providing the EWAS community with an optimized library for whole-blood mixture deconvolution, our work establishes a systematic and generalizable framework for assembling libraries that improve the accuracy of cell mixture deconvolution.

Figures

**Fig. 1**
Impact of L-DMR library on the accuracy of cell composition estimation. a, b Hierarchical clustering heat maps of the mean methylation signatures of isolated leukocyte subtypes [3] using (a) the top 600 ANOVA-ranked L-DMRs (TopANOVA library) and (b) the 600 L-DMRs that uniquely distinguish each cell type from all other cell types (EstimateCellCounts default library). Column dendrograms are colored to reflect the cell lineage of leukocyte subtypes: lymphocytes (*pink*) and myeloid-derived cells (*blue*). c Image plot showing the difference in the dispersion separability criterion (DSC) between the EstimateCellCounts and TopANOVA libraries. For a given pair of leukocyte subtypes, larger values of DSC difference (shades of blue) indicate better discrimination with the EstimateCellCounts library, whereas smaller values (shades of red) indicate better discrimination with the TopANOVA library. d Scatterplots of the CMD-predicted and FACS-measured cell fractions for the n = 6 AdultMixed samples. Dashed lines indicate the line of unity, dotted lines represent the fitted regression lines based on cell predictions obtained using the TopANOVA library, and solid lines represent the fitted regression lines based on cell predictions obtained using the EstimateCellCounts library. e Cell-specific prediction performance for the AdultMixed samples based on the TopANOVA and EstimateCellCounts libraries.
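
The DSC compares how far apart cell types are relative to how spread out each one is. The sketch below assumes the trace-ratio form DSC = D_b / D_w, where D_b and D_w are the square roots of the traces of the between-group and within-group scatter matrices, each weighted by group size; the paper's exact computation may differ, and the `dsc` function and toy beta values are hypothetical. Larger values mean better separation, which is how the DSC-difference panels are read.

```python
import numpy as np


def dsc(X, labels):
    """Hypothetical dispersion separability criterion: sqrt(tr(S_b)) / sqrt(tr(S_w)).

    X: (n_samples, n_cpgs) methylation beta values; labels: group label per sample.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    grand_mean = X.mean(axis=0)
    tr_sw = 0.0  # trace of the size-weighted within-group scatter
    tr_sb = 0.0  # trace of the size-weighted between-group scatter
    for g in np.unique(labels):
        Xg = X[labels == g]
        p = Xg.shape[0] / n
        mu = Xg.mean(axis=0)
        # trace of a covariance matrix = sum of per-CpG (population) variances
        tr_sw += p * np.sum(np.mean((Xg - mu) ** 2, axis=0))
        tr_sb += p * np.sum((mu - grand_mean) ** 2)
    return np.sqrt(tr_sb) / np.sqrt(tr_sw)


labels = ["CD4T", "CD4T", "Mono", "Mono"]
# Toy two-CpG libraries: one separates the two cell types, the other does not
separated = [[0.10, 0.90], [0.20, 0.80], [0.90, 0.10], [0.80, 0.20]]
overlapping = [[0.40, 0.60], [0.60, 0.40], [0.45, 0.55], [0.55, 0.45]]
print(round(dsc(separated, labels), 3), round(dsc(overlapping, labels), 3))
```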

**Fig. 2**
Conceptual illustration of the IDOL algorithm. a Schematic diagram showing each step of IDOL. b, c Illustration of the scheme for updating the selection probabilities of L-DMRs. d Conceptual depiction of the L-DMR selection probabilities as a function of the sequential progression of IDOL. At iteration 0, all L-DMRs have an equal probability of being selected for inclusion in the randomly assembled L-DMR subset. At each subsequent iteration of IDOL (i.e., moving from left to right), the selection probabilities of L-DMRs are updated in proportion to their contribution to prediction performance: selection probabilities for L-DMRs that contribute favorably to prediction performance are increased (increasing shades of green), whereas those for L-DMRs that hinder prediction performance are decreased (increasing shades of red). Upon algorithm termination, the $J^{\star}$ L-DMRs with the largest selection probabilities are taken to represent the optimal L-DMR library. e, f Plots showing the mean RMSE ($\bar{M}$) and the mean coefficient of determination ($\bar{R}^{2}$), respectively, as a function of the sequential progression of the IDOL algorithm.
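
The probability-update scheme in panels b to d can be sketched as follows. This is a hypothetical rule written for illustration only, not the authors' exact update: each L-DMR in the current random subset has its probability scaled up or down in proportion to its contribution, measured here as the increase in RMSE when that L-DMR is dropped from the subset, and all probabilities are then renormalized. The names `update_probabilities` and `top_library`, the `step` parameter, and the contribution values are assumptions.

```python
import numpy as np


def update_probabilities(probs, subset, contribution, step=0.5):
    """Hypothetical IDOL-style update of L-DMR selection probabilities.

    probs:        (n_candidates,) current selection probabilities (sum to 1)
    subset:       indices of the L-DMRs drawn into this iteration's random subset
    contribution: per L-DMR in `subset`, RMSE(subset without it) - RMSE(subset);
                  positive means dropping it hurts, i.e. it helps prediction
    """
    probs = np.asarray(probs, dtype=float).copy()
    contribution = np.asarray(contribution, dtype=float)
    scale = np.max(np.abs(contribution))
    if scale > 0:
        # factor lies in [1 - step, 1 + step]; step < 1 keeps probabilities positive
        probs[subset] *= 1.0 + step * contribution / scale
    return probs / probs.sum()


def top_library(probs, j_star):
    """Indices of the J* L-DMRs with the largest selection probabilities."""
    # stable sort on -probs: descending probability, ties broken by lower index
    return np.argsort(-probs, kind="stable")[:j_star]


n_candidates = 5
probs = np.full(n_candidates, 1.0 / n_candidates)  # iteration 0: equal probabilities
subset = [0, 1, 2]                                  # hypothetical random subset
contribution = [0.02, -0.01, 0.0]                   # hypothetical RMSE changes
probs = update_probabilities(probs, subset, contribution)
library = top_library(probs, j_star=2)
print(np.round(probs, 3), library)
```

In a full run this update would be repeated over many random subsets, so L-DMRs that repeatedly help accumulate probability mass before the top $J^{\star}$ are taken.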

**Fig. 3**
Results obtained from applying IDOL to the training set. a Stacked bar plots showing the FACS-measured fractions of granulocytes (Gran), monocytes (Mono), natural killer cells (NK), B cells (Bcell), CD8T lymphocytes (CD8T), and CD4T lymphocytes (CD4T) across the 6 training samples. b Hierarchical clustering heat map of the mean methylation signatures of leukocyte cell types (columns) based on the 300 optimized L-DMRs (rows) identified by IDOL. The column dendrogram is colored to reflect the cell lineage of the leukocyte subtypes: lymphocyte-derived subtypes are colored pink and myeloid-derived cell types are colored blue. c Scatterplots of FACS-measured cell fractions (x-axes) and predicted cell fractions obtained using the optimized IDOL library (y-axes). Dotted lines indicate the line of unity and colored lines represent the regression lines fit to the FACS-measured and predicted cell fractions. d Overlap between the IDOL and EstimateCellCounts libraries. e Image plot showing the difference in the dispersion separability criterion (DSC) between the IDOL and EstimateCellCounts libraries for discriminating specific pairs of leukocyte subtypes. For a given pair of leukocytes, larger values of DSC difference (shades of blue) indicate better discrimination with the IDOL library, whereas smaller values (shades of red) indicate better discrimination with the EstimateCellCounts library. f Histogram showing the results of a permutation-based testing procedure for examining the difference in the overall DSC between the IDOL and EstimateCellCounts libraries.

**Fig. 4**
Results obtained from applying the optimal IDOL library to the testing sets. a Stacked bar plots showing the cell type fractions for each testing set sample. b Scatter plots of the true reconstructed mixture fractions (x-axes) and the predicted cell fractions obtained using the optimized IDOL library (y-axes). Circles indicate Method A samples and squares indicate Method B samples. Dotted lines indicate the line of unity and colored lines represent the regression lines fit to the true reconstructed mixture fractions and predicted cell fractions. c Box plots of the difference between predicted and observed cell fractions (predicted % − observed %) across leukocyte cell types, where blue boxes represent estimates obtained from the optimal IDOL library and red boxes represent estimates obtained from the EstimateCellCounts library. (d, top panel) Estimated false discovery rate (FDR) for a two-group comparison of DNA methylation as a function of the dissimilarity in the cellular distribution between groups (x-axes). Colored lines represent different approaches for cell composition adjustment. (d, bottom panel) Difference in the FDR between the EstimateCellCounts and IDOL libraries, where points above the dotted line indicate that the EstimateCellCounts library resulted in more false positive results than the IDOL library. e Mean difference in the FDR for varying sample sizes when cell mixture was adjusted using cell fraction estimates from the EstimateCellCounts and IDOL libraries. Bars represent the 95 % bootstrap confidence intervals for each point estimate. Points to the right of the dotted line indicate that the EstimateCellCounts library resulted in more false positive results than the IDOL library.

**Fig. 5**
Cell mixture deconvolution of the Liu and Hannum blood data sets using the IDOL and EstimateCellCounts libraries. a, b Scatter plots of the predicted cell type fractions obtained using the EstimateCellCounts library (x-axes) and the IDOL library (y-axes) for the Liu and Hannum data sets, respectively. c, d Distribution of the difference in R² computed from the IDOL and EstimateCellCounts libraries for the (c) Liu and (d) Hannum data sets. e, f Estimated number of additional samples needed (left y-axis) and approximate additional cost (right y-axis) as a function of the desired difference in DNA methylation to be detected (x-axis) when correction for cell mixture was carried out using the EstimateCellCounts library. Variance estimates were obtained from the (e) Liu and (f) Hannum data sets.
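
Panels e and f translate less accurate cell-mixture correction into extra samples. The abstract does not give the formula used; as an assumption, the sketch below uses the standard normal-approximation sample size for a two-sided, two-group comparison of mean methylation, $n = 2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / \Delta^2$ per group, where $\sigma$ is the residual standard deviation left after adjusting for cell composition. The function name, residual SDs, effect size, and per-sample cost are hypothetical values, not estimates from the Liu or Hannum data.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided two-sample z-test of a mean difference."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)


# Hypothetical residual SDs (beta-value scale) after adjusting with each library
sigma_idol, sigma_ecc = 0.030, 0.033
delta = 0.02           # desired detectable difference in mean methylation
cost_per_sample = 500  # hypothetical assay cost per sample (USD)

# additional samples across both groups when the less accurate adjustment is used
extra = 2 * (n_per_group(delta, sigma_ecc) - n_per_group(delta, sigma_idol))
print(extra, extra * cost_per_sample)
```

Because $n$ scales with $\sigma^2$, even a modest increase in residual variance from poorer cell-mixture correction raises the required sample size proportionally.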

Cited by

Periconceptional folate intake influences DNA methylation at birth based on dietary source in an analysis of pediatric acute lymphoblastic leukemia cases and controls.
Nickels EM, Li S, Morimoto L, Kang AY, de Smith AJ, Metayer C, Wiemels JL. Am J Clin Nutr. 2022 Dec 19;116(6):1553-1564. doi: 10.1093/ajcn/nqac283. PMID: 36178055.
Impact of folic acid supplementation on the epigenetic profile in healthy unfortified individuals - a randomized intervention trial.
Michels KB, Binder AM. Epigenetics. 2024 Dec;19(1):2293410. doi: 10.1080/15592294.2023.2293410. Epub 2023 Dec 14. PMID: 38096372.
Editorial: Computational Methods for Analysis of DNA Methylation Data.
Di Lena P, Nardini C, Pellegrini M. Front Bioinform. 2022 Jun 17;2:926066. doi: 10.3389/fbinf.2022.926066. eCollection 2022. PMID: 36304337.
Cell type-specific DNA methylation in neonatal cord tissue and cord blood: a 850K-reference panel and comparison of cell types.
Lin X, Tan JYL, Teh AL, Lim IY, Liew SJ, MacIsaac JL, Chong YS, Gluckman PD, Kobor MS, Cheong CY, Karnani N. Epigenetics. 2018;13(9):941-958. doi: 10.1080/15592294.2018.1522929. Epub 2018 Oct 11. PMID: 30232931.
An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray.
Salas LA, Koestler DC, Butler RA, Hansen HM, Wiencke JK, Kelsey KT, Christensen BC. Genome Biol. 2018 May 29;19(1):64. doi: 10.1186/s13059-018-1448-7. PMID: 29843789.

References

1. Rakyan VK, Down TA, Balding DJ, Beck S. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12(8):529–41. doi: 10.1038/nrg3000.
2. Adalsteinsson BT, Gudnason H, Aspelund T, Harris TB, Launer LJ, Eiriksdottir G, Smith AV, Gudnason V. Heterogeneity in white blood cells has potential to confound DNA methylation measurements. PLoS ONE. 2012;7(10):46705. doi: 10.1371/journal.pone.0046705.
3. Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlén SE, Greco D, Söderhäll C, Scheynius A, Kere J. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS ONE. 2012;7(7):41361. doi: 10.1371/journal.pone.0041361.
4. Koestler DC, Marsit CJ, Christensen BC, Accomando W, Langevin SM, Houseman EA, Nelson HH, Karagas MR, Wiencke JK, Kelsey KT. Peripheral blood immune cell methylation profiles are associated with nonhematopoietic cancers. Cancer Epidemiol Biomarkers Prev. 2012;21(8):1293–302. doi: 10.1158/1055-9965.EPI-12-0361.
5. Lam LL, Emberly E, Fraser HB, Neumann SM, Chen E, Miller GE, Kobor MS. Factors underlying variable DNA methylation in a human community cohort. Proc Natl Acad Sci U S A. 2012;109 Suppl 2:17253–60. doi: 10.1073/pnas.1121249109.
