Comparative Study
Clin Epigenetics. 2019 Aug 27;11(1):125.
doi: 10.1186/s13148-019-0717-y.

Systematic evaluation and validation of reference and library selection methods for deconvolution of cord blood DNA methylation data

Kristina Gervin et al. Clin Epigenetics.

Abstract

Background: Umbilical cord blood (UCB) is commonly used in epigenome-wide association studies of prenatal exposures. Accounting for cell type composition is critical in such studies as it reduces confounding due to the cell specificity of DNA methylation (DNAm). In the absence of cell sorting information, statistical methods can be applied to deconvolve heterogeneous cell mixtures. Among these methods, reference-based approaches leverage age-appropriate cell-specific DNAm profiles to estimate cellular composition. In UCB, four reference datasets comprising DNAm signatures profiled in purified cell populations have been published using the Illumina 450K and EPIC arrays. These datasets are biologically and technically different, and currently, there is no consensus on how to best apply them. Here, we systematically evaluate and compare these datasets and provide recommendations for reference-based UCB deconvolution.

Results: We first evaluated the four reference datasets to ascertain both the purity of the samples and the potential cell cross-contamination. We filtered samples and combined datasets to obtain a joint UCB reference. We selected deconvolution libraries using two different approaches: automatic selection using the top differentially methylated probes from the function pickCompProbes in minfi and a standardized library selected using the IDOL (Identifying Optimal Libraries) iterative algorithm. We compared the performance of each reference separately and in combination, using the two approaches for reference library selection, and validated the results in an independent cohort (Generation R Study, n = 191) with matched cell counts measured by fluorescence-activated cell sorting (FACS). Strict filtering and combination of the references significantly improved the accuracy and efficiency of cell type estimates. Ultimately, the IDOL library outperformed the library from the automatic selection method implemented in pickCompProbes.
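
As a rough illustration of the reference-based deconvolution described above, the sketch below estimates cell type proportions for one mixed sample by non-negative least squares against a reference matrix of cell-specific DNAm values. It is a minimal sketch under stated assumptions, not the authors' pipeline: the reference values, the cell type labels, and the function name `estimate_proportions` are hypothetical, and a plain projected gradient descent stands in for the quadratic programming solver normally used for constrained projection.

```python
# Toy reference library: rows are probes, columns are cell types.
# All methylation (beta) values are made up for illustration.
CELL_TYPES = ["CD4T", "Bcell", "Gran"]
REFERENCE = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.1],
    [0.2, 0.8, 0.2],
    [0.1, 0.2, 0.8],
]


def estimate_proportions(reference, mixture, n_iter=500):
    """Hypothetical helper: non-negative least squares by projected gradient descent."""
    n_probes = len(reference)
    n_types = len(reference[0])
    # Step size 1 / trace(M^T M). The trace bounds the largest eigenvalue
    # of M^T M, so the gradient step cannot overshoot.
    step = 1.0 / sum(v * v for row in reference for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current fit: M w - y, one entry per probe.
        residual = [
            sum(m * w for m, w in zip(row, weights)) - y
            for row, y in zip(reference, mixture)
        ]
        # Gradient of 0.5 * ||M w - y||^2 with respect to w: M^T (M w - y).
        gradient = [
            sum(reference[i][j] * residual[i] for i in range(n_probes))
            for j in range(n_types)
        ]
        # Gradient step, then projection onto the constraint w >= 0.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


# Simulate a noise-free mixture with known proportions and recover them.
TRUE_PROPORTIONS = [0.5, 0.3, 0.2]
MIXTURE = [sum(m * w for m, w in zip(row, TRUE_PROPORTIONS)) for row in REFERENCE]
ESTIMATE = estimate_proportions(REFERENCE, MIXTURE)
```

In this toy setting the probes separate the cell types cleanly, so the estimate converges to the simulated proportions. With real data the quality of the estimate depends on which probes form the library, which is the choice the two selection approaches above address.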

Conclusion: These results have important implications for epigenetic studies in UCB as implementing this method will optimally reduce confounding due to cellular heterogeneity. This work provides guidelines for future reference-based UCB deconvolution and establishes a framework for combining reference datasets in other tissues.

Keywords: Cell type heterogeneity; DNAm; Deconvolution; IDOL; Reference dataset; Umbilical cord blood; minfi; pickCompProbes.

Conflict of interest statement

KTK and JKW are founders of Celintec, which provided no funding and had no role in this work. The other authors declare that they have no competing interests.

Figures

Fig. 1
PCA scatterplot of cell type-specific DNAm in four UCB references as published (raw). The first two principal components are plotted, with the proportion of variance explained by each component indicated next to the axis labels. The plot shows distinct clustering of the different cell types, and most of the variance in DNAm can be attributed to the different cell types. Of note, nRBCs are not included in the Gervin and Lin references
Fig. 2
Data filtering using a projection of adult cell types. Samples in the four UCB references showing ≥ 70% of a different cell type were reclassified to the corresponding cell type. Using a 70% cut-off resulted in removal of 24 samples (26.9%, indicated by red asterisks) and reclassification of three samples (indicated by green asterisks). Of note, the majority of the CD8T cell fractions in the Bakulski reference showed a large proportion of NK cells
Fig. 3
Evaluation of libraries. The selected libraries from pickCompProbes and IDOL were evaluated by calculating the R² and RMSE comparing estimates and FACS counts from each cell type in the test dataset (n = 22) using individual and combined UCB references. Mean R² and RMSE are plotted on the y- and x-axes, respectively
Fig. 4
Comparison of L-DMR libraries selected using automatic selection in pickCompProbes and the IDOL algorithm for optimization. a L-DMR libraries selected from combined UCB reference (raw n = 666 and filtered n = 662) using automatic selection in pickCompProbes and IDOL (n = 517). b Overlap of probes from the three methods
Fig. 5
Comparison of estimated cell types and matched FACS cell counts. Scatter plots of deconvolution estimates using constrained projection/quadratic programming (CP/QP) and matched FACS cell counts in an individual birth cohort (Generation R, n = 191) using cleaned IDOL and pickCompProbes libraries and the combined UCB reference. Smoothing lines represent the linear model. R² and RMSE using the two methods are indicated for each cell type
Fig. 6
Measurements of accuracy and agreement between methods. a Box plots of FACS cell counts (red) and estimates generated using IDOL (blue) and pickCompProbes (green) and a combined UCB reference (raw and filtered). b Absolute errors (estimates minus FACS counts) by deconvolution method and the combined UCB reference (filtered and raw). c Bland-Altman plots (differences versus means) showing the agreement between IDOL and pickCompProbes using a filtered combined UCB reference. The mean difference per method (blue and green) and zero difference (red) are indicated by horizontal lines
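
The accuracy and agreement measures named in the captions above (R², RMSE, and Bland-Altman differences versus means) can be sketched as follows. This is an illustrative sketch only: the function names and the example values are hypothetical, and R² is assumed here to be the squared Pearson correlation, which equals the R² of a simple linear fit.

```python
import math


def rmse(estimates, counts):
    """Root mean squared error between estimates and measured counts."""
    squared = [(e - c) ** 2 for e, c in zip(estimates, counts)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(estimates, counts):
    """Squared Pearson correlation (assumed definition of R2 for this sketch)."""
    mean_e = sum(estimates) / len(estimates)
    mean_c = sum(counts) / len(counts)
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(estimates, counts))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_c = sum((c - mean_c) ** 2 for c in counts)
    return cov * cov / (var_e * var_c)


def bland_altman(method_a, method_b):
    """Per-sample means, differences (a minus b), and the mean difference."""
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    return means, diffs, sum(diffs) / len(diffs)


# Hypothetical percentages for one cell type in four samples.
FACS_COUNTS = [10.0, 20.0, 30.0, 40.0]
ESTIMATES = [12.0, 18.0, 33.0, 41.0]

ERROR = rmse(ESTIMATES, FACS_COUNTS)
FIT = r_squared(ESTIMATES, FACS_COUNTS)
MEANS, DIFFS, MEAN_DIFF = bland_altman(ESTIMATES, FACS_COUNTS)
```

RMSE is reported in the units of the counts and R² is unitless, so the two are complementary: a method can track FACS counts closely in rank (high R²) while still carrying a systematic offset, which shows up as a non-zero mean difference in the Bland-Altman comparison.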
