A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures
Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements as a means toward deconvoluting cell proportions in heterogeneous biospecimens (e.g., whole-blood). As the accuracy of such methods depends highly on the CpG loci comprising the referen...
Published in: | Frontiers in Bioinformatics |
---|---|
Main Authors: | Shelby Bell-Glenn, Jeffrey A. Thompson, Lucas A. Salas, Devin C. Koestler |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
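Subjects: | reference-based deconvolution; IDOL; cell heterogeneity; cell proportion estimation; DNA methylation; EWAS |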
Online Access: | https://doi.org/10.3389/fbinf.2022.835591 https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196 |
id |
ftdoajarticles:oai:doaj.org/article:f90e4cd9e17e4789bc81e581aa90d196 |
---|---|
record_format |
openpolar |
spelling |
ftdoajarticles:oai:doaj.org/article:f90e4cd9e17e4789bc81e581aa90d196 2023-05-15T16:02:07+02:00 A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures Shelby Bell-Glenn Jeffrey A. Thompson Lucas A. Salas Devin C. Koestler 2022-03-01T00:00:00Z https://doi.org/10.3389/fbinf.2022.835591 https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196 EN eng Frontiers Media S.A. https://www.frontiersin.org/articles/10.3389/fbinf.2022.835591/full https://doaj.org/toc/2673-7647 2673-7647 doi:10.3389/fbinf.2022.835591 https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196 Frontiers in Bioinformatics, Vol 2 (2022) reference-based deconvolution IDOL cell heterogeneity cell proportion estimation DNA methylation EWAS Computer applications to medicine. Medical informatics R858-859.7 article 2022 ftdoajarticles https://doi.org/10.3389/fbinf.2022.835591 2022-12-31T03:56:22Z Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements as a means toward deconvoluting cell proportions in heterogeneous biospecimens (e.g., whole blood). As the accuracy of such methods depends highly on the CpG loci comprising the reference library, recent research efforts have focused on the selection of libraries to optimize deconvolution accuracy. While existing approaches for library selection work extremely well, the best performing approaches require a training dataset consisting of both DNAm profiles over a heterogeneous cell population and gold-standard measurements of cell composition (e.g., flow cytometry) in the same samples. Here, we present a framework for reference library selection without a training dataset (RESET) and benchmark it against the Legacy method (minfi:pickCompProbes), where libraries are constructed based on a pre-specified number of cell-specific differentially methylated loci (DML). RESET uses a modified version of the Dispersion Separability Criteria (DSC) for comparing different libraries and has four main steps: 1) identify a candidate set of cell-specific DMLs, 2) randomly sample DMLs from the candidate set, 3) compute the Modified DSC of the selected DMLs, and 4) update the selection probabilities of DMLs based on their contribution to the Modified DSC. Steps 2–4 are repeated many times and the library with the largest Modified DSC is selected for subsequent reference-based deconvolution. We evaluated RESET using several publicly available datasets consisting of whole-blood DNAm measurements with corresponding measurements of cell composition. We computed the RMSE and R² between the predicted cell proportions and their measured values. RESET outperformed the Legacy approach in selecting libraries that improve the accuracy of deconvolution estimates. Additionally, reference libraries constructed using RESET resulted in cellular composition estimates that explained more variation in DNAm as compared to the Legacy ... Article in Journal/Newspaper DML Directory of Open Access Journals: DOAJ Articles Frontiers in Bioinformatics 2 |
institution |
Open Polar |
collection |
Directory of Open Access Journals: DOAJ Articles |
op_collection_id |
ftdoajarticles |
language |
English |
topic |
reference-based deconvolution IDOL cell heterogeneity cell proportion estimation DNA methylation EWAS Computer applications to medicine. Medical informatics R858-859.7 |
spellingShingle |
reference-based deconvolution IDOL cell heterogeneity cell proportion estimation DNA methylation EWAS Computer applications to medicine. Medical informatics R858-859.7 Shelby Bell-Glenn Jeffrey A. Thompson Lucas A. Salas Devin C. Koestler A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures |
topic_facet |
reference-based deconvolution IDOL cell heterogeneity cell proportion estimation DNA methylation EWAS Computer applications to medicine. Medical informatics R858-859.7 |
description |
Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements as a means toward deconvoluting cell proportions in heterogeneous biospecimens (e.g., whole blood). As the accuracy of such methods depends highly on the CpG loci comprising the reference library, recent research efforts have focused on the selection of libraries to optimize deconvolution accuracy. While existing approaches for library selection work extremely well, the best performing approaches require a training dataset consisting of both DNAm profiles over a heterogeneous cell population and gold-standard measurements of cell composition (e.g., flow cytometry) in the same samples. Here, we present a framework for reference library selection without a training dataset (RESET) and benchmark it against the Legacy method (minfi:pickCompProbes), where libraries are constructed based on a pre-specified number of cell-specific differentially methylated loci (DML). RESET uses a modified version of the Dispersion Separability Criteria (DSC) for comparing different libraries and has four main steps: 1) identify a candidate set of cell-specific DMLs, 2) randomly sample DMLs from the candidate set, 3) compute the Modified DSC of the selected DMLs, and 4) update the selection probabilities of DMLs based on their contribution to the Modified DSC. Steps 2–4 are repeated many times and the library with the largest Modified DSC is selected for subsequent reference-based deconvolution. We evaluated RESET using several publicly available datasets consisting of whole-blood DNAm measurements with corresponding measurements of cell composition. We computed the RMSE and R² between the predicted cell proportions and their measured values. RESET outperformed the Legacy approach in selecting libraries that improve the accuracy of deconvolution estimates. Additionally, reference libraries constructed using RESET resulted in cellular composition estimates that explained more variation in DNAm as compared to the Legacy ... |
format |
Article in Journal/Newspaper |
author |
Shelby Bell-Glenn Jeffrey A. Thompson Lucas A. Salas Devin C. Koestler |
author_facet |
Shelby Bell-Glenn Jeffrey A. Thompson Lucas A. Salas Devin C. Koestler |
author_sort |
Shelby Bell-Glenn |
title |
A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures |
title_sort |
novel framework for the identification of reference dna methylation libraries for reference-based deconvolution of cellular mixtures |
publisher |
Frontiers Media S.A. |
publishDate |
2022 |
url |
https://doi.org/10.3389/fbinf.2022.835591 https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196 |
genre |
DML |
genre_facet |
DML |
op_source |
Frontiers in Bioinformatics, Vol 2 (2022) |
op_relation |
https://www.frontiersin.org/articles/10.3389/fbinf.2022.835591/full https://doaj.org/toc/2673-7647 2673-7647 doi:10.3389/fbinf.2022.835591 https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196 |
op_doi |
https://doi.org/10.3389/fbinf.2022.835591 |
container_title |
Frontiers in Bioinformatics |
container_volume |
2 |
_version_ |
1766397731619733504 |
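
The four RESET steps listed in the description field can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The record does not say how the Modified DSC differs from the standard criterion, how candidate DMLs are identified, or how selection probabilities are updated, so the sketch substitutes a plain DSC-style ratio (between-cell-type dispersion over within-cell-type dispersion), a simple centroid-spread ranking, and a leave-one-out multiplicative update. The function names `candidate_dmls`, `dsc`, and `reset_sketch`, and the parameters `library_size`, `n_iter`, and `step`, are hypothetical.

```python
import numpy as np


def candidate_dmls(ref, labels, n_candidates):
    """Step 1 (assumed rule): rank loci by the spread of cell-type mean methylation."""
    labels = np.asarray(labels)
    centroids = np.stack(
        [ref[:, labels == c].mean(axis=1) for c in np.unique(labels)], axis=1
    )
    spread = centroids.max(axis=1) - centroids.min(axis=1)
    return np.argsort(spread)[::-1][:n_candidates]


def dsc(ref, labels, eps=1e-12):
    """DSC-style score (stand-in for the Modified DSC, whose definition is not given).

    ref: (n_loci, n_samples) matrix of beta values; labels: cell type per sample.
    """
    labels = np.asarray(labels)
    n = ref.shape[1]
    grand_mean = ref.mean(axis=1)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        block = ref[:, labels == c]
        centroid = block.mean(axis=1)
        weight = block.shape[1] / n
        between += weight * np.sum((centroid - grand_mean) ** 2)
        within += weight * np.mean(np.sum((block - centroid[:, None]) ** 2, axis=0))
    return np.sqrt(between) / (np.sqrt(within) + eps)


def reset_sketch(ref, labels, candidates, library_size, n_iter=200, step=0.1, seed=0):
    """Repeat steps 2-4 and keep the sampled library with the largest score."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates)
    probs = np.full(len(candidates), 1.0 / len(candidates))
    best_score, best_library = -np.inf, None
    for _ in range(n_iter):
        # Step 2: sample a library according to the current selection probabilities.
        picked = rng.choice(len(candidates), size=library_size, replace=False, p=probs)
        # Step 3: score the sampled library.
        score = dsc(ref[candidates[picked]], labels)
        if score > best_score:
            best_score, best_library = score, np.sort(candidates[picked])
        # Step 4 (assumed rule): raise or lower each sampled locus's probability
        # depending on whether dropping it would decrease or increase the score.
        for j in picked:
            rest = candidates[picked[picked != j]]
            contribution = score - dsc(ref[rest], labels)
            probs[j] *= np.exp(step * np.sign(contribution))
        probs /= probs.sum()
    return best_library, best_score
```

Given a reference matrix `ref` of beta values (loci in rows, purified-cell samples in columns) and a vector `labels` of cell types, a call such as `reset_sketch(ref, labels, candidate_dmls(ref, labels, 500), library_size=100)` would return the row indices of the best-scoring library found and its score. No measured cell proportions enter the loop, which mirrors the "without a training dataset" aspect described in the abstract.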