A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures

Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements as a means toward deconvoluting cell proportions in heterogeneous biospecimens (e.g., whole-blood). As the accuracy of such methods depends highly on the CpG loci comprising the referen...


Bibliographic Details
Published in: Frontiers in Bioinformatics
Main Authors: Shelby Bell-Glenn, Jeffrey A. Thompson, Lucas A. Salas, Devin C. Koestler
Format: Article in Journal/Newspaper
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
DML
Online Access: https://doi.org/10.3389/fbinf.2022.835591
https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196
id ftdoajarticles:oai:doaj.org/article:f90e4cd9e17e4789bc81e581aa90d196
record_format openpolar
spelling ftdoajarticles:oai:doaj.org/article:f90e4cd9e17e4789bc81e581aa90d196 2023-05-15T16:02:07+02:00 A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures Shelby Bell-Glenn Jeffrey A. Thompson Lucas A. Salas Devin C. Koestler 2022-03-01T00:00:00Z https://doi.org/10.3389/fbinf.2022.835591 https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196 EN eng Frontiers Media S.A. https://www.frontiersin.org/articles/10.3389/fbinf.2022.835591/full https://doaj.org/toc/2673-7647 2673-7647 doi:10.3389/fbinf.2022.835591 https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196 Frontiers in Bioinformatics, Vol 2 (2022) reference-based deconvolution IDOL cell heterogeneity cell proportion estimation DNA methylation EWAS Computer applications to medicine. Medical informatics R858-859.7 article 2022 ftdoajarticles https://doi.org/10.3389/fbinf.2022.835591 2022-12-31T03:56:22Z Article in Journal/Newspaper DML Directory of Open Access Journals: DOAJ Articles Frontiers in Bioinformatics 2
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
topic reference-based deconvolution
IDOL
cell heterogeneity
cell proportion estimation
DNA methylation
EWAS
Computer applications to medicine. Medical informatics
R858-859.7
description Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements to deconvolute cell proportions in heterogeneous biospecimens (e.g., whole blood). Because the accuracy of such methods depends heavily on the CpG loci comprising the reference library, recent research efforts have focused on selecting libraries that optimize deconvolution accuracy. Although existing approaches for library selection perform well, the best-performing ones require a training dataset consisting of both DNAm profiles over a heterogeneous cell population and gold-standard measurements of cell composition (e.g., flow cytometry) in the same samples. Here, we present a framework for reference library selection without a training dataset (RESET) and benchmark it against the Legacy method (minfi:pickCompProbes), in which libraries are constructed from a pre-specified number of cell-specific differentially methylated loci (DML). RESET uses a modified version of the Dispersion Separability Criteria (DSC) to compare different libraries and has four main steps: 1) identify a candidate set of cell-specific DMLs, 2) randomly sample DMLs from the candidate set, 3) compute the Modified DSC of the selected DMLs, and 4) update the selection probabilities of DMLs based on their contribution to the Modified DSC. Steps 2–4 are repeated many times, and the library with the largest Modified DSC is selected for subsequent reference-based deconvolution. We evaluated RESET using several publicly available datasets consisting of whole-blood DNAm measurements with corresponding measurements of cell composition. We computed the root mean squared error (RMSE) and R² between the predicted cell proportions and their measured values. RESET outperformed the Legacy approach in selecting libraries that improve the accuracy of deconvolution estimates. Additionally, reference libraries constructed using RESET resulted in cellular composition estimates that explained more variation in DNAm than the Legacy ...
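The four steps named in the description can be illustrated with a small sketch. This is not the authors' implementation: the record does not define the Modified DSC, so the hypothetical `separability` function below substitutes a plain between/within dispersion ratio, and the leave-one-out update rule, the learning rate `lr`, the function names, and the toy reference data are all assumptions made for illustration. Step 1 (finding candidate DMLs) is taken as given through the `candidates` argument.

```python
import math
import random


def separability(ref, loci):
    """Stand-in score (NOT the paper's Modified DSC): between-cell-type
    dispersion divided by within-cell-type dispersion over the chosen loci."""
    centroids, within, n_samples = {}, 0.0, 0
    for cell, samples in ref.items():
        c = [sum(s[j] for s in samples) / len(samples) for j in loci]
        centroids[cell] = c
        for s in samples:
            within += sum((s[j] - cj) ** 2 for j, cj in zip(loci, c))
            n_samples += 1
    overall = [sum(c[i] for c in centroids.values()) / len(centroids)
               for i in range(len(loci))]
    between = sum(sum((ci - oi) ** 2 for ci, oi in zip(c, overall))
                  for c in centroids.values()) / len(centroids)
    return math.sqrt(between) / math.sqrt(within / n_samples + 1e-12)


def weighted_sample(rng, items, weights, k):
    """Draw k distinct items, each draw proportional to the current weights."""
    pool = list(items)
    w = [weights[j] for j in pool]
    chosen = []
    for _ in range(k):
        i = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(i))
        w.pop(i)
    return chosen


def reset_sketch(ref, candidates, k, n_iter=200, lr=0.5, seed=0):
    """Repeat steps 2-4 and keep the library with the largest score."""
    rng = random.Random(seed)
    weights = {j: 1.0 for j in candidates}  # uniform selection weights at start
    best_lib, best_score = None, -math.inf
    for _ in range(n_iter):
        lib = weighted_sample(rng, candidates, weights, k)  # step 2: sample DMLs
        score = separability(ref, lib)                      # step 3: score library
        if score > best_score:
            best_lib, best_score = sorted(lib), score
        for j in lib:                                       # step 4: update weights
            rest = [m for m in lib if m != j]
            contribution = score - separability(ref, rest)  # assumed leave-one-out rule
            weights[j] = max(1e-3, weights[j] + lr * contribution)
    return best_lib, best_score


# Toy reference data (made up): two cell types, three samples each, six loci.
# Loci 0 and 1 differ between the cell types; loci 2-5 do not.
offsets = [-0.02, 0.0, 0.02]
base = {"CD4T": [0.9, 0.1, 0.5, 0.5, 0.5, 0.5],
        "Mono": [0.1, 0.9, 0.5, 0.5, 0.5, 0.5]}
ref = {cell: [[b + d for b in betas] for d in offsets]
       for cell, betas in base.items()}

best_lib, best_score = reset_sketch(ref, candidates=range(6), k=2)
print(best_lib, round(best_score, 2))
```

On this toy input the loop settles on the two loci that separate the cell types. Because the stand-in score is a ratio, it does not grow simply by adding more informative loci, which is one reason a real criterion would need to be defined more carefully than this sketch does.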
format Article in Journal/Newspaper
author Shelby Bell-Glenn
Jeffrey A. Thompson
Lucas A. Salas
Devin C. Koestler
title A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures
publisher Frontiers Media S.A.
publishDate 2022
url https://doi.org/10.3389/fbinf.2022.835591
https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196
genre DML
op_source Frontiers in Bioinformatics, Vol 2 (2022)
op_relation https://www.frontiersin.org/articles/10.3389/fbinf.2022.835591/full
https://doaj.org/toc/2673-7647
2673-7647
doi:10.3389/fbinf.2022.835591
https://doaj.org/article/f90e4cd9e17e4789bc81e581aa90d196
op_doi https://doi.org/10.3389/fbinf.2022.835591
container_title Frontiers in Bioinformatics
container_volume 2