A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures

Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements as a means toward deconvoluting cell proportions in heterogeneous biospecimens (e.g., whole-blood). As the accuracy of such methods depends highly on the CpG loci comprising the referen...

Bibliographic Details
Published in: Frontiers in Bioinformatics
Main Authors: Bell-Glenn, Shelby, Thompson, Jeffrey A., Salas, Lucas A., Koestler, Devin C.
Other Authors: National Cancer Institute, National Institute of General Medical Sciences
Format: Article in Journal/Newspaper
Language: unknown
Published: Frontiers Media SA 2022
Subjects:
DML
Online Access: http://dx.doi.org/10.3389/fbinf.2022.835591
https://www.frontiersin.org/articles/10.3389/fbinf.2022.835591/full
id crfrontiers:10.3389/fbinf.2022.835591
record_format openpolar
institution Open Polar
collection Frontiers (Publisher)
op_collection_id crfrontiers
language unknown
description Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements to estimate cell proportions in heterogeneous biospecimens (e.g., whole blood). Because the accuracy of such methods depends heavily on the CpG loci comprising the reference library, recent research has focused on selecting libraries that optimize deconvolution accuracy. Although existing approaches to library selection perform well, the best-performing ones require a training dataset consisting of both DNAm profiles over a heterogeneous cell population and gold-standard measurements of cell composition (e.g., flow cytometry) in the same samples. Here, we present a framework for reference library selection without a training dataset (RESET) and benchmark it against the Legacy method (minfi:pickCompProbes), in which libraries are constructed from a pre-specified number of cell-specific differentially methylated loci (DML). RESET uses a modified version of the Dispersion Separability Criteria (DSC) to compare libraries and has four main steps: 1) identify a candidate set of cell-specific DMLs, 2) randomly sample DMLs from the candidate set, 3) compute the Modified DSC of the selected DMLs, and 4) update the selection probabilities of DMLs based on their contribution to the Modified DSC. Steps 2–4 are repeated many times, and the library with the largest Modified DSC is selected for subsequent reference-based deconvolution. We evaluated RESET using several publicly available datasets consisting of whole-blood DNAm measurements with corresponding measurements of cell composition. We computed the RMSE and R² between the predicted cell proportions and their measured values. RESET outperformed the Legacy approach in selecting libraries that improve the accuracy of deconvolution estimates. Additionally, reference libraries constructed using RESET resulted in cellular composition estimates that explained more variation in DNAm as compared to the Legacy ...
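The sample, score, and reweight loop in the description (steps 2 to 4) can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the record does not give the Modified DSC formula, so `dsc_like_score` is a plain between-group over within-group dispersion ratio used as a stand-in, and the leave-one-out contribution and exponential weight update in `select_library` are hypothetical choices. The candidate set from step 1 is assumed to be the columns of `X`, and all names and the simulated data are invented for the example.

```python
import numpy as np

def dsc_like_score(X, labels):
    """Stand-in separability score (NOT the paper's Modified DSC):
    between-cell-type dispersion divided by within-cell-type dispersion.
    X is samples x loci; labels gives the cell type of each sample."""
    n = X.shape[0]
    grand = X.mean(axis=0)
    d_b = 0.0
    d_w = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        mu = Xk.mean(axis=0)
        d_b += Xk.shape[0] / n * np.sum((mu - grand) ** 2)
        d_w += np.sum((Xk - mu) ** 2) / n
    return np.sqrt(d_b) / np.sqrt(d_w)

def select_library(X, labels, size, n_iter=200, step=1.0, seed=0):
    """Repeat steps 2-4 and keep the best-scoring library.
    Assumes size >= 2 so the leave-one-out score is defined."""
    rng = np.random.default_rng(seed)
    n_loci = X.shape[1]
    weights = np.ones(n_loci)          # start from uniform selection weights
    best_idx, best_score = None, -np.inf
    for _ in range(n_iter):
        probs = weights / weights.sum()
        # Step 2: sample loci without replacement using current probabilities.
        idx = rng.choice(n_loci, size=size, replace=False, p=probs)
        # Step 3: score the sampled library.
        score = dsc_like_score(X[:, idx], labels)
        if score > best_score:
            best_idx, best_score = np.sort(idx), score
        # Step 4 (hypothetical rule): a locus's contribution is the drop in
        # score when it is left out; weights grow with positive contribution.
        for pos, j in enumerate(idx):
            rest = np.delete(idx, pos)
            contrib = score - dsc_like_score(X[:, rest], labels)
            weights[j] *= np.exp(step * contrib)
    return best_idx, best_score

# Simulated purified-cell data: 3 cell types x 10 samples, 60 candidate loci,
# of which only the first 15 differ between cell types.
data_rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 10)
X = data_rng.normal(0.5, 0.05, size=(30, 60))
X[:, :15] += np.array([-0.3, 0.0, 0.3])[labels][:, None]
X = np.clip(X, 0.0, 1.0)               # keep values in the beta-value range

best_idx, best_score = select_library(X, labels, size=10)
print(best_idx, round(float(best_score), 3))
```

The score here needs only labeled reference profiles of purified cell types, which mirrors the point in the description that no training dataset with measured cell composition is required.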
author2 National Cancer Institute
National Institute of General Medical Sciences
format Article in Journal/Newspaper
author Bell-Glenn, Shelby
Thompson, Jeffrey A.
Salas, Lucas A.
Koestler, Devin C.
title A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures
publisher Frontiers Media SA
publishDate 2022
url http://dx.doi.org/10.3389/fbinf.2022.835591
https://www.frontiersin.org/articles/10.3389/fbinf.2022.835591/full
genre DML
op_source Frontiers in Bioinformatics
volume 2
ISSN 2673-7647
op_rights https://creativecommons.org/licenses/by/4.0/
op_doi https://doi.org/10.3389/fbinf.2022.835591