Using deep learning to identify recent positive selection in malaria parasite sequence data

Abstract Background Malaria, caused by Plasmodium parasites, is a major global public health problem. To assist an understanding of malaria pathogenesis, including drug resistance, there is a need for the timely detection of underlying genetic mutations and their spread. With the increasing use of w...

Bibliographic Details
Published in: Malaria Journal
Main Authors: Wouter Deelder, Ernest Diez Benavente, Jody Phelan, Emilia Manko, Susana Campino, Luigi Palla, Taane G. Clark
Format: Article in Journal/Newspaper
Language: English
Published: BMC 2021
Online Access: https://doi.org/10.1186/s12936-021-03788-x
https://doaj.org/article/932d1e5320274ee9b76ff0a36efc77a6
id ftdoajarticles:oai:doaj.org/article:932d1e5320274ee9b76ff0a36efc77a6
record_format openpolar
spelling ftdoajarticles:oai:doaj.org/article:932d1e5320274ee9b76ff0a36efc77a6 2023-05-15T15:12:31+02:00 2021-06-01T00:00:00Z
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
topic Plasmodium falciparum
Plasmodium vivax
Population genomics
Drug resistance
Machine learning
Positive selection
Arctic medicine. Tropical medicine
RC955-962
Infectious and parasitic diseases
RC109-216
description Abstract. Background: Malaria, caused by Plasmodium parasites, is a major global public health problem. Understanding malaria pathogenesis, including drug resistance, requires timely detection of the underlying genetic mutations and their spread. With the increasing use of whole-genome sequencing (WGS) of Plasmodium DNA, the potential of deep learning models to detect loci under recent positive selection, historically signals of drug resistance, was evaluated. Methods: A deep learning-based approach (called “DeepSweep”) was developed, which can be trained on haplotypic images from genetic regions with known sweeps to identify loci under positive selection. DeepSweep software is available from https://github.com/WDee/Deepsweep. Results: Using simulated genomic data, DeepSweep could detect recent sweeps with high predictive accuracy (areas under the ROC curve > 0.95). DeepSweep was applied to Plasmodium falciparum (n = 1125; genome size 23 Mbp) and Plasmodium vivax (n = 368; genome size 29 Mbp) WGS data, and the genes identified overlapped with those found by two established extended haplotype homozygosity methods (within-population iHS, across-population Rsb) (approximately 60–75% overlap of hits at P < 0.0001). DeepSweep hits included regions proximal to known drug resistance loci for both P. falciparum (e.g. pfcrt, pfdhps and pfmdr1) and P. vivax (e.g. pvmrp1). Conclusion: The deep learning approach can detect positive selection signatures in malaria parasite WGS data. Further, as the approach is generalizable, it may be trained to detect other types of selection. With the ability to rapidly generate WGS data at low cost, machine learning approaches (e.g. DeepSweep) have the potential to assist parasite genome-based surveillance and inform malaria control decision-making.
format Article in Journal/Newspaper
author Wouter Deelder
Ernest Diez Benavente
Jody Phelan
Emilia Manko
Susana Campino
Luigi Palla
Taane G. Clark
author_sort Wouter Deelder
title Using deep learning to identify recent positive selection in malaria parasite sequence data
publisher BMC
publishDate 2021
url https://doi.org/10.1186/s12936-021-03788-x
https://doaj.org/article/932d1e5320274ee9b76ff0a36efc77a6
geographic Arctic
genre Arctic
op_source Malaria Journal, Vol 20, Iss 1, Pp 1-9 (2021)
op_relation https://doi.org/10.1186/s12936-021-03788-x
https://doaj.org/toc/1475-2875
doi:10.1186/s12936-021-03788-x
1475-2875
https://doaj.org/article/932d1e5320274ee9b76ff0a36efc77a6
op_doi https://doi.org/10.1186/s12936-021-03788-x
container_title Malaria Journal
container_volume 20
container_issue 1
_version_ 1766343196495839232