Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers

Currently available genetic tools effectively distinguish between different continental origins. However, North Eurasia, which constitutes one-third of the world’s largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian c...

Bibliographic Details
Published in: Frontiers in Genetics
Main Authors: Igor Gorin, Oleg Balanovsky, Oleg Kozlov, Sergey Koshel, Elena Kostryukova, Maxat Zhabagin, Anastasiya Agdzhoyan, Vladimir Pylev, Elena Balanovska
Format: Article in Journal/Newspaper
Language: English
Published: Frontiers Media S.A. 2022
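Subjects: gene geography; ancestry prediction; human population genetics; ancestral origin; machine learning; Genetics; QH426-470
Online Access: https://doi.org/10.3389/fgene.2022.902309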
https://doaj.org/article/7e5bc3c275cb4b68aee7c38fb56065de
id ftdoajarticles:oai:doaj.org/article:7e5bc3c275cb4b68aee7c38fb56065de
record_format openpolar
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
topic gene geography
ancestry prediction
human population genetics
ancestral origin
machine learning
Genetics
QH426-470
description Currently available genetic tools effectively distinguish between different continental origins. However, North Eurasia, which constitutes one-third of the world’s largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian countries, including most of the ethnic diversity across Russia’s vast territory. A total of 1,883 samples were genotyped using the Illumina Infinium Omni5Exome-4 v1.3 BeadChip. Three principal components were computed for the entire dataset, with three iterations of outlier removal. This allowed the 266 populations to be merged into larger groups while maintaining intragroup homogeneity, yielding 29 ethnic geographic groups that were genetically distinguishable enough to trace individual ancestry. Several feature selection methods, including the random forest algorithm, were tested to estimate the number of genetic markers needed to differentiate between the groups; 5,229 ancestry-informative SNPs were selected. We tested various classifiers that support multiple classes and output, for each class, a value that can be interpreted as a probability. Logistic regression was chosen as the best mathematical model for predicting ancestral populations. The machine learning algorithm for inferring an ancestral ethnic geographic group was implemented in the original software “Homeland”, which comprises an interface module, a prediction module, and a cartographic module. Examples of geographic maps showing the likelihood of geographic ancestry for individuals from different regions of North Eurasia are provided. Validation shows that the largest number of ethnic geographic groups predicted with almost absolute accuracy and sensitivity was observed for South and Central Siberia, the Far East, and Kamchatka. The overall accuracy of predicting one of the 29 ethnic geographic groups reached 71%. The proposed method can be employed to predict ancestries from the populations of Russia and its neighboring states. It can be used for ...
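The description mentions a multiclass classifier whose per-class outputs can be read as probabilities, with logistic regression chosen for the final model. The following is a minimal sketch of that general idea only, not the Homeland implementation: it fits a softmax (multinomial logistic) regression by plain stochastic gradient descent on simulated genotypes. The allele frequencies, the three toy groups, the four toy SNPs, and every function name here are assumptions made for illustration; the study itself used 5,229 SNPs and 29 groups.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, genotype):
    """Return one probability per group for a single genotype vector."""
    scores = [
        b + sum(w * g for w, g in zip(row, genotype))
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def fit(X, y, n_groups, lr=0.1, epochs=200):
    """Fit softmax regression with stochastic gradient descent."""
    n_snps = len(X[0])
    weights = [[0.0] * n_snps for _ in range(n_groups)]
    biases = [0.0] * n_groups
    for _ in range(epochs):
        for genotype, label in zip(X, y):
            probs = predict_proba(weights, biases, genotype)
            for k in range(n_groups):
                # Gradient of the cross-entropy loss w.r.t. the score of class k.
                error = probs[k] - (1.0 if k == label else 0.0)
                biases[k] -= lr * error
                for j, g in enumerate(genotype):
                    weights[k][j] -= lr * error * g
    return weights, biases


def simulate(freqs, n_per_group, rng):
    """Draw genotypes (0, 1 or 2 allele copies) from per-group allele frequencies."""
    X, y = [], []
    for _ in range(n_per_group):
        for label, group_freqs in enumerate(freqs):
            X.append([(rng.random() < p) + (rng.random() < p) for p in group_freqs])
            y.append(label)
    return X, y


# Hypothetical allele frequencies: the first three SNPs are ancestry-informative,
# the fourth has the same frequency in every group and carries no signal.
GROUP_FREQS = [
    [0.9, 0.1, 0.1, 0.5],
    [0.1, 0.9, 0.1, 0.5],
    [0.1, 0.1, 0.9, 0.5],
]

rng = random.Random(0)
X_train, y_train = simulate(GROUP_FREQS, 40, rng)
weights, biases = fit(X_train, y_train, n_groups=3)

# Probabilities for a new individual homozygous for the first marker.
probs = predict_proba(weights, biases, [2, 0, 0, 1])
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(p, 3) for p in probs])
```

Because the outputs sum to 1 across groups, they could in principle be attached to group territories and drawn as a likelihood map, which is the role the description assigns to the cartographic module.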
format Article in Journal/Newspaper
author Igor Gorin
Oleg Balanovsky
Oleg Kozlov
Sergey Koshel
Elena Kostryukova
Maxat Zhabagin
Anastasiya Agdzhoyan
Vladimir Pylev
Elena Balanovska
author_sort Igor Gorin
title Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers
publisher Frontiers Media S.A.
publishDate 2022
url https://doi.org/10.3389/fgene.2022.902309
https://doaj.org/article/7e5bc3c275cb4b68aee7c38fb56065de
genre Kamchatka
Siberia
op_source Frontiers in Genetics, Vol 13 (2022)
op_relation https://www.frontiersin.org/articles/10.3389/fgene.2022.902309/full
https://doaj.org/toc/1664-8021
1664-8021
doi:10.3389/fgene.2022.902309
https://doaj.org/article/7e5bc3c275cb4b68aee7c38fb56065de
op_doi https://doi.org/10.3389/fgene.2022.902309
container_title Frontiers in Genetics
container_volume 13
_version_ 1766051739120697344