Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers
Currently available genetic tools effectively distinguish between different continental origins. However, North Eurasia, which constitutes one-third of the world’s largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian c...
Published in: | Frontiers in Genetics |
---|---|
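Main Authors: | Igor Gorin, Oleg Balanovsky, Oleg Kozlov, Sergey Koshel, Elena Kostryukova, Maxat Zhabagin, Anastasiya Agdzhoyan, Vladimir Pylev, Elena Balanovska |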
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
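Subjects: | gene geography; ancestry prediction; human population genetics; ancestral origin; machine learning; Genetics; QH426-470 |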
Online Access: | https://doi.org/10.3389/fgene.2022.902309 https://doaj.org/article/7e5bc3c275cb4b68aee7c38fb56065de |
id |
ftdoajarticles:oai:doaj.org/article:7e5bc3c275cb4b68aee7c38fb56065de |
---|---|
record_format |
openpolar |
spelling |
ftdoajarticles:oai:doaj.org/article:7e5bc3c275cb4b68aee7c38fb56065de 2023-05-15T16:59:28+02:00 Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers Igor Gorin Oleg Balanovsky Oleg Kozlov Sergey Koshel Elena Kostryukova Maxat Zhabagin Anastasiya Agdzhoyan Vladimir Pylev Elena Balanovska 2022-05-01T00:00:00Z https://doi.org/10.3389/fgene.2022.902309 https://doaj.org/article/7e5bc3c275cb4b68aee7c38fb56065de EN eng Frontiers Media S.A. https://www.frontiersin.org/articles/10.3389/fgene.2022.902309/full https://doaj.org/toc/1664-8021 1664-8021 doi:10.3389/fgene.2022.902309 https://doaj.org/article/7e5bc3c275cb4b68aee7c38fb56065de Frontiers in Genetics, Vol 13 (2022) gene geography ancestry prediction human population genetics ancestral origin machine learning Genetics QH426-470 article 2022 ftdoajarticles https://doi.org/10.3389/fgene.2022.902309 2022-12-31T02:33:57Z Currently available genetic tools effectively distinguish between different continental origins. However, North Eurasia, which constitutes one-third of the world’s largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian countries, including most of the ethnic diversity across Russia’s vast territory. A total of 1,883 samples were genotyped using the Illumina Infinium Omni5Exome-4 v1.3 BeadChip. Three principal components were computed for the entire dataset using three iterations for outlier removal. This allowed the 266 populations to be merged into larger groups while maintaining intragroup homogeneity, so 29 ethnic geographic groups were formed that were genetically distinguishable enough to trace individual ancestry. Several feature selection methods, including the random forest algorithm, were tested to estimate the number of genetic markers needed to differentiate between the groups; 5,229 ancestry-informative SNPs were selected. 
We tested various classifiers that support multiple classes and output a value for each class that can be interpreted as a probability. Logistic regression was chosen as the best mathematical model for predicting ancestral populations. The machine learning algorithm for inferring an ancestral ethnic geographic group was implemented in the original software “Homeland”, which comprises an interface module, a prediction module, and a cartographic module. Examples of geographic maps showing the likelihood of geographic ancestry for individuals from different regions of North Eurasia are provided. Validation shows that the highest number of ethnic geographic group predictions with almost absolute accuracy and sensitivity was observed for South and Central Siberia, the Far East, and Kamchatka. The overall accuracy of assigning an individual to one of the 29 ethnic geographic groups reached 71%. The proposed method can be employed to predict ancestries from the populations of Russia and its neighboring states. It can be used for ... Article in Journal/Newspaper Kamchatka Siberia Directory of Open Access Journals: DOAJ Articles Frontiers in Genetics 13 |
institution |
Open Polar |
collection |
Directory of Open Access Journals: DOAJ Articles |
op_collection_id |
ftdoajarticles |
language |
English |
topic |
gene geography ancestry prediction human population genetics ancestral origin machine learning Genetics QH426-470 |
description |
Currently available genetic tools effectively distinguish between different continental origins. However, North Eurasia, which constitutes one-third of the world’s largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian countries, including most of the ethnic diversity across Russia’s vast territory. A total of 1,883 samples were genotyped using the Illumina Infinium Omni5Exome-4 v1.3 BeadChip. Three principal components were computed for the entire dataset using three iterations for outlier removal. This allowed the 266 populations to be merged into larger groups while maintaining intragroup homogeneity, so 29 ethnic geographic groups were formed that were genetically distinguishable enough to trace individual ancestry. Several feature selection methods, including the random forest algorithm, were tested to estimate the number of genetic markers needed to differentiate between the groups; 5,229 ancestry-informative SNPs were selected. We tested various classifiers that support multiple classes and output a value for each class that can be interpreted as a probability. Logistic regression was chosen as the best mathematical model for predicting ancestral populations. The machine learning algorithm for inferring an ancestral ethnic geographic group was implemented in the original software “Homeland”, which comprises an interface module, a prediction module, and a cartographic module. Examples of geographic maps showing the likelihood of geographic ancestry for individuals from different regions of North Eurasia are provided. Validation shows that the highest number of ethnic geographic group predictions with almost absolute accuracy and sensitivity was observed for South and Central Siberia, the Far East, and Kamchatka. The overall accuracy of assigning an individual to one of the 29 ethnic geographic groups reached 71%. 
The proposed method can be employed to predict ancestries from the populations of Russia and its neighboring states. It can be used for ... |
format |
Article in Journal/Newspaper |
author |
Igor Gorin Oleg Balanovsky Oleg Kozlov Sergey Koshel Elena Kostryukova Maxat Zhabagin Anastasiya Agdzhoyan Vladimir Pylev Elena Balanovska |
author_sort |
Igor Gorin |
title |
Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers |
publisher |
Frontiers Media S.A. |
publishDate |
2022 |
url |
https://doi.org/10.3389/fgene.2022.902309 https://doaj.org/article/7e5bc3c275cb4b68aee7c38fb56065de |
genre |
Kamchatka Siberia |
op_source |
Frontiers in Genetics, Vol 13 (2022) |
op_relation |
https://www.frontiersin.org/articles/10.3389/fgene.2022.902309/full https://doaj.org/toc/1664-8021 1664-8021 doi:10.3389/fgene.2022.902309 https://doaj.org/article/7e5bc3c275cb4b68aee7c38fb56065de |
op_doi |
https://doi.org/10.3389/fgene.2022.902309 |
container_title |
Frontiers in Genetics |
container_volume |
13 |
_version_ |
1766051739120697344 |
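
The description above says that logistic regression was chosen because it outputs a value per group that can be read as a probability. As a rough illustration of that prediction step only, the sketch below applies a softmax over per-group linear scores to one genotype vector. The group names, coefficients, the three-marker genotype, and the function `predict_group_probabilities` are all hypothetical placeholders; they are not taken from the article or from the “Homeland” software, which uses 5,229 SNPs and 29 groups.

```python
import math


def predict_group_probabilities(genotype, weights, biases):
    """Softmax over linear scores: one probability per candidate group.

    genotype: list of allele counts (0, 1, or 2) per marker.
    weights:  dict mapping group name -> list of per-marker coefficients.
    biases:   dict mapping group name -> intercept.
    All inputs here are illustrative assumptions, not values from the paper.
    """
    scores = {
        group: biases[group] + sum(w * g for w, g in zip(coefs, genotype))
        for group, coefs in weights.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {group: math.exp(s - top) for group, s in scores.items()}
    total = sum(exp_scores.values())
    return {group: e / total for group, e in exp_scores.items()}


# Made-up coefficients for three hypothetical groups and three markers.
WEIGHTS = {
    "group_A": [0.8, -0.5, 0.1],
    "group_B": [-0.3, 0.9, 0.2],
    "group_C": [0.0, 0.1, -0.7],
}
BIASES = {"group_A": 0.0, "group_B": 0.1, "group_C": -0.1}

genotype = [2, 0, 1]  # allele counts at the three toy markers
probs = predict_group_probabilities(genotype, WEIGHTS, BIASES)
best_group = max(probs, key=probs.get)
```

In a setting like the one described, the full probability vector rather than only `best_group` would presumably feed the cartographic step, since the maps are said to show the likelihood of ancestry across regions.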