Image4_Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers.JPEG

Bibliographic Details
Main Authors: Igor Gorin, Oleg Balanovsky, Oleg Kozlov, Sergey Koshel, Elena Kostryukova, Maxat Zhabagin, Anastasiya Agdzhoyan, Vladimir Pylev, Elena Balanovska
Format: Still Image
Language: unknown
Published: 2022
Subjects:
Online Access: https://doi.org/10.3389/fgene.2022.902309.s004
https://figshare.com/articles/figure/Image4_Determining_the_Area_of_Ancestral_Origin_for_Individuals_From_North_Eurasia_Based_on_5_229_SNP_Markers_JPEG/19770211
id ftfrontimediafig:oai:figshare.com:article/19770211
record_format openpolar
institution Open Polar
collection Frontiers: Figshare
op_collection_id ftfrontimediafig
language unknown
topic Genetics
Genetic Engineering
Biomarkers
Developmental Genetics (incl. Sex Determination)
Epigenetics (incl. Genome Methylation and Epigenomics)
Gene Expression (incl. Microarray and other genome-wide approaches)
Genome Structure and Regulation
Genomics
Genetically Modified Animals
Livestock Cloning
Gene and Molecular Therapy
gene geography
ancestry prediction
human population genetics
ancestral origin
machine learning
description Currently available genetic tools effectively distinguish between continental origins. However, North Eurasia, which constitutes one-third of the world’s largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian countries, covering most of the ethnic diversity across Russia’s vast territory. A total of 1,883 samples were genotyped using the Illumina Infinium Omni5Exome-4 v1.3 BeadChip. Three principal components were computed for the entire dataset, with three iterations of outlier removal. This allowed the 266 populations to be merged into larger groups while maintaining intragroup homogeneity: 29 ethnic geographic groups were formed that were genetically distinct enough to trace individual ancestry. Several feature selection methods, including the random forest algorithm, were tested to estimate the number of genetic markers needed to differentiate between the groups; 5,229 ancestry-informative SNPs were selected. We tested various multiclass classifiers that output a value for each class that can be interpreted as a probability. Logistic regression was chosen as the best mathematical model for predicting ancestral populations. The machine learning algorithm for inferring an individual's ancestral ethnic geographic group was implemented in original software, “Homeland”, which comprises an interface module, a prediction module, and a cartographic module. Examples of geographic maps showing the likelihood of geographic ancestry for individuals from different regions of North Eurasia are provided. Validation showed that the ethnic geographic groups predicted with near-perfect accuracy and sensitivity were concentrated in South and Central Siberia, the Far East, and Kamchatka. Overall accuracy in predicting one of the 29 ethnic geographic groups reached 71%.
The proposed method can be employed to predict ancestry for the populations of Russia and its neighboring states. It can be used for ...
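The classification step summarized in the description can be illustrated with a minimal sketch, under stated assumptions: multinomial (softmax) logistic regression over SNP genotypes coded as 0/1/2 allele counts, returning one probability-like value per ethnic geographic group, plus the overall accuracy and per-group sensitivity used to validate such a model. This is not the code of the “Homeland” software. The function names, batch gradient-descent training, L2 penalty, learning rate, and toy allele frequencies are all hypothetical choices for illustration, and the SNP feature-selection step (e.g. random forest) is omitted.

```python
# Hypothetical sketch, NOT the "Homeland" implementation: softmax logistic
# regression on 0/1/2 genotype vectors, plus accuracy and sensitivity metrics.
import math
import random


def softmax(scores):
    """Turn raw per-group scores into values that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    """Per-group values that can be read as probabilities of ancestral origin."""
    scores = [sum(w * g for w, g in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return softmax(scores)


def predict_group(W, b, x):
    """Index of the most probable group."""
    p = predict_proba(W, b, x)
    return max(range(len(p)), key=p.__getitem__)


def fit_multinomial_logreg(X, y, n_groups, lr=0.1, epochs=500, l2=0.01):
    """Fit softmax logistic regression by batch gradient descent (assumed setup).

    X: genotype vectors (0/1/2 copies of the alternative allele per SNP).
    y: integer group labels in range(n_groups).
    """
    n, n_snps = len(X), len(X[0])
    W = [[0.0] * n_snps for _ in range(n_groups)]
    b = [0.0] * n_groups
    for _ in range(epochs):
        gW = [[0.0] * n_snps for _ in range(n_groups)]
        gb = [0.0] * n_groups
        for xi, yi in zip(X, y):
            p = predict_proba(W, b, xi)
            for k in range(n_groups):
                # gradient of the cross-entropy loss w.r.t. the score of group k
                err = p[k] - (1.0 if k == yi else 0.0)
                gb[k] += err
                for j in range(n_snps):
                    gW[k][j] += err * xi[j]
        for k in range(n_groups):
            b[k] -= lr * gb[k] / n
            for j in range(n_snps):
                W[k][j] -= lr * (gW[k][j] / n + l2 * W[k][j])
    return W, b


def accuracy(y_true, y_pred):
    """Overall share of individuals assigned to their true group."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, group):
    """Share of true members of `group` that were assigned to it (recall)."""
    members = [p for t, p in zip(y_true, y_pred) if t == group]
    return sum(p == group for p in members) / len(members)


if __name__ == "__main__":
    # Hypothetical toy data: 3 groups, 6 SNPs; each group carries the
    # alternative allele at high frequency on its own pair of SNPs.
    random.seed(0)
    freqs = [[0.9, 0.9, 0.1, 0.1, 0.1, 0.1],
             [0.1, 0.1, 0.9, 0.9, 0.1, 0.1],
             [0.1, 0.1, 0.1, 0.1, 0.9, 0.9]]
    X, y = [], []
    for g, f in enumerate(freqs):
        for _ in range(30):
            X.append([(random.random() < q) + (random.random() < q) for q in f])
            y.append(g)
    W, b = fit_multinomial_logreg(X, y, n_groups=3)
    pred = [predict_group(W, b, x) for x in X]
    print("training accuracy:", accuracy(y, pred))
    print("sensitivity per group:", [sensitivity(y, pred, g) for g in range(3)])
    print("probabilities for [2, 2, 0, 0, 0, 1]:", predict_proba(W, b, [2, 2, 0, 0, 0, 1]))
```

Run as a script, it trains on synthetic genotypes and prints training accuracy, per-group sensitivity, and one probability vector. These toy numbers say nothing about the 71% overall accuracy reported for the real 29-group model; the point is only how per-group probabilities and the two validation metrics relate.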
format Still Image
author Igor Gorin
Oleg Balanovsky
Oleg Kozlov
Sergey Koshel
Elena Kostryukova
Maxat Zhabagin
Anastasiya Agdzhoyan
Vladimir Pylev
Elena Balanovska
title Image4_Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers.JPEG
publishDate 2022
url https://doi.org/10.3389/fgene.2022.902309.s004
https://figshare.com/articles/figure/Image4_Determining_the_Area_of_Ancestral_Origin_for_Individuals_From_North_Eurasia_Based_on_5_229_SNP_Markers_JPEG/19770211
genre Kamchatka
Siberia
op_rights CC BY 4.0
op_rightsnorm CC-BY