Novel probabilistic models of spatial genetic ancestry with applications to stratification correction in genome-wide association studies

Abstract. Motivation: Genetic variation in human populations is influenced by geographic ancestry due to spatial locality in historical mating and migration patterns. Spatial population structure in genetic datasets has been traditionally analyzed using either model-free algorithms, such as principal...

Bibliographic Details
Published in: Bioinformatics
Main Authors: Bhaskar, Anand, Javanmard, Adel, Courtade, Thomas A, Tse, David
Other Authors: Valencia, Alfonso, CSoI fellowship during the course of this work, NIH
Format: Article in Journal/Newspaper
Language: English
Published: Oxford University Press (OUP) 2016
Subjects:
Online Access: http://dx.doi.org/10.1093/bioinformatics/btw720
https://academic.oup.com/bioinformatics/article-pdf/33/6/879/49038209/bioinformatics_33_6_879.pdf
id croxfordunivpr:10.1093/bioinformatics/btw720
record_format openpolar
institution Open Polar
collection Oxford University Press
op_collection_id croxfordunivpr
language English
description Abstract. Motivation: Genetic variation in human populations is influenced by geographic ancestry due to spatial locality in historical mating and migration patterns. Spatial population structure in genetic datasets has traditionally been analyzed using either model-free algorithms, such as principal components analysis (PCA) and multidimensional scaling, or explicit spatial probabilistic models of allele frequency evolution. We develop a general probabilistic model and an associated inference algorithm that unify the model-based and data-driven approaches to visualizing and inferring population structure. Our spatial inference algorithm can also be effectively applied to the problem of population stratification in genome-wide association studies (GWAS), where hidden population structure can create fictitious associations when population ancestry is correlated with both the genotype and the trait. Results: Our algorithm, Geographic Ancestry Positioning (GAP), relates local genetic distances between samples to their spatial distances, and can be used for visually discerning population structure as well as accurately inferring the spatial origin of individuals on a two-dimensional continuum. On both simulated and several real datasets from diverse human populations, GAP exhibits substantially lower error in reconstructing spatial ancestry coordinates compared to PCA. We also develop an association test that uses the ancestry coordinates inferred by GAP to accurately account for ancestry-induced correlations in GWAS. Based on simulations and analysis of a dataset of 10 metabolic traits measured in a Northern Finland cohort, which is known to exhibit significant population structure, we find that our method has superior power to current approaches. Availability and Implementation: Our software is available at https://github.com/anand-bhaskar/gap. Supplementary information: Supplementary data are available at Bioinformatics online.
author2 Valencia, Alfonso
CSoI fellowship during the course of this work
NIH
format Article in Journal/Newspaper
author Bhaskar, Anand
Javanmard, Adel
Courtade, Thomas A
Tse, David
author_sort Bhaskar, Anand
title Novel probabilistic models of spatial genetic ancestry with applications to stratification correction in genome-wide association studies
publisher Oxford University Press (OUP)
publishDate 2016
url http://dx.doi.org/10.1093/bioinformatics/btw720
https://academic.oup.com/bioinformatics/article-pdf/33/6/879/49038209/bioinformatics_33_6_879.pdf
genre Northern Finland
op_source Bioinformatics
volume 33, issue 6, page 879-885
ISSN 1367-4803 1367-4811
op_rights https://academic.oup.com/journals/pages/about_us/legal/notices
op_doi https://doi.org/10.1093/bioinformatics/btw720
container_title Bioinformatics
container_volume 33
container_issue 6
container_start_page 879
op_container_end_page 885
_version_ 1810466193522294784