Polygenic risk scores using the lasso in a multi-cohort setting with application for lipoprotein(a)

Polygenic risk scores (PRS) are an important tool for better understanding the complex role that the human genome plays in different diseases and traits. A PRS is an estimate of a subject's total genetic risk for a phenotype, based on the subject's genetic makeup. Here we construct PRS in a multi-cohort set...

Bibliographic Details
Main Author: Steinþór Árdal 1995-
Other Authors: Háskóli Íslands
Format: Thesis
Language: English
Published: 2023
Subjects:
Online Access: http://hdl.handle.net/1946/43337
id ftskemman:oai:skemman.is:1946/43337
record_format openpolar
spelling ftskemman:oai:skemman.is:1946/43337 2023-05-15T16:49:40+02:00 Polygenic risk scores using the lasso in a multi-cohort setting with application for lipoprotein(a) Steinþór Árdal 1995- Háskóli Íslands 2023-01 application/pdf http://hdl.handle.net/1946/43337 en eng http://hdl.handle.net/1946/43337 Tölfræði Thesis Master's 2023 ftskemman 2023-02-01T23:50:54Z Thesis Iceland Skemman (Iceland)
institution Open Polar
collection Skemman (Iceland)
op_collection_id ftskemman
language English
topic Statistics (Tölfræði)
description Polygenic risk scores (PRS) are an important tool for better understanding the complex role that the human genome plays in different diseases and traits. A PRS is an estimate of a subject's total genetic risk for a phenotype, based on the subject's genetic makeup. Here we construct PRS in a multi-cohort setting, using the lasso to select sequence variants and estimate their effect sizes. The UK Biobank (UKB) and the Icelandic cohort are used for model development. The BASIL algorithm, which eliminates features from the lasso before fitting the model, was also tested for quantitative phenotypes in UKB and Iceland. Lipoprotein(a) (Lp(a)) is an independent causal risk factor for cardiovascular diseases. The size and levels of Lp(a) are primarily determined by genetics, while environmental factors have minimal effects. The main phenotype of focus here is Lp(a): we use UKB and Iceland to construct the models and predict Lp(a) for a third, independent Danish cohort that does not have measurements of Lp(a). We trained four different models for Lp(a) based on the training and validation sets. Models that used both cohorts (multi-cohort models) performed better on the test sets (R^2 = 0.69) than the single-cohort models (R^2 = 0.63), which tended to overfit. Furthermore, the predictions from the multi-cohort models have a stronger association with cardiovascular phenotypes in the Danish cohort than those from the single-cohort models, suggesting that the multi-cohort models generalize better for these populations. We also compared the lasso with LDpred for other quantitative phenotypes, with the lasso giving similar or significantly better performance.
Icelandic abstract (translated): Polygenic risk scores (PRS) are an important tool for better understanding the effect of the genome on different phenotypes. A PRS is an estimate of an individual's total risk for a given phenotype, based on all the genetic variants that are thought to affect it. In this project we constructed PRS using data from three populations, where the lasso was used to select ...
author2 Háskóli Íslands
format Thesis
author Steinþór Árdal 1995-
title Polygenic risk scores using the lasso in a multi-cohort setting with application for lipoprotein(a)
publishDate 2023
url http://hdl.handle.net/1946/43337
genre Iceland
op_relation http://hdl.handle.net/1946/43337
_version_ 1766039850759225344