Polygenic risk scores using the lasso in a multi-cohort setting with application for lipoprotein(a)

Bibliographic Details
Main Author: Steinþór Árdal 1995-
Other Authors: Háskóli Íslands
Format: Thesis
Language: English
Published: 2023
Subjects:
Online Access: http://hdl.handle.net/1946/43337
Description
Summary: Polygenic risk scores (PRS) are an important tool for better understanding the complex role that the human genome plays in different diseases and traits. A PRS is an estimate of a subject's total genetic risk for a phenotype, based on their genetic makeup. Here we construct PRS in a multi-cohort setting, using the lasso to select sequence variants and estimate their effect sizes. The UK Biobank (UKB) and the Icelandic cohort are used for model development. The BASIL algorithm, which eliminates features from the lasso before fitting the model, was also tested for quantitative phenotypes in UKB and Iceland. Lipoprotein(a) (Lp(a)) is an independent causal risk factor for cardiovascular diseases. The size and levels of Lp(a) are primarily determined by genetics, while environmental factors have minimal effects. The main phenotype of focus here is Lp(a): we use UKB and Iceland to construct the models, and then predict Lp(a) for a third, independent Danish cohort that does not have measurements of Lp(a). We trained four different models for Lp(a) based on the training and validation sets. Models that used both cohorts (multi-cohort models) performed better on the test sets (R^2 = 0.69) than the single-cohort models (R^2 = 0.63), which tended to overfit. Furthermore, the predictions from the multi-cohort models are more strongly associated with cardiovascular phenotypes in the Danish cohort than those from the single-cohort models, suggesting that the multi-cohort models generalize better for these populations. We also compared the lasso with LDpred for other quantitative phenotypes, with the lasso giving similar or significantly better performance.
Polygenic risk scores (PRS) are an important tool for better understanding the effects of the genome on different phenotypes. A PRS is an estimate of an individual's total risk for a given phenotype, based on all the genetic variants believed to affect it.
In this project we constructed PRS using data from three cohorts, where the lasso was used to select ...
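
The general idea in the summary (fit a lasso on pooled cohorts, choose the penalty on a validation set, report R^2 on a held-out test set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the genotypes and phenotype are simulated, the function names (`fit_lasso`, `simulate_cohort`, `run_demo`) are hypothetical, the penalty grid is arbitrary, and the solver is plain coordinate descent rather than BASIL or the thesis pipeline.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_lasso(X, y, lam, n_sweeps=50):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]  # centered columns
    col_var = [sum(v * v for v in c) / n for c in cols]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual for beta = 0
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_var[j] == 0.0:
                continue  # monomorphic variant carries no information
            # Covariance of variant j with the partial residual (variant j left out)
            rho = sum(x * r for x, r in zip(cols[j], resid)) / n + col_var[j] * beta[j]
            new = soft_threshold(rho, lam) / col_var[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for x, r in zip(cols[j], resid)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, means))
    return intercept, beta


def predict(X, intercept, beta):
    """The score is the intercept plus the weighted sum of allele counts."""
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def simulate_cohort(rng, n, mafs, effects, noise_sd=0.5):
    """Simulated allele counts (0, 1, 2) and an additive phenotype with noise."""
    X, y = [], []
    for _ in range(n):
        row = [float((rng.random() < m) + (rng.random() < m)) for m in mafs]
        X.append(row)
        y.append(sum(b * v for b, v in zip(effects, row)) + rng.gauss(0.0, noise_sd))
    return X, y


def run_demo(seed=0):
    rng = random.Random(seed)
    p = 30
    mafs = [rng.uniform(0.1, 0.5) for _ in range(p)]
    effects = [1.0 if j % 6 == 0 else 0.0 for j in range(p)]  # 5 causal variants
    train, val, test = ([], []), ([], []), ([], [])
    for _ in range(2):  # two simulated cohorts, pooled split by split
        X, y = simulate_cohort(rng, 250, mafs, effects)
        for (Xs, ys), lo, hi in ((train, 0, 150), (val, 150, 200), (test, 200, 250)):
            Xs.extend(X[lo:hi])
            ys.extend(y[lo:hi])
    best = None
    for lam in (0.5, 0.2, 0.1, 0.05, 0.01):  # arbitrary penalty grid
        b0, beta = fit_lasso(train[0], train[1], lam)
        val_r2 = r_squared(val[1], predict(val[0], b0, beta))
        if best is None or val_r2 > best[0]:
            best = (val_r2, lam, b0, beta)
    _, lam, b0, beta = best
    test_r2 = r_squared(test[1], predict(test[0], b0, beta))
    return lam, test_r2, beta


if __name__ == "__main__":
    lam, test_r2, beta = run_demo()
    n_selected = sum(1 for b in beta if b != 0.0)
    print(f"lambda={lam}, variants selected={n_selected}, test R^2={test_r2:.2f}")
```

Selecting the penalty on a validation set and reporting R^2 only on the test set mirrors the train/validation/test split mentioned in the summary. The numbers this toy produces have no relation to the R^2 values reported there.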