Two-phase Design for Regional Genetic Sequencing using Polygenic Risk Scores

Bibliographic Details
Main Author: Wang, Guan
Other Authors: Bull, Shelley B; Espin-Garcia, Osvaldo; Dalla Lana School of Public Health
Format: Thesis
Language: unknown
Published: University of Toronto 2022
Subjects:
Online Access: http://hdl.handle.net/1807/110846
Description
Summary: Due to the high cost of DNA sequencing for large-scale data, I propose a two-phase design that uses polygenic risk scores (PRS) to inform selection of individuals in phase 1, followed by regional sequencing in a selected subsample in phase 2. The residual-dependent sampling (RDS) design is implemented by regressing the phenotype of interest on the PRS and selecting individuals with extreme residuals as the phase 2 subsample. Efficient analysis can be carried out under semi-parametric modelling using the EM algorithm. A fine-mapping application in a genome-wide association study (GWAS) of triglyceride levels in 4,504 individuals from the Northern Finland Birth Cohort of 1966 shows that the proposed method can reduce sequencing costs in post-GWAS analyses while maintaining statistical performance. Simulation studies show that the proposed RDS design gives more precise estimates than simple random sampling, with adequate type I error control, and yields results closer to those obtained from the complete sample. M.Sc.
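
The selection step described in the summary (regress the phenotype on the PRS, then keep the individuals with the most extreme residuals) can be illustrated with a minimal sketch. This is not the thesis's implementation. The function name `select_rds_subsample`, the ordinary least-squares fit with an intercept, the equal split of the phase 2 budget between the lower and upper residual tails, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def select_rds_subsample(phenotype, prs, n_phase2):
    """Pick n_phase2 individuals with the most extreme residuals.

    Hypothetical helper: fits phenotype ~ intercept + PRS by ordinary
    least squares and takes roughly half of the subsample from each
    residual tail (the equal split is an assumption, not from the source).
    """
    phenotype = np.asarray(phenotype, dtype=float)
    prs = np.asarray(prs, dtype=float)

    # Phase 1 regression of the phenotype on the PRS (with intercept).
    design = np.column_stack([np.ones_like(prs), prs])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    residuals = phenotype - design @ coef

    # Rank individuals by residual and take both tails.
    order = np.argsort(residuals)
    n_low = n_phase2 // 2
    n_high = n_phase2 - n_low
    selected = np.concatenate([order[:n_low], order[len(order) - n_high:]])
    return np.sort(selected), residuals


# Simulated phase 1 data (purely illustrative, not from the thesis).
rng = np.random.default_rng(0)
prs_sim = rng.normal(size=1000)
pheno_sim = 0.4 * prs_sim + rng.normal(size=1000)

phase2_idx, resid = select_rds_subsample(pheno_sim, prs_sim, n_phase2=200)
print(len(phase2_idx), "individuals selected for regional sequencing")
```

The individuals indexed by `phase2_idx` would then be sequenced in phase 2. The semi-parametric EM analysis that combines the sequenced and unsequenced individuals is not sketched here.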