Deciphering genomic variation and effective population size using NGS and SNP data in mammals


Bibliographic Details
Main Author: 신동현 (Donghyun Shin)
Other Authors: 김희발 (Heebal Kim), Donghyun Shin, College of Agriculture and Life Sciences, Department of Agricultural Biotechnology
Format: Thesis
Language: English
Published: Seoul National University Graduate School, 2014
Subjects: 630
Online Access: http://hdl.handle.net/10371/119472
Description
Summary: Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Agricultural Biotechnology, August 2014. Advisor: 김희발 (Heebal Kim). This doctoral dissertation consists of five studies of mammalian genetic variation and effective population size using SNP data or NGS data. Effective population size (Ne) is an essential parameter for evaluating the data size, data quality, and genetic diversity of an animal population. I also investigated genetic variation associated with economic traits in domesticated animals using SNP data. In addition, I examined copy number variation related to the domestication of cattle using NGS data. In chapter 1, I introduced the background and the rationale for the series of studies in this dissertation. Ne is important for assessing the genetic diversity of animal populations. In chapter 2, I obtained a more accurate characterization of linkage disequilibrium (LD), corrected by the genomic relationship matrix, in a sample of 96 dairy cattle producing milk in Korea, and estimated Ne to be approximately 122. I also inferred historical Ne, which showed a rapid increase over the past 10 generations and a slower increase thereafter. These results are consistent with current knowledge of the history of the dairy cattle breeds producing milk in Korea. In chapter 3, I investigated the genome of the common minke whale (Balaenoptera acutorostrata) using next-generation sequencing. I then estimated the historical effective population size of the minke whale based on a coalescent model to determine when the minke whale population declined rapidly. The results suggest that minke whale population diversity has decreased to approximately 3.1%, and the most strongly supported period of decline during the Holocene is approximately 194 to 902 years ago. This whole-genome sequencing offers an opportunity to better understand the population history of whales, the largest aquatic mammals on Earth. Having characterized these populations, I then investigated genetic variants related to economic traits in domesticated animals. In chapter 4, I identified SNPs related to horse racing performance.
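The LD-based Ne estimate in chapter 2 is summarized above without its formula. A minimal sketch of the textbook relationship such estimates usually start from is shown below: Sved's (1971) approximation E(r²) ≈ 1/(1 + 4·Ne·c), with a 1/n sample-size correction, where c is the recombination distance in Morgans and n is the number of sampled animals, plus the common assumption that LD at distance c reflects Ne roughly 1/(2c) generations ago. The function names and example values are hypothetical, and the sketch does not include the genomic-relationship-matrix correction that chapter 2 applies to LD.

```python
"""Sketch: LD-based effective population size (Ne) from mean r^2.

Assumes Sved's (1971) approximation E[r^2] ~ 1 / (1 + 4*Ne*c) + 1/n,
with c in Morgans and n the number of sampled animals. Hypothetical
helper names; not the thesis's GRM-corrected method.
"""
from typing import List, Tuple


def ne_from_r2(mean_r2: float, c_morgans: float, n_samples: int) -> float:
    """Invert Sved's formula to get Ne from mean r^2 at distance c."""
    r2_adj = mean_r2 - 1.0 / n_samples  # subtract expected sampling bias in r^2
    if r2_adj <= 0:
        raise ValueError("mean r^2 is not above the sampling expectation 1/n")
    return (1.0 / r2_adj - 1.0) / (4.0 * c_morgans)


def historical_ne(
    bins: List[Tuple[float, float]], n_samples: int
) -> List[Tuple[float, float]]:
    """Map (distance in Morgans, mean r^2) bins to (generations ago, Ne).

    Assumes LD at distance c mainly reflects Ne about t = 1 / (2c)
    generations in the past, so short distances describe older history.
    """
    result = []
    for c, r2 in bins:
        generations_ago = 1.0 / (2.0 * c)
        result.append((generations_ago, ne_from_r2(r2, c, n_samples)))
    return sorted(result)  # order from recent to ancient generations
```

For example, `ne_from_r2(0.11, 0.05, 100)` returns approximately 45, and `historical_ne([(0.05, 0.11)], 100)` places that estimate about 10 generations ago, since 1/(2 × 0.05) = 10.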
The Thoroughbred, a relatively recent horse breed, is best known for its use in horse racing. Although myostatin (MSTN) variants have been reported to be highly associated with horse racing performance, the trait is more likely to be polygenic in nature. I conducted a two-stage genome-wide association study (GWAS) to search for genetic variants associated with the estimated breeding value (EBV) for racing performance. I identified 28 significant SNPs related to 17 genes. Among these, six genes have functions related to myogenesis and five genes are involved in muscle maintenance. To my knowledge, this is the first report of a genetic association between these genes and racing performance in Thoroughbreds. This study complements a recent horse GWAS of racing performance that identified other SNPs and genes as the most significant variants. These results will help expand our understanding of the polygenic nature of racing performance in Thoroughbreds. In chapter 5, I identified SNPs related to milk production in dairy cattle. Holsteins are known as the world's highest milk-producing dairy cattle. I estimated the EBV for each trait using ridge regression BLUP. I then conducted a multivariate GWAS on SNP data to search for genetic variants associated with the EBVs for milk production traits. I identified 128 significant SNPs related to 47 genes. These genes were related to cellular component localization, protein localization, intracellular signaling cascades, and microtubules. These genes are newly reported as being genetically associated with milk production in Holsteins. This study complements a recent Holstein GWAS that identified other SNPs and genes as the most significant variants. These results will help expand our understanding of the polygenic nature of milk production in Holsteins. Finally, in chapter 6, I detected copy number variations in cattle related to the domestication process, as a source of genetic variation other than SNPs.
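Chapter 5 estimates EBVs with ridge regression BLUP (RR-BLUP) before running the multivariate GWAS. A minimal NumPy sketch of single-trait RR-BLUP follows, assuming genotypes coded 0/1/2 and a known shrinkage ratio `lam` = σe²/σu²; in practice the variance components would be estimated (for example by REML), and the multivariate GWAS step itself is not shown. The function `rr_blup` is hypothetical and is not the thesis's code.

```python
import numpy as np


def rr_blup(Z, y, lam):
    """Ridge-regression BLUP of SNP effects for a single trait.

    Z   : (n_animals, n_snps) genotype matrix coded 0/1/2
    y   : (n_animals,) phenotypes
    lam : shrinkage ratio sigma_e^2 / sigma_u^2, assumed known here
    Returns (overall mean, per-SNP effects, genomic EBVs).
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    Zc = Z - Z.mean(axis=0)  # center each SNP column so the intercept is mean(y)
    mu = y.mean()
    n_snps = Zc.shape[1]
    # With centered Z, the mixed-model equations for the SNP effects reduce to
    # (Zc'Zc + lam * I) u = Zc'(y - mu)
    lhs = Zc.T @ Zc + lam * np.eye(n_snps)
    u_hat = np.linalg.solve(lhs, Zc.T @ (y - mu))
    gebv = Zc @ u_hat  # genomic EBV: sum of estimated SNP effects per animal
    return mu, u_hat, gebv
```

Increasing `lam` shrinks every SNP effect toward zero, which is what allows RR-BLUP to fit more SNPs than there are animals.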
Copy number variation (CNV), a source of genetic diversity in mammals, has been shown to underlie biological functions related to production traits. Nevertheless, few studies have examined CNVs using next-generation sequencing at the population level. I used NGS data from ten Holsteins (a dairy breed) and 22 Hanwoo (a beef breed). Sequence coverage for the 32 animals ranged from 13.58-fold to almost 20-fold. I detected a total of 6,811 CNV deletions across the analyzed individuals (average length = 2,732.2 bp), corresponding to 0.74% of the cattle genome (18.6 Mbp of variable sequence). By examining the overlap between CNV deletion regions and genes, I selected the 30 genes with the highest deletion scores. These genes were found to be related to the nervous system, specifically to neurotransmission, neuron motion, and neurogenesis. I regarded these genes as having been affected by the domestication process. Further analysis of the CNV genotyping information revealed 94 putatively selected CNVs and 954 breed-specific CNVs. This study provides useful information for assessing the impact of CNVs on cattle traits using NGS data at the population level.
Table of Contents:
Abstract
Contents
List of Tables
List of Figures
Abbreviation
General Introduction
Chapter 1. Literature Review
  1.1 Effective Population Size
  1.2 Genome-wide Association Study
  1.3 Copy Number Variation Using Next Generation Sequencing
Chapter 2. Accurate estimation of effective population size in the Korean dairy cattle based on linkage disequilibrium corrected by genomic relationship matrix
  2.1 Abstract
  2.2 Introduction
  2.3 Materials and Methods
  2.4 Results
  2.5 Discussion
Chapter 3. Estimation of historical effective population size in the Minke whale based on coalescent model
  3.1 Abstract
  3.2 Introduction
  3.3 Materials and Methods
  3.4 Results
  3.5 Discussion
Chapter 4. Multiple genes related to muscle identified through a joint analysis of a two-stage genome-wide association study for racing performance of 1,156 Thoroughbreds
  4.1 Abstract
  4.2 Introduction
  4.3 Materials and Methods
  4.4 Results
  4.5 Discussion
Chapter 5. Multivariate GWAS of milk production traits using genomic estimated breeding value
  5.1 Abstract
  5.2 Introduction
  5.3 Materials & Methods
  5.4 Results
  5.5 Discussion
Chapter 6. Deleted copy number variation of Hanwoo and Holstein using next generation sequencing at the population level
  6.1 Abstract
  6.2 Introduction
  6.3 Materials & Methods
  6.4 Results
  6.5 Discussion
General Discussion
Reference
Abstract in Korean
Acknowledgements
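The chapter 6 genome-level figures can be checked with simple interval arithmetic: 6,811 deletions × 2,732.2 bp on average ≈ 18.61 Mbp, matching the reported 18.6 Mbp of variable sequence. The sketch below, assuming deletion calls and gene annotations are given as 0-based, half-open (chromosome, start, end) tuples, merges overlapping deletions so shared regions are counted once, reports the covered base pairs, and measures how many bases of a gene fall inside deletions. The deletion score used in the thesis to rank genes is not defined in this summary, so `overlap_bp` is only a hypothetical stand-in, not that score; all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chromosome, start, end), 0-based and half-open; an assumed convention
Interval = Tuple[str, int, int]
Merged = Dict[str, List[Tuple[int, int]]]


def merge_intervals(intervals: Iterable[Interval]) -> Merged:
    """Merge overlapping or adjacent deletion calls per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged: Merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def covered_bp(merged: Merged) -> int:
    """Total base pairs covered by the merged deletions."""
    return sum(end - start for ivs in merged.values() for start, end in ivs)


def overlap_bp(merged: Merged, gene: Interval) -> int:
    """Bases of one gene that lie inside merged deletions (hypothetical proxy)."""
    chrom, g_start, g_end = gene
    return sum(
        max(0, min(end, g_end) - max(start, g_start))
        for start, end in merged.get(chrom, [])
    )
```

Dividing `covered_bp(...)` by the assembly length gives the fraction of the genome that is variable, the quantity reported above as 0.74%.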