Calling large indels in 1047 Arabidopsis with IndelEnsembler
Abstract Large indels greatly impact the observable phenotypes of different organisms, including plants and humans. Hence, extracting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in 1047 Arabidopsis whole-genome sequencing data...
Published in: | Nucleic Acids Research |
---|---|
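Main Authors: | Liu, Dong-Xu; Rajaby, Ramesh; Wei, Lu-Lu; Zhang, Lei; Yang, Zhi-Quan; Yang, Qing-Yong; Sung, Wing-Kin |
Other Authors: | National Key Research and Development Plan of China; National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities |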
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Oxford University Press (OUP) 2021 |
Subjects: | |
Online Access: | http://dx.doi.org/10.1093/nar/gkab904 http://academic.oup.com/nar/article-pdf/49/19/10879/41070880/gkab904.pdf |
id |
croxfordunivpr:10.1093/nar/gkab904 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Oxford University Press |
op_collection_id |
croxfordunivpr |
language |
English |
description |
Abstract Large indels greatly impact the observable phenotypes of different organisms, including plants and humans. Hence, extracting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in 1047 Arabidopsis whole-genome sequencing data. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate than the previous AthCNV dataset (1). We captured nearly twice as many ground-truth deletions and, on average, 27% more ground-truth duplications than AthCNV, although our dataset contains fewer large indels. Our large indels were positively correlated with transposon elements across the Arabidopsis genome. Non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. The neighbor-joining (NJ) tree constructed from IndelEnsembler's deletions clearly separated the geographic subgroups of the 1047 Arabidopsis accessions. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms (SNPs). Some of them could affect trait performance. For instance, in a deletion-based genome-wide association study (DEL-GWAS), accessions containing a 182-bp deletion in AT1G11520 had delayed flowering time, and all accessions in north Sweden had the 182-bp deletion. We also found that accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. By SNP-GWAS, the SNPs surrounding these two deletions do not correlate with flowering time. This example demonstrates that existing large indel datasets miss phenotypic variation and that our large indel dataset fills the gap. |
author2 |
National Key Research and Development Plan of China National Natural Science Foundation of China Fundamental Research Funds for the Central Universities |
format |
Article in Journal/Newspaper |
author |
Liu, Dong-Xu Rajaby, Ramesh Wei, Lu-Lu Zhang, Lei Yang, Zhi-Quan Yang, Qing-Yong Sung, Wing-Kin |
author_sort |
Liu, Dong-Xu |
title |
Calling large indels in 1047 Arabidopsis with IndelEnsembler |
title_sort |
calling large indels in 1047 arabidopsis with indelensembler |
publisher |
Oxford University Press (OUP) |
publishDate |
2021 |
url |
http://dx.doi.org/10.1093/nar/gkab904 http://academic.oup.com/nar/article-pdf/49/19/10879/41070880/gkab904.pdf |
genre |
North Sweden |
genre_facet |
North Sweden |
op_source |
Nucleic Acids Research volume 49, issue 19, page 10879-10894 ISSN 0305-1048 1362-4962 |
op_rights |
https://creativecommons.org/licenses/by-nc/4.0/ |
op_doi |
https://doi.org/10.1093/nar/gkab904 |
container_title |
Nucleic Acids Research |
container_volume |
49 |
container_issue |
19 |
container_start_page |
10879 |
op_container_end_page |
10894 |
_version_ |
1810465492420263936 |