Calling large indels in 1047 Arabidopsis with IndelEnsembler
Large indels greatly impact the observable phenotypes of different organisms, including plants and humans. Hence, detecting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions. IndelEn...
Published in: | Nucleic Acids Research |
---|---|
Main Authors: | Liu, Dong-Xu; Rajaby, Ramesh; Wei, Lu-Lu; Zhang, Lei; Yang, Zhi-Quan; Yang, Qing-Yong; Sung, Wing-Kin |
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Computational Biology |
Online Access: | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565333/ http://www.ncbi.nlm.nih.gov/pubmed/34643730 https://doi.org/10.1093/nar/gkab904 |
id |
ftpubmed:oai:pubmedcentral.nih.gov:8565333 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
PubMed Central (PMC) |
op_collection_id |
ftpubmed |
language |
English |
topic |
Computational Biology |
spellingShingle |
Computational Biology Liu, Dong-Xu Rajaby, Ramesh Wei, Lu-Lu Zhang, Lei Yang, Zhi-Quan Yang, Qing-Yong Sung, Wing-Kin Calling large indels in 1047 Arabidopsis with IndelEnsembler |
topic_facet |
Computational Biology |
description |
Large indels greatly impact the observable phenotypes of different organisms, including plants and humans. Hence, detecting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate than the previous AthCNV dataset (1). We captured nearly twice as many ground-truth deletions and on average 27% more ground-truth duplications than AthCNV, even though our dataset contains fewer large indels than AthCNV. Our large indels were positively correlated with transposable elements across the Arabidopsis genome. Non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. A neighbor-joining (NJ) tree constructed from IndelEnsembler's deletions clearly separated the geographic subgroups of the 1047 Arabidopsis accessions. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms (SNPs). Some of them could affect trait performance. For instance, a deletion-based genome-wide association study (DEL-GWAS) showed that accessions carrying a 182-bp deletion in AT1G11520 had delayed flowering time, and all accessions from north Sweden carried this deletion. We also found that accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. In SNP-GWAS, the SNPs surrounding these two deletions do not correlate with flowering time. This example demonstrates that existing large indel datasets miss phenotypic variation and that our large indel dataset fills this gap. |
format |
Text |
author |
Liu, Dong-Xu Rajaby, Ramesh Wei, Lu-Lu Zhang, Lei Yang, Zhi-Quan Yang, Qing-Yong Sung, Wing-Kin |
author_facet |
Liu, Dong-Xu Rajaby, Ramesh Wei, Lu-Lu Zhang, Lei Yang, Zhi-Quan Yang, Qing-Yong Sung, Wing-Kin |
author_sort |
Liu, Dong-Xu |
title |
Calling large indels in 1047 Arabidopsis with IndelEnsembler |
title_short |
Calling large indels in 1047 Arabidopsis with IndelEnsembler |
title_full |
Calling large indels in 1047 Arabidopsis with IndelEnsembler |
title_fullStr |
Calling large indels in 1047 Arabidopsis with IndelEnsembler |
title_full_unstemmed |
Calling large indels in 1047 Arabidopsis with IndelEnsembler |
title_sort |
calling large indels in 1047 arabidopsis with indelensembler |
publisher |
Oxford University Press |
publishDate |
2021 |
url |
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565333/ http://www.ncbi.nlm.nih.gov/pubmed/34643730 https://doi.org/10.1093/nar/gkab904 |
long_lat |
ENVELOPE(35.282,35.282,66.963,66.963) |
geographic |
Indel’ |
geographic_facet |
Indel’ |
genre |
North Sweden |
genre_facet |
North Sweden |
op_source |
Nucleic Acids Res |
op_relation |
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565333/ http://www.ncbi.nlm.nih.gov/pubmed/34643730 http://dx.doi.org/10.1093/nar/gkab904 |
op_rights |
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
op_rightsnorm |
CC-BY-NC |
op_doi |
https://doi.org/10.1093/nar/gkab904 |
container_title |
Nucleic Acids Research |
container_volume |
49 |
container_issue |
19 |
container_start_page |
10879 |
op_container_end_page |
10894 |
_version_ |
1766141201236361216 |