Calling large indels in 1047 Arabidopsis with IndelEnsembler

Abstract: Large indels strongly affect observable phenotypes in many organisms, including plants and humans. Hence, calling large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions...


Bibliographic Details
Published in: Nucleic Acids Research
Main Authors: Liu, Dong-Xu, Rajaby, Ramesh, Wei, Lu-Lu, Zhang, Lei, Yang, Zhi-Quan, Yang, Qing-Yong, Sung, Wing-Kin
Funders: National Key Research and Development Plan of China, National Natural Science Foundation of China, Fundamental Research Funds for the Central Universities
Format: Article in Journal/Newspaper
Language: English
Published: Oxford University Press (OUP), 2021
Online Access: http://dx.doi.org/10.1093/nar/gkab904
http://academic.oup.com/nar/article-pdf/49/19/10879/41070880/gkab904.pdf
institution Open Polar
collection Oxford University Press
description Abstract: Large indels strongly affect observable phenotypes in many organisms, including plants and humans. Hence, calling large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate than the previous AthCNV dataset (1): although it contains fewer large indels than AthCNV, it captured nearly twice as many ground-truth deletions and, on average, 27% more ground-truth duplications. Our large indels were positively correlated with transposable elements across the Arabidopsis genome, and non-homologous recombination events were the major mechanism of deletion formation. A neighbor-joining (NJ) tree built from IndelEnsembler's deletions clearly separated the geographic subgroups of the 1047 accessions. More importantly, our large indels represent a previously unassessed source of genetic variation: approximately 49% of the deletions are in low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms (SNPs), and some of them could affect trait performance.
For instance, in a deletion-based genome-wide association study (DEL-GWAS), accessions carrying a 182-bp deletion in AT1G11520 flowered later, and all accessions from north Sweden carried this deletion. Likewise, accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. Neither deletion is present in AthCNV and, interestingly, the two never co-occur in any Arabidopsis thaliana accession. In SNP-based GWAS, the SNPs surrounding these two deletions were not associated with flowering time. This example demonstrates that existing large indel datasets miss phenotype-relevant variation and that our dataset fills this gap.
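The abstract's statement that roughly 49% of deletions are in low LD with surrounding SNPs rests on a pairwise LD measure. A minimal sketch of the standard r² statistic, r² = D² / (p_del(1 - p_del) p_snp(1 - p_snp)), follows, assuming each accession is homozygous and each marker is coded 0/1 per accession. The function names, the flanking-SNP input and the r² cutoff are hypothetical illustrations; the record does not state which cutoff, SNP window or software the authors used.

```python
from typing import Sequence


def ld_r2(deletion: Sequence[int], snp: Sequence[int]) -> float:
    """r^2 between two biallelic markers coded 0/1 per (homozygous) accession."""
    n = len(deletion)
    if n == 0 or n != len(snp):
        raise ValueError("marker vectors must be non-empty and of equal length")
    p_del = sum(deletion) / n   # frequency of the deletion allele
    p_snp = sum(snp) / n        # frequency of the SNP's alternative allele
    # Frequency of accessions carrying both the deletion and the alt allele.
    p_both = sum(d * s for d, s in zip(deletion, snp)) / n
    ld_d = p_both - p_del * p_snp  # the LD coefficient D
    denom = p_del * (1 - p_del) * p_snp * (1 - p_snp)
    if denom == 0:
        # A monomorphic marker makes r^2 undefined; report 0 (no information).
        return 0.0
    return ld_d * ld_d / denom


def max_flanking_r2(deletion: Sequence[int], flanking_snps: Sequence[Sequence[int]]) -> float:
    """Highest r^2 between a deletion and any of its (hypothetical) flanking SNPs."""
    return max((ld_r2(deletion, snp) for snp in flanking_snps), default=0.0)


def is_low_ld(deletion: Sequence[int], flanking_snps: Sequence[Sequence[int]], r2_cutoff: float) -> bool:
    """True if no flanking SNP reaches r2_cutoff (a hypothetical threshold)."""
    return max_flanking_r2(deletion, flanking_snps) < r2_cutoff


# Four homozygous accessions; 1 = carries the deletion / alt allele.
deletion = [1, 1, 0, 0]
print(ld_r2(deletion, [1, 1, 0, 0]))  # 1.0: SNP perfectly tags the deletion
print(ld_r2(deletion, [1, 0, 1, 0]))  # 0.0: SNP carries no information about it
print(is_low_ld(deletion, [[1, 0, 1, 0]], r2_cutoff=0.2))  # True
```

Because r² squares D, a SNP whose alternative allele is always absent when the deletion is present also scores 1.0. A deletion in low LD with every nearby SNP is one that SNP-based GWAS cannot tag, which is why such deletions can reveal associations that SNP-GWAS misses.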
genre North Sweden
volume 49, issue 19, pages 10879-10894
ISSN 0305-1048 1362-4962
op_rights https://creativecommons.org/licenses/by-nc/4.0/