Data from: PROTAX-fungi: a web-based tool for probabilistic taxonomic placement of fungal ITS sequences

Bibliographic Details
Main Authors: Abarenkov, Kessy, Somervuo, Panu, Nilsson, R. Henrik, Kirk, Paul M., Huotari, Tea, Abrego, Nerea, Ovaskainen, Otso
Format: Article in Journal/Newspaper
Language: unknown
Published: 2018
Online Access: http://hdl.handle.net/10255/dryad.181444
https://doi.org/10.5061/dryad.9dr6j0c
Description
Summary:
• Incompleteness of reference sequence databases and unresolved taxonomic relationships complicate the taxonomic placement of fungal sequences. We developed PROTAX-fungi, a general tool for taxonomic placement of fungal ITS sequences, and implemented it in the PlutoF platform of the UNITE database for molecular identification of fungi.
• PROTAX-fungi outperformed the SINTAX and RDP classifiers, achieving higher accuracy and lower calibration error, when applied to data on mock communities representing species groups with poor sequence database coverage.
• With empirical data on root- and wood-associated fungi, PROTAX-fungi reliably identified (with at least 90% identification probability) the majority of sequences to order level but only ca. one fifth of them to species level, reflecting the currently limited coverage of the databases.
• When applied to examine the internal consistency of the Index Fungorum and UNITE databases, PROTAX-fungi revealed inconsistencies in the taxonomy database as well as mislabelling and sequence quality problems in the reference database. The corresponding improvements were implemented in both databases.
• PROTAX-fungi provides a robust tool for statistically reliable identification of fungi despite the incompleteness of extant reference sequence databases and unresolved taxonomic relationships.
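The summary judges identifications by two quantities: the share of sequences assigned with at least 90% identification probability, and calibration error (how far stated probabilities drift from observed accuracy). The sketch below illustrates both under stated assumptions. It uses a generic binned expected calibration error, which is not necessarily the exact measure used for PROTAX-fungi, and the helper names `fraction_reliable` and `calibration_error` and the toy data are hypothetical.

```python
from typing import List, Tuple


def fraction_reliable(probs: List[float], threshold: float = 0.9) -> float:
    """Share of sequences whose identification probability is >= threshold.

    Hypothetical helper: apply it separately per taxonomic rank
    (e.g. order, species) to get rank-specific proportions.
    """
    if not probs:
        return 0.0
    return sum(p >= threshold for p in probs) / len(probs)


def calibration_error(preds: List[Tuple[float, bool]], n_bins: int = 10) -> float:
    """Generic binned expected calibration error (an assumed metric).

    preds holds (predicted probability, was the identification correct?) pairs.
    Predictions are grouped into equal-width probability bins; the result is the
    size-weighted mean of |observed accuracy - mean predicted probability|.
    """
    if not preds:
        return 0.0
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, correct in preds:
        # Clamp p == 1.0 into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, correct))
    total = len(preds)
    err = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for p, _ in b) / len(b)
        accuracy = sum(c for _, c in b) / len(b)
        err += len(b) / total * abs(accuracy - mean_conf)
    return err


# Toy, made-up identifications at a single rank.
toy = [(0.95, True), (0.92, True), (0.91, False), (0.55, True), (0.40, False)]
print(fraction_reliable([p for p, _ in toy]))  # 3 of 5 pass the 0.9 threshold
print(calibration_error(toy))
```

A well-calibrated classifier yields an error near zero. Its 90%-probability identifications are then correct about 90% of the time, which is what makes the threshold meaningful.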