A Replication Dataset for Fundamental Frequency Estimation

Bibliographic Details
Main Author: Bechtold, Bastian
Format: Dataset
Language: English
Published: 2020
Subjects:
Online Access: https://zenodo.org/record/3904389
https://doi.org/10.5281/zenodo.3904389
Description
Summary: Part of the dissertation Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods. © 2020, Bastian Bechtold. All rights reserved. Estimating the fundamental frequency of speech remains an active area of research, with varied applications in speech recognition, speaker identification, and speech compression. A vast number of algorithms for estimating this quantity have been proposed over the years, and a number of speech and noise corpora have been developed for evaluating their performance. The present dataset contains estimated fundamental frequency tracks of 25 algorithms for six speech corpora and two noise corpora, at nine signal-to-noise ratios between -20 and 20 dB, as well as an additional evaluation of synthetic harmonic tone complexes in white noise. The dataset also contains pre-calculated performance measures, both novel and traditional, in reference to each speech corpus’ ground truth, the algorithms’ own clean-speech estimate, and our own consensus truth. It can thus serve as the basis for a comparison study, as a larger dataset from which to replicate existing studies, or as a reference for developing new fundamental frequency estimation algorithms. All source code and data are available for download, and the results are entirely reproducible, albeit requiring about one year of processor time.
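The summary mentions speech mixed with noise at nine signal-to-noise ratios between -20 and 20 dB. As a rough illustration of what mixing at a target SNR involves, the following Python sketch scales a noise signal so that the ratio of mean signal power to mean noise power matches a requested value. It is not taken from the dataset's source code: the function names, the evenly spaced 5 dB grid, and the stand-in signals (a harmonic tone complex and Gaussian white noise) are all assumptions made for this example.

```python
import math
import random

# Assumption: nine evenly spaced levels from -20 to 20 dB (5 dB steps).
# The record only states "nine signal-to-noise ratios between -20 and 20 dB".
SNRS_DB = list(range(-20, 25, 5))


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return (mixture, scaled_noise), with the noise scaled to reach snr_db."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = mean_power(speech)
    noise_power = mean_power(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("both signals must have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (gain**2 * P_noise)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, scaled_noise):
    """SNR in dB of a clean signal relative to the noise added to it."""
    return 10 * math.log10(mean_power(speech) / mean_power(scaled_noise))


def harmonic_complex(f0, n_harmonics, sample_rate, n_samples):
    """Hypothetical test signal: equal-amplitude harmonics of f0."""
    return [
        sum(
            math.sin(2 * math.pi * f0 * h * t / sample_rate)
            for h in range(1, n_harmonics + 1)
        )
        for t in range(n_samples)
    ]


# Stand-in signals: a 100 Hz tone complex and seeded Gaussian white noise.
rng = random.Random(0)
tone = harmonic_complex(f0=100.0, n_harmonics=5, sample_rate=8000, n_samples=800)
noise = [rng.gauss(0.0, 1.0) for _ in range(len(tone))]

for target in SNRS_DB:
    mixture, scaled = mix_at_snr(tone, noise, target)
    print(f"target {target:+3d} dB, measured {measured_snr_db(tone, scaled):+.2f} dB")
```

The sketch defines SNR over the whole signal. A real evaluation might instead compute power over voiced or active segments only, which would change the gain applied to the noise.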
Included Code and Data

ground truth data.zip is a JBOF dataset of fundamental frequency estimates and ground truths of all speech files in the following corpora:
  CMU-ARCTIC (consensus truth) [1]
  FDA (corpus truth and consensus truth) [2]
  KEELE (corpus truth and consensus truth) [3]
  MOCHA-TIMIT (consensus truth) [4]
  PTDB-TUG (corpus truth and consensus truth) [5]
  TIMIT (consensus truth) [6]

noisy speech data.zip is a JBOF dataset of fundamental frequency estimates of speech files mixed with noise from the following corpora:
  NOISEX [7]
  QUT-NOISE [8]

synthetic speech data.zip is a JBOF dataset of fundamental frequency estimates of synthetic harmonic tone complexes in white noise. ...