A Replication Dataset for Fundamental Frequency Estimation

Bibliographic Details
Main Author:	Bechtold, Bastian
Other Authors:	Bitzer, Joerg; van de Par, Steven
Format:	Other/Unknown Material
Language:	English
Published:	Zenodo 2020
Subjects:	signal processing; audio; speech; pitch; fundamental frequency; Arctic
Online Access:	https://doi.org/10.5281/zenodo.3904389

Description
Summary:	Part of the dissertation. © 2020, Bastian Bechtold. All rights reserved.

Estimating the fundamental frequency of speech remains an active area of research, with varied applications in speech recognition, speaker identification, and speech compression. A vast number of algorithms for estimating this quantity have been proposed over the years, and a number of speech and noise corpora have been developed for evaluating their performance.

The present dataset contains estimated fundamental frequency tracks for 25 algorithms, six speech corpora, and two noise corpora, at nine signal-to-noise ratios (SNRs) between -20 and 20 dB, as well as an additional evaluation of synthetic harmonic tone complexes in white noise. The dataset also contains pre-calculated performance measures, both novel and traditional, computed against each speech corpus’ ground truth, each algorithm’s own clean-speech estimate, and our own consensus truth. It can thus serve as the basis for a comparison study, as a larger dataset from which to replicate existing studies, or as a reference for developing new fundamental frequency estimation algorithms.

All source code and data are available for download and are entirely reproducible, albeit requiring about one year of processor time.

Included Code and Data

<code>ground truth data.zip</code> is a JBOF dataset of fundamental frequency estimates and ground truths of all speech files in the following corpora:
CMU-ARCTIC (consensus truth) [1]
FDA (corpus truth and consensus truth) [2]
KEELE (corpus truth and consensus truth) [3]
MOCHA-TIMIT (consensus truth) [4]
PTDB-TUG (corpus truth and consensus truth) [5]
TIMIT (consensus truth) [6]

<code>noisy speech data.zip</code> is a JBOF dataset of fundamental frequency estimates of speech files mixed with noise from the following corpora:
NOISEX [7]
QUT-NOISE [8]

<code>synthetic speech data.zip</code> is a JBOF dataset of fundamental frequency estimates of synthetic harmonic tone complexes in white noise. ...
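
The summary mentions speech mixed with noise at nine signal-to-noise ratios between -20 and 20 dB, but this record does not say how the mixtures were produced or how the nine values are spaced. The following is only a minimal sketch of one common way to mix a signal with noise at a target SNR. The function names, the evenly spaced 5 dB grid, and the 200 Hz sine standing in for speech are all assumptions made for illustration and are not taken from the dataset's source code.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("both signals must have non-zero power")
    # From SNR_dB = 10 * log10(P_speech / P_noise), solve for the noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, noise):
    """Return the SNR in dB of `speech` relative to `noise`."""
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(speech_power / noise_power)


# Assumed grid: nine evenly spaced values from -20 to 20 dB (not stated in the record).
SNR_GRID_DB = [-20 + 5 * i for i in range(9)]

# Toy signals: a 200 Hz sine as a stand-in for speech, plus Gaussian white noise.
rng = random.Random(0)
sample_rate = 16000
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
white_noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

for snr_db in SNR_GRID_DB:
    mixture, scaled = mix_at_snr(tone, white_noise, snr_db)
    # The measured SNR should match the requested one up to rounding error.
    print(snr_db, round(measured_snr_db(tone, scaled), 6))
```

Scaling the noise rather than the speech keeps the speech level constant across conditions, which makes estimates at different SNRs easier to compare. The dataset itself may have used a different convention.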

Similar Items