Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors

An F0 and voicing status estimation algorithm for high-quality speech analysis/synthesis is proposed. Rather than directly modeling speech signals, the problem is approached from a different perspective: modeling the behavior of feature extractors under noise. Under time-frequency locality assumptions, the joint distribution of extracted features and target F0 can be characterized by training a bank of Gaussian mixture models (GMMs) on artificial data generated from Monte-Carlo simulations. The trained GMMs are then used to generate a set of conditional distributions on the predicted F0, which are combined and post-processed by the Viterbi algorithm to give the final F0 trajectory. Evaluation on the CSTR and CMU Arctic speech databases shows that the proposed method, trained on fully synthetic data, achieves lower gross error rates than state-of-the-art methods.
Comment: To be presented at Interspeech 2018

Bibliographic Details
Main Author:	Hua, Kanru
Format:	Text
Language:	unknown
Published:	2017-10-31
Subjects:	Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Sound
Online Access:	http://arxiv.org/abs/1710.11317
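
The abstract above only outlines the last two stages (turning trained GMMs into conditional F0 distributions, then combining them and decoding with Viterbi). The Python sketch below illustrates those stages under stated assumptions; it is not the paper's implementation. It assumes scalar features, a hand-made one-component joint GMM with made-up numbers, combination by summing log-densities as if the extractors were independent, and a transition cost proportional to the absolute log2 frequency jump. The names `Component`, `conditional_log_pdf`, `combine`, and `viterbi` are hypothetical. Voicing detection and the Monte-Carlo training of the GMMs are omitted. The per-component conditional mean and variance are the standard bivariate-Gaussian conditioning formulas.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One component of a hypothetical 2-D joint GMM over (feature x, F0 y)."""
    weight: float
    mu_x: float
    mu_y: float
    var_x: float
    var_y: float
    cov_xy: float


def log_gauss(v, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * ((v - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def conditional_log_pdf(gmm, x, grid):
    """log p(y | x) on an F0 grid, derived from a joint GMM p(x, y).

    Component k contributes N(y; mu_y + cov_xy/var_x * (x - mu_x),
    var_y - cov_xy**2/var_x), weighted by its responsibility for x.
    """
    log_r = [math.log(c.weight) + log_gauss(x, c.mu_x, c.var_x) for c in gmm]
    norm = log_sum_exp(log_r)
    out = []
    for y in grid:
        terms = []
        for lr, c in zip(log_r, gmm):
            mean = c.mu_y + c.cov_xy / c.var_x * (x - c.mu_x)
            var = c.var_y - c.cov_xy ** 2 / c.var_x
            terms.append(lr - norm + log_gauss(y, mean, var))
        out.append(log_sum_exp(terms))
    return out


def combine(gmm_bank, features, grid):
    """ASSUMPTION: extractors treated as independent; log-densities are summed."""
    total = [0.0] * len(grid)
    for gmm, x in zip(gmm_bank, features):
        for i, lp in enumerate(conditional_log_pdf(gmm, x, grid)):
            total[i] += lp
    return total


def viterbi(log_obs, grid, jump_penalty):
    """Best F0 path; ASSUMPTION: transition cost = penalty * |log2(f_j / f_i)|."""
    n = len(grid)
    trans = [[-jump_penalty * abs(math.log2(grid[j] / grid[i])) for j in range(n)]
             for i in range(n)]
    score = list(log_obs[0])
    back = []
    for frame in log_obs[1:]:
        # Best predecessor bin for every current bin.
        ptr = [max(range(n), key=lambda i: score[i] + trans[i][j]) for j in range(n)]
        score = [score[ptr[j]] + trans[ptr[j]][j] + frame[j] for j in range(n)]
        back.append(ptr)
    # Backtrack from the best final bin.
    path = [max(range(n), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return [grid[j] for j in reversed(path)]


GRID = [100.0 + 5.0 * i for i in range(41)]  # 100-300 Hz in 5 Hz steps
TOY_GMM = [Component(1.0, 200.0, 200.0, 400.0, 400.0, 380.0)]  # made-up numbers
BANK = [TOY_GMM, TOY_GMM]  # two hypothetical extractors sharing one toy model

frames = [(150.0, 152.0)] * 2 + [(248.0, 248.0)] + [(150.0, 152.0)] * 2
log_obs = [combine(BANK, f, GRID) for f in frames]
print(viterbi(log_obs, GRID, 0.0))  # -> [155.0, 155.0, 245.0, 155.0, 155.0]
print(viterbi(log_obs, GRID, 1e6))  # -> [170.0, 170.0, 170.0, 170.0, 170.0]
```

With no transition cost the decoder simply returns the per-frame maximum, including the outlier frame at 245 Hz; with a very large cost it collapses to one constant F0. A useful penalty lies between these extremes and would have to be tuned.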

Similar Items