Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors

An F0 and voicing status estimation algorithm for high-quality speech analysis/synthesis is proposed. This problem is approached from a different perspective that models the behavior of feature extractors under noise, instead of directly modeling speech signals. Under time-frequency locality assumpti...

Bibliographic Details
Main Author:	Hua, Kanru
Format:	Report
Language:	unknown
Published:	arXiv 2017
Subjects:	Audio and Speech Processing eess.AS Sound cs.SD FOS Electrical engineering, electronic engineering, information engineering FOS Computer and information sciences Arctic
Online Access:	https://dx.doi.org/10.48550/arxiv.1710.11317 https://arxiv.org/abs/1710.11317

id	ftdatacite:10.48550/arxiv.1710.11317
record_format	openpolar
institution	Open Polar
collection	DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id	ftdatacite
language	unknown
topic	Audio and Speech Processing eess.AS Sound cs.SD FOS Electrical engineering, electronic engineering, information engineering FOS Computer and information sciences
topic_facet	Audio and Speech Processing eess.AS Sound cs.SD FOS Electrical engineering, electronic engineering, information engineering FOS Computer and information sciences
description	An F0 and voicing status estimation algorithm for high-quality speech analysis/synthesis is proposed. This problem is approached from a different perspective that models the behavior of feature extractors under noise, instead of directly modeling speech signals. Under time-frequency locality assumptions, the joint distribution of extracted features and target F0 can be characterized by training a bank of Gaussian mixture models (GMMs) on artificial data generated from Monte Carlo simulations. The trained GMMs can then be used to generate a set of conditional distributions on the predicted F0, which are then combined and post-processed by the Viterbi algorithm to give a final F0 trajectory. Evaluation on the CSTR and CMU Arctic speech databases shows that the proposed method, trained on fully synthetic data, achieves lower gross error rates than state-of-the-art methods. Comment: To be presented at Interspeech 2018.
format	Report
author	Hua, Kanru
author_facet	Hua, Kanru
author_sort	Hua, Kanru
title	Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
publisher	arXiv
publishDate	2017
url	https://dx.doi.org/10.48550/arxiv.1710.11317 https://arxiv.org/abs/1710.11317
geographic	Arctic
geographic_facet	Arctic
genre	Arctic
genre_facet	Arctic
op_rights	Creative Commons Attribution Non Commercial Share Alike 4.0 International https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode cc-by-nc-sa-4.0
op_rightsnorm	CC-BY-NC-SA
op_doi	https://doi.org/10.48550/arxiv.1710.11317
_version_	1766336093891854336
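
The description above outlines two steps that a small example may make more concrete: conditioning a joint GMM over (feature, F0) on an observed feature to get a distribution on F0, and smoothing per-frame F0 scores with the Viterbi algorithm. The following is a minimal sketch under stated assumptions, not the method from the paper: the one-dimensional feature, the component tuple layout, the function names `gmm_conditional` and `viterbi_path`, and the octave-distance transition penalty `jump_penalty` are all hypothetical choices made for illustration.

```python
import math


def gmm_conditional(x, components):
    """Condition a joint 2-D GMM over (feature, f0) on an observed feature x.

    Each component is a tuple (hypothetical layout, chosen for this sketch):
        (weight, mean_x, mean_f, var_x, cov_xf, var_f)
    Returns a list of (weight, mean, variance) for the 1-D mixture p(f0 | x).
    """
    out = []
    for w, mx, mf, vx, cxf, vf in components:
        # Likelihood of the observed feature under this component's marginal.
        lik = math.exp(-0.5 * (x - mx) ** 2 / vx) / math.sqrt(2.0 * math.pi * vx)
        # Standard Gaussian conditioning formulas.
        mean = mf + cxf / vx * (x - mx)
        var = vf - cxf ** 2 / vx
        out.append((w * lik, mean, var))
    total = sum(w for w, _, _ in out)
    return [(w / total, m, v) for w, m, v in out]


def viterbi_path(log_post, candidates, jump_penalty=1.0):
    """Return the best candidate index per frame.

    log_post[t][i] is the log-score of F0 candidate i at frame t.
    The transition cost is an assumption of this sketch:
    jump_penalty times the distance between candidates in octaves.
    """
    n = len(candidates)
    score = list(log_post[0])
    back = []
    for frame in log_post[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Score of reaching candidate j from each predecessor i.
            trans = [
                score[i] - jump_penalty * abs(math.log2(candidates[j] / candidates[i]))
                for i in range(n)
            ]
            i_best = max(range(n), key=trans.__getitem__)
            new_score.append(trans[i_best] + frame[j])
            pointers.append(i_best)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: the middle frame slightly prefers the octave-up candidate.
candidates_hz = [100.0, 200.0]
frames = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.4), math.log(0.6)],
    [math.log(0.9), math.log(0.1)],
]
smoothed = viterbi_path(frames, candidates_hz, jump_penalty=1.0)
unsmoothed = viterbi_path(frames, candidates_hz, jump_penalty=0.0)
```

On this toy input the penalized path stays on the 100 Hz candidate for all three frames, while a zero penalty reproduces the frame-wise maximum and jumps an octave in the middle frame. A real system would use many more candidates, multi-dimensional features, and a voicing state, none of which are modeled here.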

Similar Items