Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors

An F0 and voicing status estimation algorithm for high quality speech analysis/synthesis is proposed. This problem is approached from a different perspective that models the behavior of feature extractors under noise, instead of directly modeling speech signals. Under time-frequency locality assumpti...

Bibliographic Details
Main Author: Hua, Kanru
Format: Report
Language: unknown
Published: arXiv 2017
Subjects:
Online Access: https://dx.doi.org/10.48550/arxiv.1710.11317
https://arxiv.org/abs/1710.11317
id ftdatacite:10.48550/arxiv.1710.11317
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Audio and Speech Processing eess.AS
Sound cs.SD
FOS Electrical engineering, electronic engineering, information engineering
FOS Computer and information sciences
description An F0 and voicing status estimation algorithm for high quality speech analysis/synthesis is proposed. The problem is approached from a different perspective: the method models the behavior of feature extractors under noise instead of directly modeling speech signals. Under time-frequency locality assumptions, the joint distribution of extracted features and target F0 can be characterized by training a bank of Gaussian mixture models (GMMs) on artificial data generated from Monte-Carlo simulations. The trained GMMs can then be used to generate a set of conditional distributions on the predicted F0, which are combined and post-processed by the Viterbi algorithm to give a final F0 trajectory. Evaluation on the CSTR and CMU Arctic speech databases shows that the proposed method, trained on fully synthetic data, achieves lower gross error rates than state-of-the-art methods. To be presented at Interspeech 2018.
format Report
author Hua, Kanru
title Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
publisher arXiv
publishDate 2017
url https://dx.doi.org/10.48550/arxiv.1710.11317
https://arxiv.org/abs/1710.11317
geographic Arctic
geographic_facet Arctic
genre Arctic
genre_facet Arctic
op_rights Creative Commons Attribution Non Commercial Share Alike 4.0 International
https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
cc-by-nc-sa-4.0
op_rightsnorm CC-BY-NC-SA
op_doi https://doi.org/10.48550/arxiv.1710.11317
_version_ 1766336093891854336
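
The description field above says that per-frame conditional distributions on F0 are combined and post-processed by the Viterbi algorithm to give a final trajectory. The following is a minimal Python sketch of that last step only, not the author's implementation. The function name `viterbi_f0`, the discrete `f0_grid`, and the per-octave `jump_penalty` transition cost are all assumptions made for illustration; the record does not specify how the paper defines its states or transition model, and voicing decisions are left out here.

```python
import math


def viterbi_f0(posteriors, f0_grid, jump_penalty=2.0):
    """Pick one F0 candidate per frame by dynamic programming.

    posteriors   : list of frames, each a list of probabilities over f0_grid
                   (stand-in for the combined conditional distributions).
    f0_grid      : candidate F0 values in Hz (hypothetical discretisation).
    jump_penalty : assumed cost per octave of frame-to-frame F0 change.
    """
    n_states = len(f0_grid)
    log_f0 = [math.log2(f) for f in f0_grid]
    eps = 1e-12  # avoids log(0) for zero-probability candidates

    # score[j] = best log-score of any path ending in state j at this frame
    score = [math.log(p + eps) for p in posteriors[0]]
    backptrs = []

    for frame in posteriors[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            # Transition score from every previous state i into state j
            trans = [score[i] - jump_penalty * abs(log_f0[i] - log_f0[j])
                     for i in range(n_states)]
            best_i = max(range(n_states), key=lambda i: trans[i])
            new_score.append(trans[best_i] + math.log(frame[j] + eps))
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)

    # Backtrack from the best final state to recover the full path
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [f0_grid[s] for s in path]


# Toy input: the middle frame slightly favours the octave above (200 Hz).
grid = [100.0, 200.0, 400.0]
post = [
    [0.80, 0.10, 0.10],
    [0.45, 0.50, 0.05],
    [0.80, 0.10, 0.10],
]
smoothed = viterbi_f0(post, grid)                    # continuity-aware path
framewise = viterbi_f0(post, grid, jump_penalty=0.0)  # per-frame argmax
```

With the toy input, the penalised path stays at 100 Hz throughout, whereas setting `jump_penalty=0.0` reduces the procedure to a per-frame argmax and lets the isolated octave jump through. This illustrates why a trajectory-level decoder can lower gross error rates compared with independent frame decisions.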