Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
An F0 and voicing status estimation algorithm for high quality speech analysis/synthesis is proposed. This problem is approached from a different perspective that models the behavior of feature extractors under noise, instead of directly modeling speech signals. Under time-frequency locality assumpti...
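Main Author: | Hua, Kanru |
---|---|
Format: | Text |
Language: | unknown |
Published: | 2017 |
Subjects: | Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Sound |
Online Access: | http://arxiv.org/abs/1710.11317 |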
id |
ftarxivpreprints:oai:arXiv.org:1710.11317 |
---|---|
record_format |
openpolar |
spelling |
ftarxivpreprints:oai:arXiv.org:1710.11317 2023-09-05T13:17:23+02:00 Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors Hua, Kanru 2017-10-31 http://arxiv.org/abs/1710.11317 unknown http://arxiv.org/abs/1710.11317 Electrical Engineering and Systems Science - Audio and Speech Processing Computer Science - Sound text 2017 ftarxivpreprints 2023-08-16T14:36:13Z An F0 and voicing status estimation algorithm for high quality speech analysis/synthesis is proposed. This problem is approached from a different perspective that models the behavior of feature extractors under noise, instead of directly modeling speech signals. Under time-frequency locality assumptions, the joint distribution of extracted features and target F0 can be characterized by training a bank of Gaussian mixture models (GMMs) on artificial data generated from Monte-Carlo simulations. The trained GMMs can then be used to generate a set of conditional distributions on the predicted F0, which are then combined and post-processed by the Viterbi algorithm to give a final F0 trajectory. Evaluation on the CSTR and CMU Arctic speech databases shows that the proposed method, trained on fully synthetic data, achieves lower gross error rates than state-of-the-art methods. Comment: To be presented at Interspeech 2018 Text Arctic ArXiv.org (Cornell University Library) Arctic |
institution |
Open Polar |
collection |
ArXiv.org (Cornell University Library) |
op_collection_id |
ftarxivpreprints |
language |
unknown |
topic |
Electrical Engineering and Systems Science - Audio and Speech Processing Computer Science - Sound |
description |
An F0 and voicing status estimation algorithm for high quality speech analysis/synthesis is proposed. This problem is approached from a different perspective that models the behavior of feature extractors under noise, instead of directly modeling speech signals. Under time-frequency locality assumptions, the joint distribution of extracted features and target F0 can be characterized by training a bank of Gaussian mixture models (GMMs) on artificial data generated from Monte-Carlo simulations. The trained GMMs can then be used to generate a set of conditional distributions on the predicted F0, which are then combined and post-processed by the Viterbi algorithm to give a final F0 trajectory. Evaluation on the CSTR and CMU Arctic speech databases shows that the proposed method, trained on fully synthetic data, achieves lower gross error rates than state-of-the-art methods. Comment: To be presented at Interspeech 2018 |
format |
Text |
author |
Hua, Kanru |
title |
Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors |
publishDate |
2017 |
url |
http://arxiv.org/abs/1710.11317 |
geographic |
Arctic |
genre |
Arctic |
op_relation |
http://arxiv.org/abs/1710.11317 |
_version_ |
1776198575939649536 |
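
The description in this record mentions that per-channel conditional distributions on F0 are combined and then post-processed by the Viterbi algorithm to give a final F0 trajectory. The following is a minimal sketch of that general idea only, not the paper's implementation. Everything specific here is an assumption made for illustration: the function names `combine_posteriors` and `viterbi_f0`, the normalized-product fusion rule, the Gaussian penalty on log-F0 jumps (`jump_std_octaves`), the discrete F0 grid, and the omission of a voicing state.

```python
import math


def combine_posteriors(channel_posteriors, eps=1e-12):
    """Fuse several distributions over the same F0 bins for one frame.

    Assumed rule: normalized product, computed in the log domain.
    """
    n_bins = len(channel_posteriors[0])
    log_p = [sum(math.log(ch[k] + eps) for ch in channel_posteriors)
             for k in range(n_bins)]
    peak = max(log_p)  # subtract the peak for numerical stability
    unnorm = [math.exp(v - peak) for v in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def viterbi_f0(frame_posteriors, f0_bins, jump_std_octaves=0.1, eps=1e-12):
    """Return the most likely F0 path through per-frame distributions.

    Transitions are scored with a Gaussian-shaped penalty on the jump
    in octaves between consecutive frames (an assumed smoothness prior).
    """
    n_bins = len(f0_bins)
    log_f0 = [math.log2(f) for f in f0_bins]
    trans = [[-0.5 * ((log_f0[j] - log_f0[i]) / jump_std_octaves) ** 2
              for j in range(n_bins)] for i in range(n_bins)]

    # Forward pass: best cumulative log-score ending in each bin.
    score = [math.log(p + eps) for p in frame_posteriors[0]]
    back = []
    for post in frame_posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_bins):
            best_i = max(range(n_bins), key=lambda i: score[i] + trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + trans[best_i][j]
                             + math.log(post[j] + eps))
        score = new_score
        back.append(pointers)

    # Backward pass: follow the stored pointers from the best final bin.
    state = max(range(n_bins), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [f0_bins[k] for k in path]


# Toy input: the middle frame slightly prefers an octave error (400 Hz).
bins = [100.0, 200.0, 400.0]
frames = [
    [0.05, 0.90, 0.05],
    [0.05, 0.45, 0.50],
    [0.05, 0.90, 0.05],
]
trajectory = viterbi_f0(frames, bins)
print(trajectory)
```

In the toy input, picking the per-frame maximum would yield an octave jump in the middle frame; the transition penalty is what lets the decoded path stay on a smooth trajectory. A voicing decision could be modeled by adding an extra "unvoiced" state, which this sketch leaves out.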