Synthetic References for Template-based ASR using Posterior Features
Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the hig...
Published in: | Interspeech 2012 |
---|---|
Main Authors: | Soldo, Serena; Magimai.-Doss, Mathew; Bourlard, Hervé |
Format: | Text |
Language: | unknown |
Published: | 2013 |
Subjects: | |
Online Access: | https://doi.org/10.21437/Interspeech.2012-573 https://infoscience.epfl.ch/record/192642/files/Soldo_INTERSPEECH_2012.pdf http://infoscience.epfl.ch/record/192642 |
id |
ftinfoscience:oai:infoscience.tind.io:192642 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
EPFL Infoscience (Ecole Polytechnique Fédérale Lausanne) |
op_collection_id |
ftinfoscience |
language |
unknown |
description |
Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and to yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features to undesired variability, we investigate the use of synthetic speech to generate reference templates. Using synthetic speech in template-based ASR not only addresses the issue of in-domain data collection but also allows the vocabulary to be expanded. Using 75- and 600-word task-independent, speaker-independent setups on the Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on the CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility. |
format |
Text |
author |
Soldo, Serena; Magimai.-Doss, Mathew; Bourlard, Hervé |
author_sort |
Soldo, Serena |
title |
Synthetic References for Template-based ASR using Posterior Features |
publishDate |
2013 |
url |
https://doi.org/10.21437/Interspeech.2012-573 https://infoscience.epfl.ch/record/192642/files/Soldo_INTERSPEECH_2012.pdf http://infoscience.epfl.ch/record/192642 |
geographic |
Arctic |
geographic_facet |
Arctic |
genre |
Arctic |
genre_facet |
Arctic |
op_source |
http://infoscience.epfl.ch/record/192642 |
op_relation |
doi:10.21437/Interspeech.2012-573 https://infoscience.epfl.ch/record/192642/files/Soldo_INTERSPEECH_2012.pdf http://infoscience.epfl.ch/record/192642 |
op_doi |
https://doi.org/10.21437/Interspeech.2012-573 |
container_title |
Interspeech 2012 |
container_start_page |
2146 |
op_container_end_page |
2149 |
_version_ |
1766335062461120512 |