In standard template-based Automatic Speech Recognition
Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the hig...
Main Authors: | Serena Soldo, Mathew Magimai.-Doss, Hervé Bourlard |
---|---|
Other Authors: | |
Format: | Text |
Language: | English |
Subjects: | |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf |
id |
ftciteseerx:oai:CiteSeerX.psu:10.1.1.651.1093 |
---|---|
record_format |
openpolar |
spelling |
ftciteseerx:oai:CiteSeerX.psu:10.1.1.651.1093 2023-05-15T15:04:41+02:00 In standard template-based Automatic Speech Recognition Serena Soldo Mathew Magimai.-Doss Hervé Bourlard The Pennsylvania State University CiteSeerX Archives application/pdf http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf en eng http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf Metadata may be used without restrictions as long as the oai identifier remains attached to it. http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf Index Terms Speech recognition template-based approach text ftciteseerx 2016-01-08T16:23:48Z Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features toward undesired variability, we investigate the use of synthetic speech to generate reference templates. The use of synthetic speech in template-based ASR not only allows us to address the issue of in-domain data collection but also enables expansion of the vocabulary. Using 75- and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility. Text Arctic Unknown Arctic |
institution |
Open Polar |
collection |
Unknown |
op_collection_id |
ftciteseerx |
language |
English |
topic |
Index Terms Speech recognition template-based approach |
spellingShingle |
Index Terms Speech recognition template-based approach Serena Soldo Mathew Magimai.-Doss Hervé Bourlard In standard template-based Automatic Speech Recognition |
topic_facet |
Index Terms Speech recognition template-based approach |
description |
Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features toward undesired variability, we investigate the use of synthetic speech to generate reference templates. The use of synthetic speech in template-based ASR not only allows us to address the issue of in-domain data collection but also enables expansion of the vocabulary. Using 75- and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility. |
author2 |
The Pennsylvania State University CiteSeerX Archives |
format |
Text |
author |
Serena Soldo Mathew Magimai.-Doss Hervé Bourlard |
author_facet |
Serena Soldo Mathew Magimai.-Doss Hervé Bourlard |
author_sort |
Serena Soldo |
title |
In standard template-based Automatic Speech Recognition |
title_short |
In standard template-based Automatic Speech Recognition |
title_full |
In standard template-based Automatic Speech Recognition |
title_fullStr |
In standard template-based Automatic Speech Recognition |
title_full_unstemmed |
In standard template-based Automatic Speech Recognition |
title_sort |
in standard template-based automatic speech recognition |
url |
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf |
geographic |
Arctic |
geographic_facet |
Arctic |
genre |
Arctic |
genre_facet |
Arctic |
op_source |
http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf |
op_relation |
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf |
op_rights |
Metadata may be used without restrictions as long as the oai identifier remains attached to it. |
_version_ |
1766336418548809728 |