In standard template-based Automatic Speech Recognition

Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the hig...

Bibliographic Details
Main Authors: Serena Soldo, Mathew Magimai.-Doss, Hervé Bourlard
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093
http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.651.1093
record_format openpolar
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
topic Speech recognition
template-based approach
description Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and to yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features to undesired variability, we investigate the use of synthetic speech to generate reference templates. Using synthetic speech in template-based ASR addresses not only the issue of in-domain data collection but also the expansion of the vocabulary. Using 75- and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on the CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility.
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author Serena Soldo
Mathew Magimai.-Doss
Hervé Bourlard
author_sort Serena Soldo
title In standard template-based Automatic Speech Recognition
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093
http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf
op_source http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093
http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_ 1766336418548809728