In standard template-based Automatic Speech Recognition

Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the hig...


Bibliographic Details
Main Authors:	Serena Soldo, Mathew Magimai.-Doss, Hervé Bourlard
Other Authors:	The Pennsylvania State University CiteSeerX Archives
Format:	Text
Language:	English
Subjects:	Speech recognition; template-based approach; Arctic
Online Access:	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf

id	ftciteseerx:oai:CiteSeerX.psu:10.1.1.651.1093
record_format	openpolar
spelling	ftciteseerx:oai:CiteSeerX.psu:10.1.1.651.1093 2023-05-15T15:04:41+02:00 In standard template-based Automatic Speech Recognition Serena Soldo Mathew Magimai.-Doss Hervé Bourlard The Pennsylvania State University CiteSeerX Archives application/pdf http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf en eng http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf Metadata may be used without restrictions as long as the oai identifier remains attached to it. http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf Index Terms Speech recognition template-based approach text ftciteseerx 2016-01-08T16:23:48Z Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features toward undesired variability, we investigate the use of synthetic speech to generate reference templates. The use of synthetic speech in template-based ASR not only addresses the issue of in-domain data collection but also allows expansion of the vocabulary. Using 75- and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on the CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility. Text Arctic Unknown Arctic
institution	Open Polar
collection	Unknown
op_collection_id	ftciteseerx
language	English
topic	Index Terms Speech recognition template-based approach
spellingShingle	Index Terms Speech recognition template-based approach Serena Soldo Mathew Magimai.-Doss Hervé Bourlard In standard template-based Automatic Speech Recognition
topic_facet	Index Terms Speech recognition template-based approach
description	Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features toward undesired variability, we investigate the use of synthetic speech to generate reference templates. The use of synthetic speech in template-based ASR not only addresses the issue of in-domain data collection but also allows expansion of the vocabulary. Using 75- and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on the CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility.
author2	The Pennsylvania State University CiteSeerX Archives
format	Text
author	Serena Soldo Mathew Magimai.-Doss Hervé Bourlard
author_facet	Serena Soldo Mathew Magimai.-Doss Hervé Bourlard
author_sort	Serena Soldo
title	In standard template-based Automatic Speech Recognition
title_short	In standard template-based Automatic Speech Recognition
title_full	In standard template-based Automatic Speech Recognition
title_fullStr	In standard template-based Automatic Speech Recognition
title_full_unstemmed	In standard template-based Automatic Speech Recognition
title_sort	in standard template-based automatic speech recognition
url	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf
geographic	Arctic
geographic_facet	Arctic
genre	Arctic
genre_facet	Arctic
op_source	http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf
op_relation	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.1093 http://publications.idiap.ch/downloads/papers/2012/Soldo_INTERSPEECH_2012.pdf
op_rights	Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_	1766336418548809728
