Synthetic References for Template-based ASR using Posterior Features

Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the hig...

Bibliographic Details
Published in: Interspeech 2012
Main Authors: Soldo, Serena; Magimai.-Doss, Mathew; Bourlard, Hervé
Format: Text
Language: unknown
Published: 2013
Subjects:
Online Access: https://doi.org/10.21437/Interspeech.2012-573
https://infoscience.epfl.ch/record/192642/files/Soldo_INTERSPEECH_2012.pdf
http://infoscience.epfl.ch/record/192642
id ftinfoscience:oai:infoscience.tind.io:192642
record_format openpolar
institution Open Polar
collection EPFL Infoscience (Ecole Polytechnique Fédérale Lausanne)
op_collection_id ftinfoscience
language unknown
description Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and to yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features to undesired variability, we investigate the use of synthetic speech to generate reference templates. Using synthetic speech in template-based ASR addresses not only the issue of in-domain data collection but also that of vocabulary expansion. Using 75- and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate different synthetic voices produced by the Festival HTS-based synthesizer trained on the CMU ARCTIC databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility.
format Text
author Soldo, Serena
Magimai.-Doss, Mathew
Bourlard, Hervé
title Synthetic References for Template-based ASR using Posterior Features
publishDate 2013
url https://doi.org/10.21437/Interspeech.2012-573
https://infoscience.epfl.ch/record/192642/files/Soldo_INTERSPEECH_2012.pdf
http://infoscience.epfl.ch/record/192642
op_source http://infoscience.epfl.ch/record/192642
op_relation doi:10.21437/Interspeech.2012-573
https://infoscience.epfl.ch/record/192642/files/Soldo_INTERSPEECH_2012.pdf
http://infoscience.epfl.ch/record/192642
op_doi https://doi.org/10.21437/Interspeech.2012-573
container_title Interspeech 2012
container_start_page 2146
op_container_end_page 2149
_version_ 1766335062461120512