Text to Speech in New Languages without a Standardized Orthography

Abstract: Many spoken languages do not have a standardized writing system. Building text-to-speech voices for them without accurate transcripts of the speech data is difficult. Our language-independent method to bootstrap synthetic voices using only speech data relies upon cross-lingual phonetic decodin...

Bibliographic Details
Main Authors: Sunayana Sitaram, Gopala Krishna Anumanchipalli, Justin Chiu, Alok Parlikar, Alan W Black
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2013
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.1047.4122
record_format openpolar
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
description Abstract: Many spoken languages do not have a standardized writing system. Building text-to-speech voices for them without accurate transcripts of the speech data is difficult. Our language-independent method to bootstrap synthetic voices using only speech data relies upon cross-lingual phonetic decoding of speech. In this paper, we describe novel additions to our bootstrapping method. We present results on eight languages from different language families (English, Dari, Pashto, Iraqi, Thai, Konkani, Inupiaq and Ojibwe) and show that our phonetic voices can be made understandable with as little as an hour of speech data that never had transcriptions, and with few other resources available in the target language. We also present purely acoustic techniques that help induce syllable- and word-level information, which can further improve the intelligibility of these voices. Index Terms: speech synthesis, synthesis without text, languages without an orthography. Introduction: Recent developments in speech and language technologies have revolutionized the ways in which we access information. Advances in speech recognition, speech synthesis and dialog modeling have produced interactive agents that people can talk to naturally and ask for information. There is considerable interest in building such systems, especially in multilingual environments. Building speech and language systems typically requires significant amounts of data and linguistic resources. For many spoken languages of the world, finding large corpora or linguistic resources is difficult. Yet these languages have many native speakers around the world, and it would be very interesting to deploy speech technologies in them. Our work is about building text-to-speech systems for languages that are purely spoken: they do not have a standardized writing system. These could be mainstream languages such as Konkani (a western Indian language with over 8 million speakers), or dialects of a major language that are phonetically quite distinct ...
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author Sunayana Sitaram
Gopala Krishna Anumanchipalli
Justin Chiu
Alok Parlikar
Alan W Black
spellingShingle Sunayana Sitaram
Gopala Krishna Anumanchipalli
Justin Chiu
Alok Parlikar
Alan W Black
Text to Speech in New Languages without a Standardized Orthography
author_facet Sunayana Sitaram
Gopala Krishna Anumanchipalli
Justin Chiu
Alok Parlikar
Alan W Black
author_sort Sunayana Sitaram
title Text to Speech in New Languages without a Standardized Orthography
title_short Text to Speech in New Languages without a Standardized Orthography
title_full Text to Speech in New Languages without a Standardized Orthography
title_fullStr Text to Speech in New Languages without a Standardized Orthography
title_full_unstemmed Text to Speech in New Languages without a Standardized Orthography
title_sort text to speech in new languages without a standardized orthography
publishDate 2013
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122
geographic Indian
geographic_facet Indian
genre Inupiaq
genre_facet Inupiaq
op_source https://www.parlikar.com/files/aup_ssw8_2013_tts.pdf
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_ 1766046617776947200