Text to Speech in New Languages without a Standardized Orthography

Abstract: Many spoken languages do not have a standardized writing system. Building text-to-speech voices for them without accurate transcripts of speech data is difficult. Our language-independent method to bootstrap synthetic voices using only speech data relies upon cross-lingual phonetic decodin...

Bibliographic Details
Main Authors:	Sunayana Sitaram, Gopala Krishna Anumanchipalli, Justin Chiu, Alok Parlikar, Alan W Black
Other Authors:	The Pennsylvania State University CiteSeerX Archives
Format:	Text
Language:	English
Published:	2013
Subjects:	Indian Inupiaq
Online Access:	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122

id	ftciteseerx:oai:CiteSeerX.psu:10.1.1.1047.4122
record_format	openpolar
spelling	ftciteseerx:oai:CiteSeerX.psu:10.1.1.1047.4122 2023-05-15T16:55:38+02:00 Text to Speech in New Languages without a Standardized Orthography Sunayana Sitaram Gopala Krishna Anumanchipalli Justin Chiu Alok Parlikar Alan W Black The Pennsylvania State University CiteSeerX Archives 2013 application/pdf http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122 en eng http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122 Metadata may be used without restrictions as long as the oai identifier remains attached to it. https://www.parlikar.com/files/aup_ssw8_2013_tts.pdf text 2013 ftciteseerx 2020-04-05T00:21:13Z Text Inupiaq Unknown Indian
institution	Open Polar
collection	Unknown
op_collection_id	ftciteseerx
language	English
description	Abstract: Many spoken languages do not have a standardized writing system. Building text-to-speech voices for them without accurate transcripts of speech data is difficult. Our language-independent method to bootstrap synthetic voices using only speech data relies upon cross-lingual phonetic decoding of speech. In this paper, we describe novel additions to our bootstrapping method. We present results on eight languages from different language families (English, Dari, Pashto, Iraqi, Thai, Konkani, Inupiaq and Ojibwe) and show that our phonetic voices can be made understandable with as little as an hour of speech data that never had transcriptions, and with few other resources available in the target language. We also present purely acoustic techniques that can help induce syllable- and word-level information, which can further improve the intelligibility of these voices.
Index Terms: speech synthesis, synthesis without text, languages without an orthography
Introduction: Recent developments in speech and language technologies have revolutionized the ways in which we access information. Advances in speech recognition, speech synthesis and dialog modeling have produced interactive agents that people can talk to naturally and ask for information. There is a lot of interest in building such systems, especially in multilingual environments. Building speech and language systems typically requires significant amounts of data and linguistic resources. For many spoken languages of the world, finding large corpora or linguistic resources is difficult. Yet these languages have many native speakers around the world, and it would be very interesting to deploy speech technologies for them. Our work is about building text-to-speech systems for languages that are purely spoken: they do not have a standardized writing system. These languages could be mainstream languages such as Konkani (a western Indian language with over 8 million speakers), or dialects of a major language that are phonetically quite distinct ...
author2	The Pennsylvania State University CiteSeerX Archives
format	Text
author	Sunayana Sitaram Gopala Krishna Anumanchipalli Justin Chiu Alok Parlikar Alan W Black
spellingShingle	Sunayana Sitaram Gopala Krishna Anumanchipalli Justin Chiu Alok Parlikar Alan W Black Text to Speech in New Languages without a Standardized Orthography
author_facet	Sunayana Sitaram Gopala Krishna Anumanchipalli Justin Chiu Alok Parlikar Alan W Black
author_sort	Sunayana Sitaram
title	Text to Speech in New Languages without a Standardized Orthography
title_short	Text to Speech in New Languages without a Standardized Orthography
title_full	Text to Speech in New Languages without a Standardized Orthography
title_fullStr	Text to Speech in New Languages without a Standardized Orthography
title_full_unstemmed	Text to Speech in New Languages without a Standardized Orthography
title_sort	text to speech in new languages without a standardized orthography
publishDate	2013
url	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122
geographic	Indian
geographic_facet	Indian
genre	Inupiaq
genre_facet	Inupiaq
op_source	https://www.parlikar.com/files/aup_ssw8_2013_tts.pdf
op_relation	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122
op_rights	Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_	1766046617776947200
