Text to Speech in New Languages without a Standardized Orthography
Abstract: Many spoken languages do not have a standardized writing system. Building text-to-speech voices for them without accurate transcripts of the speech data is difficult. Our language-independent method to bootstrap synthetic voices using only speech data relies upon cross-lingual phonetic decodin...
Main Authors: | Sunayana Sitaram, Gopala Krishna Anumanchipalli, Justin Chiu, Alok Parlikar, Alan W Black |
---|---|
Other Authors: | |
Format: | Text |
Language: | English |
Published: | 2013 |
Subjects: | |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122 |
id |
ftciteseerx:oai:CiteSeerX.psu:10.1.1.1047.4122 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Unknown |
op_collection_id |
ftciteseerx |
language |
English |
description |
Abstract: Many spoken languages do not have a standardized writing system. Building text-to-speech voices for them without accurate transcripts of the speech data is difficult. Our language-independent method to bootstrap synthetic voices using only speech data relies upon cross-lingual phonetic decoding of speech. In this paper, we describe novel additions to our bootstrapping method. We present results on eight languages from different language families (English, Dari, Pashto, Iraqi, Thai, Konkani, Inupiaq and Ojibwe) and show that our phonetic voices can be made understandable with as little as an hour of speech data that never had transcriptions, and with few other resources available in the target language. We also present purely acoustic techniques that can help induce syllable- and word-level information, which can further improve the intelligibility of these voices.
Index Terms: speech synthesis, synthesis without text, languages without an orthography
Introduction: Recent developments in speech and language technologies have revolutionized the ways in which we access information. Advances in speech recognition, speech synthesis and dialog modeling have produced interactive agents that people can talk to naturally and ask for information. There is considerable interest in building such systems, especially in multilingual environments. Building speech and language systems typically requires significant amounts of data and linguistic resources. For many spoken languages of the world, finding large corpora or linguistic resources is difficult. Yet these languages have many native speakers around the world, and deploying speech technologies for them is of considerable interest. Our work is about building text-to-speech systems for languages that are purely spoken: they do not have a standardized writing system. These languages could be mainstream languages such as Konkani (a western Indian language with over 8 million speakers), or dialects of a major language that are phonetically quite distinct ... |
author2 |
The Pennsylvania State University CiteSeerX Archives |
format |
Text |
author |
Sunayana Sitaram, Gopala Krishna Anumanchipalli, Justin Chiu, Alok Parlikar, Alan W Black |
author_sort |
Sunayana Sitaram |
title |
Text to Speech in New Languages without a Standardized Orthography |
publishDate |
2013 |
url |
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122 |
geographic |
Indian |
genre |
Inupiaq |
op_source |
https://www.parlikar.com/files/aup_ssw8_2013_tts.pdf |
op_relation |
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1047.4122 |
op_rights |
Metadata may be used without restrictions as long as the oai identifier remains attached to it. |
_version_ |
1766046617776947200 |