Speech Technology for Minority Languages: the Case of Irish (Gaelic)
Main Authors: | , |
---|---|
Format: | Conference Object |
Language: | English |
Published: | 2006 |
Subjects: | |
Online Access: | http://hdl.handle.net/2262/39404 http://people.tcd.ie/anichsid http://people.tcd.ie/cegobl http://tcd.academia.edu/documents/0027/8616/corpus.pdf |
Summary: | PUBLISHED. Pittsburgh. Abstract: Unit selection is a data-driven approach to speech synthesis that concatenates pieces of recorded speech from a large database in order to create novel sentences. Many corpora are available in the English language, including the Arctic database [1], which allows a user to create small, reliable speech synthesisers using only a small set of recorded sentences. Such resources are scarce for minority languages, however, despite their increasing importance for the survival of such languages. This paper describes current research on creating efficient Irish language corpora for speech synthesis. Corpus design techniques are discussed, in particular two methods of data reduction that are applied to an aligned spoken corpus of Irish in order to create smaller, more efficient speech corpora. The CABÓGAÍ II project is funded by Foras na Gaeilge. |