Speech Technology for Minority Languages: the Case of Irish (Gaelic)

Bibliographic Details
Main Authors: NI CHASAIDE, AILBHE; GOBL, CHRISTER
Format: Conference Object
Language: English
Published: 2006
Subjects:
Online Access: http://hdl.handle.net/2262/39404
http://people.tcd.ie/anichsid
http://people.tcd.ie/cegobl
http://tcd.academia.edu/documents/0027/8616/corpus.pdf
Description
Summary: PUBLISHED. Pittsburgh. Abstract: Unit selection is a data-driven approach to speech synthesis that concatenates pieces of recorded speech from a large database in order to create novel sentences. Many corpora are available for English, including the Arctic database [1], which allows a user to create small, reliable speech synthesisers using only a small set of recorded sentences. Such resources are scarce for minority languages, however, despite their increasing importance for the survival of those languages. This paper describes current research on creating efficient Irish-language corpora for speech synthesis. Corpus design techniques are discussed, in particular two methods of data reduction that are applied to an aligned spoken corpus of Irish in order to create smaller, more efficient speech corpora. The CABÓGAÍ II project is funded by Foras na Gaeilge.
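
The summary does not say which two data reduction methods the paper applies. As a rough illustration of the general idea only, the sketch below uses greedy coverage selection, a common corpus design technique that is swapped in here as an assumption and is not taken from the paper. The function names, the corpus layout (sentence id mapped to a phone sequence) and the use of adjacent phone pairs as units are all hypothetical.

```python
def units(phones):
    """Return the set of adjacent phone pairs (a stand-in for diphones)."""
    return {(a, b) for a, b in zip(phones, phones[1:])}


def greedy_select(corpus):
    """Pick sentences until every unit type found in the corpus is covered.

    corpus: dict mapping a sentence id to its phone sequence (assumed layout).
    Returns the chosen sentence ids in the order they were picked.
    """
    remaining = set().union(*(units(p) for p in corpus.values()))
    chosen = []
    while remaining:
        # Take the sentence that adds the most still-uncovered units.
        # Iterating over sorted ids makes tie-breaking deterministic.
        best = max(sorted(corpus), key=lambda s: len(units(corpus[s]) & remaining))
        chosen.append(best)
        remaining -= units(corpus[best])
    return chosen
```

Given an aligned corpus, the returned subset covers the same unit types as the full set with fewer sentences, which is the sense in which such a reduced corpus is "more efficient". A real design would probably also weight units by frequency or phonetic context.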