Summary: | This dissertation develops language technology tools for low-resource languages. Such tools can greatly improve communication and access to information for speakers of these languages, so it is important that low-resource languages are not left behind in the rapidly evolving digital landscape. Supporting them through technology development and revitalisation efforts is essential for preserving linguistic diversity and cultural heritage. The dissertation presents five case studies covering three languages, ranging from the truly low-resource Sakha to the better-resourced Finnish and Norwegian, which still lack many of the resources available for English. Sakha is a Turkic language spoken in the Republic of Sakha in Siberia by 0.5 million people. Finnish is a Uralic language of the Finnic branch, spoken by 5.8 million people in Finland and by ethnic Finns outside of Finland. Norwegian is a North Germanic language, spoken mainly in Norway by 5.32 million people. The five case studies range from essential tools for Sakha, such as a morphological analyser, to higher-level tools for Norwegian and Finnish. The contributions of the dissertation are as follows. We developed a morphological analyser and generator for Sakha within the framework of two-level morphology; it achieves coverage above 90\% and 99\% precision. While developing the analyser, we expanded linguistic knowledge about Sakha and devised strategies for handling complex grammatical patterns. Using this analyser, we implemented a language-learning environment for Sakha in the Revita computer-assisted language-learning platform. We created a Turkic Interlingua corpus and trained Russian-Sakha, Sakha-Russian, English-Sakha, and Sakha-English machine translation models, as well as a multi-way neural machine translation model. 
We performed an extensive analysis using automatic metrics as well as human evaluations. We created NorQuAD---the ...
|