Toward a Corpus of Tundra Nenets: Stages and Challenges in Building a Corpus

Bibliographic Details
Published in: Proceedings of the Workshop on Computational Methods for Endangered Languages
Main Authors: Mus, Nikolett; Metzger, Réka
Format: Article in Journal/Newspaper
Language: English
Published: Proceedings of the Workshop on Computational Methods for Endangered Languages, 2021
Online Access:https://journals.colorado.edu/index.php/computel/article/view/975
https://doi.org/10.33011/computel.v2i.975
Description
Summary: In this paper, we report on the main lessons drawn from the first year of building a Tundra Nenets (Samoyedic, Uralic) corpus at the Hungarian Research Institute for Linguistics. The aim of our work is twofold. First, we collect, process and archive written data of Tundra Nenets (and, in the latter part of the project period, spoken data). Second, we build a parallel Tundra Nenets–Russian corpus to support and encourage research on Tundra Nenets, primarily synchronic syntactic research. After discussing certain language- and culture-specific factors that may influence the sampling method, we present the stages of our work in detail.