Finding Sami Cognates with a Character-Based NMT Approach

Bibliographic Details
Published in: Proceedings of the Workshop on Computational Methods for Endangered Languages
Main Authors: Hämäläinen, Mika; Reuter, Jack
Format: Article in Journal/Newspaper
Language: English
Published: Proceedings of the Workshop on Computational Methods for Endangered Languages 2019
Online Access: https://journals.colorado.edu/index.php/computel/article/view/395
https://doi.org/10.33011/computel.v1i.395
Description
Summary: We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too little parallel data to train an NMT model directly. We solve this problem, on the one hand, by training the model on North Sami cognates with other Uralic languages and, on the other, by generating additional synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages.
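
As an illustration of what "character-based" typically means for a sequence-to-sequence model, the sketch below turns cognate word pairs into space-separated character sequences, the input format that many NMT toolkits treat as token sequences. This is a minimal sketch under assumptions, not the authors' pipeline: the function names, the `_` marker for real spaces, the translation direction, and the two example word pairs are all chosen here for illustration and are not taken from the paper's data.

```python
import unicodedata

SPACE_MARK = "_"  # assumed placeholder for real spaces inside multiword entries


def to_char_sequence(word: str) -> str:
    """Return the word as space-separated characters (NFC-normalized)."""
    word = unicodedata.normalize("NFC", word)
    return " ".join(SPACE_MARK if ch == " " else ch for ch in word)


def from_char_sequence(seq: str) -> str:
    """Invert to_char_sequence: join the characters and restore real spaces."""
    return "".join(" " if tok == SPACE_MARK else tok for tok in seq.split())


def build_parallel_lines(pairs):
    """Turn (source_word, target_word) pairs into two aligned lists of lines."""
    src = [to_char_sequence(s) for s, _ in pairs]
    tgt = [to_char_sequence(t) for _, t in pairs]
    return src, tgt


# Illustrative (North Sami, Skolt Sami) pairs; not from the paper's data set.
example_pairs = [("giella", "ǩiõll"), ("guolli", "kueʹll")]
src_lines, tgt_lines = build_parallel_lines(example_pairs)
```

Each element of `src_lines` and `tgt_lines` can then be written to line-aligned source and target files. Splitting at the character level lets a model learn sound correspondences between related words rather than memorizing whole-word mappings, which matters when only a small number of word pairs is available.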