Finding Sami Cognates with a Character-Based NMT Approach

Bibliographic Details
Published in: Proceedings of the Workshop on Computational Methods for Endangered Languages
Main Authors: Hämäläinen, Mika, Reuter, Jack
Format: Article in Journal/Newspaper
Language: English
Published: Proceedings of the Workshop on Computational Methods for Endangered Languages 2019
Online Access: https://journals.colorado.edu/index.php/computel/article/view/395
https://doi.org/10.33011/computel.v1i.395
id ftucoloradobould:oai:journals.colorado.edu:article/395
record_format openpolar
spelling ftucoloradobould:oai:journals.colorado.edu:article/395 2023-05-15T18:10:18+02:00 Finding Sami Cognates with a Character-Based NMT Approach Hämäläinen, Mika Reuter, Jack 2019-02-26 application/pdf https://journals.colorado.edu/index.php/computel/article/view/395 https://doi.org/10.33011/computel.v1i.395 eng eng Proceedings of the Workshop on Computational Methods for Endangered Languages https://journals.colorado.edu/index.php/computel/article/view/395/375 https://journals.colorado.edu/index.php/computel/article/view/395 doi:10.33011/computel.v1i.395 Proceedings of the Workshop on Computational Methods for Endangered Languages; Vol. 1 (2019): Proceedings of the 3rd Workshop on Computational Methods for Endangered Languages info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion Paper 2019 ftucoloradobould https://doi.org/10.33011/computel.v1i.395 2022-10-18T09:18:49Z We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too limited a set of parallel data for an NMT model as such. We solve this problem on the one hand, by training the model with North Sami cognates with other Uralic languages and, on the other, by generating more synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages. Article in Journal/Newspaper sami University of Colorado Boulder Open Journals Proceedings of the Workshop on Computational Methods for Endangered Languages
institution Open Polar
collection University of Colorado Boulder Open Journals
op_collection_id ftucoloradobould
language English
description We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too little parallel data to train an NMT model directly. We solve this problem, on the one hand, by training the model on North Sami cognates with other Uralic languages and, on the other, by generating more synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages.
format Article in Journal/Newspaper
author Hämäläinen, Mika
Reuter, Jack
spellingShingle Hämäläinen, Mika
Reuter, Jack
Finding Sami Cognates with a Character-Based NMT Approach
author_facet Hämäläinen, Mika
Reuter, Jack
author_sort Hämäläinen, Mika
title Finding Sami Cognates with a Character-Based NMT Approach
title_short Finding Sami Cognates with a Character-Based NMT Approach
title_full Finding Sami Cognates with a Character-Based NMT Approach
title_fullStr Finding Sami Cognates with a Character-Based NMT Approach
title_full_unstemmed Finding Sami Cognates with a Character-Based NMT Approach
title_sort finding sami cognates with a character-based nmt approach
publisher Proceedings of the Workshop on Computational Methods for Endangered Languages
publishDate 2019
url https://journals.colorado.edu/index.php/computel/article/view/395
https://doi.org/10.33011/computel.v1i.395
genre sami
genre_facet sami
op_source Proceedings of the Workshop on Computational Methods for Endangered Languages; Vol. 1 (2019): Proceedings of the 3rd Workshop on Computational Methods for Endangered Languages
op_relation https://journals.colorado.edu/index.php/computel/article/view/395/375
https://journals.colorado.edu/index.php/computel/article/view/395
doi:10.33011/computel.v1i.395
op_doi https://doi.org/10.33011/computel.v1i.395
container_title Proceedings of the Workshop on Computational Methods for Endangered Languages
_version_ 1766183087660597248