Finding Sami Cognates with a Character-Based NMT Approach

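We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too limited a set of parallel data for an NMT model as such. We solve this problem, on the one hand, by training the model with North Sami cognates with other Uralic languages and, on the other, by generating more synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages.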
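
The record does not describe the preprocessing, so the following is only a minimal sketch of what "character-based" usually means for a sequence-to-sequence NMT model: each word is split into space-separated characters so that the model treats every character as one token. The function names, the `_` space symbol, and the example strings are hypothetical placeholders, not the authors' code or attested Sami forms.

```python
from typing import Iterable, List, Tuple


def to_char_sequence(word: str, space_symbol: str = "_") -> str:
    """Split a word into space-separated characters (one token per character).

    Literal spaces inside the word are replaced by `space_symbol` (an assumed
    convention) so they survive tokenization.
    """
    return " ".join(space_symbol if ch == " " else ch for ch in word.strip())


def from_char_sequence(tokens: str, space_symbol: str = "_") -> str:
    """Invert to_char_sequence: join character tokens back into a word."""
    return "".join(" " if tok == space_symbol else tok for tok in tokens.split())


def build_parallel_lines(pairs: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Turn (source_word, target_word) pairs into two aligned lists of lines,
    the shape most NMT toolkits expect for source and target training files."""
    src_lines, tgt_lines = [], []
    for src, tgt in pairs:
        src_lines.append(to_char_sequence(src))
        tgt_lines.append(to_char_sequence(tgt))
    return src_lines, tgt_lines


# Placeholder word pairs, not real cognates.
example_pairs = [("aba", "apa"), ("a b", "ab")]
src_lines, tgt_lines = build_parallel_lines(example_pairs)
```

Under this assumption, a predicted character sequence is converted back to a word with `from_char_sequence` before it is compared against dictionary entries.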

Bibliographic Details
Main Authors: Hämäläinen, Mika; Rueter, Jack
Format: Text
Language: unknown
Published: CU Scholar 2019
Subjects: Computational Linguistics; Linguistics
Online Access: https://scholar.colorado.edu/scil-cmel/vol1/iss1/6
https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1010&context=scil-cmel
id ftunicolboulder:oai:scholar.colorado.edu:scil-cmel-1010
record_format openpolar
spelling ftunicolboulder:oai:scholar.colorado.edu:scil-cmel-1010
2023-05-15T18:10:20+02:00
Finding Sami Cognates with a Character-Based NMT Approach
Hämäläinen, Mika
Rueter, Jack
2019-02-26T08:00:00Z
application/pdf
https://scholar.colorado.edu/scil-cmel/vol1/iss1/6
https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1010&context=scil-cmel
unknown
CU Scholar
Proceedings of the Workshop on Computational Methods for Endangered Languages
Computational Linguistics
Linguistics
text
2019
ftunicolboulder
2019-03-02T00:41:42Z
Text
sami
University of Colorado, Boulder: CU Scholar
institution Open Polar
collection University of Colorado, Boulder: CU Scholar
op_collection_id ftunicolboulder
language unknown
topic Computational Linguistics
Linguistics
description We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too limited a set of parallel data for an NMT model as such. We solve this problem, on the one hand, by training the model with North Sami cognates with other Uralic languages and, on the other, by generating more synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages.
format Text
author Hämäläinen, Mika
Rueter, Jack
title Finding Sami Cognates with a Character-Based NMT Approach
publisher CU Scholar
publishDate 2019
url https://scholar.colorado.edu/scil-cmel/vol1/iss1/6
https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1010&context=scil-cmel
genre sami
op_source Proceedings of the Workshop on Computational Methods for Endangered Languages
op_relation https://scholar.colorado.edu/scil-cmel/vol1/iss1/6
https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1010&context=scil-cmel