Finding Sami Cognates with a Character-Based NMT Approach
We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too limited a set of parallel data for an NMT model as such. We solve this problem, on the one hand, by training the model with North...
Main Authors: | Hämäläinen, Mika; Rueter, Jack |
---|---|
Format: | Text |
Language: | unknown |
Published: | CU Scholar, 2019 |
Subjects: | Computational Linguistics, Linguistics |
Online Access: | https://scholar.colorado.edu/scil-cmel/vol1/iss1/6 https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1010&context=scil-cmel |
id |
ftunicolboulder:oai:scholar.colorado.edu:scil-cmel-1010 |
---|---|
record_format |
openpolar |
spelling |
ftunicolboulder:oai:scholar.colorado.edu:scil-cmel-1010 2023-05-15T18:10:20+02:00 Finding Sami Cognates with a Character-Based NMT Approach Hämäläinen, Mika Rueter, Jack 2019-02-26T08:00:00Z application/pdf https://scholar.colorado.edu/scil-cmel/vol1/iss1/6 https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1010&context=scil-cmel unknown CU Scholar https://scholar.colorado.edu/scil-cmel/vol1/iss1/6 https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1010&context=scil-cmel Proceedings of the Workshop on Computational Methods for Endangered Languages Computational Linguistics Linguistics text 2019 ftunicolboulder 2019-03-02T00:41:42Z We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too limited a set of parallel data for an NMT model as such. We solve this problem on the one hand, by training the model with North Sami cognates with other Uralic languages and, on the other, by generating more synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages. Text sami University of Colorado, Boulder: CU Scholar |
institution |
Open Polar |
collection |
University of Colorado, Boulder: CU Scholar |
op_collection_id |
ftunicolboulder |
language |
unknown |
topic |
Computational Linguistics Linguistics |
description |
We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too limited a set of parallel data for an NMT model as such. We solve this problem, on the one hand, by training the model with North Sami cognates with other Uralic languages and, on the other, by generating more synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages. |
format |
Text |
author |
Hämäläinen, Mika Rueter, Jack |
author_sort |
Hämäläinen, Mika |
title |
Finding Sami Cognates with a Character-Based NMT Approach |
publisher |
CU Scholar |
publishDate |
2019 |
url |
https://scholar.colorado.edu/scil-cmel/vol1/iss1/6 https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1010&context=scil-cmel |
genre |
sami |
op_source |
Proceedings of the Workshop on Computational Methods for Endangered Languages |
_version_ |
1766183147336105984 |