Morphological Disambiguation of South Sámi with FSTs and Neural Networks


Bibliographic Details
Main Authors: Hämäläinen, Mika; Wiechetek, Linda
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2020
Online Access: https://dx.doi.org/10.48550/arxiv.2004.14062
https://arxiv.org/abs/2004.14062
Institution: Open Polar
Collection: DataCite Metadata Store (German National Library of Science and Technology)
Subjects: Computation and Language (cs.CL); FOS Computer and information sciences
Abstract: We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. The disambiguation is done on the level of morphological tags ignoring word forms and lemmas; this makes it possible to use North Sámi training data for South Sámi without the need for a bilingual dictionary or aligned word embeddings. Our approach requires only minimal resources for South Sámi, which makes it usable and applicable in the contexts of any other endangered language as well.
Venue: 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020)
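The abstract's central idea is that disambiguation works on morphological tags only. Because of this, a model trained on one language's tag sequences can choose among another language's analyzer readings. The small sketch below illustrates that idea. It is not the authors' implementation. A smoothed tag-bigram model decoded with Viterbi stands in for their Bi-RNN, and the `lemma+Tag+Tag` reading format is an assumed FST output convention. Every function name, tag sequence, and placeholder lemma is hypothetical toy data, not material from the North Sámi UD Treebank or a South Sámi analyzer.

```python
from collections import defaultdict
from math import log


def delexicalize(reading: str) -> str:
    """Drop the lemma from an analysis like 'lemma+N+Sg+Nom' (assumed format), keeping only the tags."""
    return "+".join(reading.split("+")[1:])


def train_bigrams(tag_sentences):
    """Count tag-to-tag transitions, with <s> and </s> as sentence boundaries."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in tag_sentences:
        prev = "<s>"
        for tag in sent:
            counts[prev][tag] += 1
            prev = tag
        counts[prev]["</s>"] += 1
    return counts


def transition_logprob(counts, prev, tag, vocab_size):
    """Add-one smoothed log P(tag | prev); a toy stand-in for a neural scorer."""
    row = counts[prev]
    return log((row[tag] + 1) / (sum(row.values()) + vocab_size))


def disambiguate(sentence_readings, counts, vocab_size):
    """Viterbi search restricted to the readings the analyzer produced for each word."""
    if not sentence_readings:
        return []
    # The scorer only ever sees tag strings, never lemmas or word forms.
    tags = [[delexicalize(r) for r in readings] for readings in sentence_readings]
    # scores[j]: best log-probability of a path ending in reading j of the current word
    scores = [transition_logprob(counts, "<s>", t, vocab_size) for t in tags[0]]
    backptrs = []
    for i in range(1, len(tags)):
        new_scores, bp = [], []
        for t in tags[i]:
            cands = [scores[k] + transition_logprob(counts, tags[i - 1][k], t, vocab_size)
                     for k in range(len(tags[i - 1]))]
            best_k = max(range(len(cands)), key=cands.__getitem__)
            new_scores.append(cands[best_k])
            bp.append(best_k)
        scores = new_scores
        backptrs.append(bp)
    # Close the sentence with an end-of-sentence transition, then trace back.
    final = [scores[k] + transition_logprob(counts, tags[-1][k], "</s>", vocab_size)
             for k in range(len(tags[-1]))]
    j = max(range(len(final)), key=final.__getitem__)
    path = [j]
    for bp in reversed(backptrs):
        j = bp[j]
        path.append(j)
    path.reverse()
    return [readings[k] for readings, k in zip(sentence_readings, path)]


# Hypothetical tags-only training data (a stand-in for delexicalized treebank sentences).
train = [
    ["N+Sg+Nom", "V+Ind+Prs+Sg3", "N+Sg+Acc"],
    ["N+Sg+Nom", "V+Ind+Prs+Sg3"],
]
counts = train_bigrams(train)
vocab_size = len({t for s in train for t in s}) + 1  # distinct training tags plus </s>

# Hypothetical ambiguous analyzer output for a 3-word sentence; lemmas are placeholders.
sentence = [
    ["lemA+N+Sg+Nom", "lemA+N+Sg+Gen"],
    ["lemB+V+Ind+Prs+Sg3", "lemB+N+Pl+Nom"],
    ["lemC+N+Sg+Acc", "lemC+N+Sg+Gen"],
]
best = disambiguate(sentence, counts, vocab_size)
print(best)  # ['lemA+N+Sg+Nom', 'lemB+V+Ind+Prs+Sg3', 'lemC+N+Sg+Acc']
```

In this sketch, `delexicalize` strips the lemma before any scoring, so the training data and the sentence being disambiguated only need to share a tag inventory, not a vocabulary. That is the property the abstract credits for reusing North Sámi data without a bilingual dictionary. Swapping in a different scorer, such as the Bi-RNN the paper uses, would change how readings are ranked. The output is still chosen only from the analyzer's candidate readings.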
Subject languages: North Sámi; Sámi; South Sámi
Rights: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/legalcode