Neural models for morphological generation, analysis and lemmatization in 22 languages

Bibliographic Details
Main Authors: Hämäläinen, Mika; Partanen, Niko; Rueter, Jack; Alnajjar, Khalid
Format: Dataset
Language: Finnish
Published: Zenodo 2020
Subjects:
fst
Online Access: https://dx.doi.org/10.5281/zenodo.3926769
https://zenodo.org/record/3926769
Description
Summary: Morphological models for generation, lemmatization and analysis in 22 languages. The models are trained with OpenNMT-py (https://github.com/OpenNMT/OpenNMT-py). Feed the models one word at a time, split into space-separated characters (kissa -> k i s s a). Supported languages: German (deu), Kven (fkv), Komi-Zyrian (kpv), Moksha (mdf), Mansi (mns), Erzya (myv), Norwegian Bokmål (nob), Russian (rus), South Sami (sma), Lule Sami (smj), Skolt Sami (sms), Võro (vro), Finnish (fin), Komi-Permyak (koi), Latvian (lav), Eastern Mari (mhr), Western Mari (mrj), Namonuito (nmt), Olonets-Karelian (olo), Pite Sami (sje), Northern Sami (sme), Inari Sami (smn) and Udmurt (udm). Cite: Hämäläinen, M., Partanen, N., Rueter, J., & Alnajjar, K. (2021). Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021).
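The input format mentioned in the summary (one word per line, split into characters) could be prepared roughly as follows. This is a minimal sketch and not part of the dataset: the function name to_model_input is hypothetical, and the Unicode NFC normalization step is an assumption made here so that letters such as "õ" stay a single character; the record does not say how the training data was normalized.

```python
import unicodedata


def to_model_input(word: str) -> str:
    """Split one word into space-separated characters, e.g. for a source file.

    Hypothetical helper: the name and the NFC normalization are assumptions,
    not something documented in the dataset record.
    """
    word = unicodedata.normalize("NFC", word.strip())
    if not word:
        raise ValueError("expected a non-empty word")
    if any(ch.isspace() for ch in word):
        # The record says to feed one word at a time.
        raise ValueError("expected a single word without whitespace")
    return " ".join(word)


if __name__ == "__main__":
    # One prepared word per line, ready to be written to a source file.
    for w in ["kissa", "Bokmål"]:
        print(to_model_input(w))
```

Each returned line can then be written to a plain-text file, one word per line, and passed to the OpenNMT-py model for the language and task in question.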