North Sámi morphological segmentation with low-resource semi-supervised sequence labeling



Bibliographic Details
Main Authors: Grönroos, Stig-Arne, Virpioja, Sami, Kurimo, Mikko
Other Authors: Centre of Excellence in Computational Inference, COIN, Dept Signal Process and Acoust, Aalto-yliopisto, Aalto University
Format: Other/Unknown Material
Language: English
Published: 2019
Online Access: https://aaltodoc.aalto.fi/handle/123456789/40463
Description
Summary: Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline. Peer reviewed. | openaire: EC/H2020/780069/EU//MeMAD