North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
openaire: EC/H2020/780069/EU//MeMAD Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphi...
Main Authors: | , , |
---|---|
Other Authors: | , , , |
Format: | Other/Unknown Material |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | https://aaltodoc.aalto.fi/handle/123456789/40463 |
Summary: | openaire: EC/H2020/780069/EU//MeMAD Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline. Peer reviewed |
---|
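
The summary reports results as morph boundary F1-score and as relative improvement over a baseline. The following is a minimal sketch of how such figures could be computed, not the evaluation code used in the paper. It assumes segmentations are written with a space between morphs, pools boundary counts over all words (the paper may average differently or handle alternative gold analyses), and uses invented toy English words rather than North Sámi data. The function names `boundary_positions`, `boundary_f1`, and `relative_improvement` are hypothetical.

```python
def boundary_positions(segmentation, sep=" "):
    """Return the set of character offsets where a morph boundary falls.

    "un lock ed" has boundaries after 2 and after 6 characters.
    """
    positions = set()
    offset = 0
    morphs = segmentation.split(sep)
    for morph in morphs[:-1]:  # no boundary after the final morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_f1(predicted, gold):
    """Boundary F1 with counts pooled over all words (an assumption)."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p = boundary_positions(pred)
        g = boundary_positions(ref)
        tp += len(p & g)  # boundaries found in both
        fp += len(p - g)  # predicted but not in the reference
        fn += len(g - p)  # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new_score, old_score):
    """Relative (not absolute) change of new_score over old_score."""
    return (new_score - old_score) / old_score


# Toy data, invented for illustration only.
predicted = ["un lock ed", "walk ing"]
gold = ["un lock ed", "wal king"]
score = boundary_f1(predicted, gold)
```

Under this definition, a relative improvement of 8.6% means the new F1-score is 1.086 times the baseline F1-score, which is different from a gain of 8.6 percentage points.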