North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
| openaire: EC/H2020/780069/EU//MeMAD Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphi...
Main Authors: | Grönroos, Stig-Arne, Virpioja, Sami, Kurimo, Mikko |
---|---|
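Other Authors: | Centre of Excellence in Computational Inference, COIN, Dept Signal Process and Acoust, Aalto-yliopisto, Aalto University |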
Format: | Other/Unknown Material |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | https://aaltodoc.aalto.fi/handle/123456789/40463 |
id |
ftaaltouniv:oai:aaltodoc.aalto.fi:123456789/40463 |
---|---|
record_format |
openpolar |
spelling |
ftaaltouniv:oai:aaltodoc.aalto.fi:123456789/40463 2023-05-15T17:40:07+02:00 North Sámi morphological segmentation with low-resource semi-supervised sequence labeling Grönroos, Stig-Arne Virpioja, Sami Kurimo, Mikko Centre of Excellence in Computational Inference, COIN Dept Signal Process and Acoust Aalto-yliopisto Aalto University 2019-01-07 application/pdf https://aaltodoc.aalto.fi/handle/123456789/40463 en eng info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD International Workshop on Computational Linguistics for Uralic Languages Fifth Workshop on Computational Linguistics for Uralic Languages Grönroos , S-A , Virpioja , S & Kurimo , M 2019 , North Sámi morphological segmentation with low-resource semi-supervised sequence labeling . in Fifth Workshop on Computational Linguistics for Uralic Languages : Proceedings of the Workshop . Association for Computational Linguistics , pp. 15-26 , International Workshop on Computational Linguistics for Uralic Languages , Tartu , Estonia , 07/01/2019 . < https://www.aclweb.org/anthology/W19-0302/ > 978-1-948087-92-6 PURE UUID: 832e50de-ac02-4e45-9a9e-af08e5049c1c PURE ITEMURL: https://research.aalto.fi/en/publications/832e50de-ac02-4e45-9a9e-af08e5049c1c PURE LINK: https://www.aclweb.org/anthology/W19-0302/ PURE FILEURL: https://research.aalto.fi/files/36793748/2019_iwclul.published.pdf https://aaltodoc.aalto.fi/handle/123456789/40463 URN:NBN:fi:aalto-201909255484 openAccess morphology segmentation low-resource settings semi-supervised learning sequence labeling recurrent neural networks conditional random fields north sami A4 Artikkeli konferenssijulkaisussa publishedVersion 2019 ftaaltouniv 2023-01-25T23:57:39Z | openaire: EC/H2020/780069/EU//MeMAD Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. 
We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline. Peer reviewed Other/Unknown Material North Sámi sami Sámi Aalto University Publication Archive (Aaltodoc) |
institution |
Open Polar |
collection |
Aalto University Publication Archive (Aaltodoc) |
op_collection_id |
ftaaltouniv |
language |
English |
topic |
morphology segmentation low-resource settings semi-supervised learning sequence labeling recurrent neural networks conditional random fields north sami |
spellingShingle |
morphology segmentation low-resource settings semi-supervised learning sequence labeling recurrent neural networks conditional random fields north sami Grönroos, Stig-Arne Virpioja, Sami Kurimo, Mikko North Sámi morphological segmentation with low-resource semi-supervised sequence labeling |
topic_facet |
morphology segmentation low-resource settings semi-supervised learning sequence labeling recurrent neural networks conditional random fields north sami |
description |
| openaire: EC/H2020/780069/EU//MeMAD Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline. Peer reviewed |
author2 |
Centre of Excellence in Computational Inference, COIN Dept Signal Process and Acoust Aalto-yliopisto Aalto University |
format |
Other/Unknown Material |
author |
Grönroos, Stig-Arne Virpioja, Sami Kurimo, Mikko |
author_facet |
Grönroos, Stig-Arne Virpioja, Sami Kurimo, Mikko |
author_sort |
Grönroos, Stig-Arne |
title |
North Sámi morphological segmentation with low-resource semi-supervised sequence labeling |
title_short |
North Sámi morphological segmentation with low-resource semi-supervised sequence labeling |
title_full |
North Sámi morphological segmentation with low-resource semi-supervised sequence labeling |
title_fullStr |
North Sámi morphological segmentation with low-resource semi-supervised sequence labeling |
title_full_unstemmed |
North Sámi morphological segmentation with low-resource semi-supervised sequence labeling |
title_sort |
north sámi morphological segmentation with low-resource semi-supervised sequence labeling |
publishDate |
2019 |
url |
https://aaltodoc.aalto.fi/handle/123456789/40463 |
genre |
North Sámi sami Sámi |
genre_facet |
North Sámi sami Sámi |
op_relation |
info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD International Workshop on Computational Linguistics for Uralic Languages Fifth Workshop on Computational Linguistics for Uralic Languages Grönroos , S-A , Virpioja , S & Kurimo , M 2019 , North Sámi morphological segmentation with low-resource semi-supervised sequence labeling . in Fifth Workshop on Computational Linguistics for Uralic Languages : Proceedings of the Workshop . Association for Computational Linguistics , pp. 15-26 , International Workshop on Computational Linguistics for Uralic Languages , Tartu , Estonia , 07/01/2019 . < https://www.aclweb.org/anthology/W19-0302/ > 978-1-948087-92-6 PURE UUID: 832e50de-ac02-4e45-9a9e-af08e5049c1c PURE ITEMURL: https://research.aalto.fi/en/publications/832e50de-ac02-4e45-9a9e-af08e5049c1c PURE LINK: https://www.aclweb.org/anthology/W19-0302/ PURE FILEURL: https://research.aalto.fi/files/36793748/2019_iwclul.published.pdf https://aaltodoc.aalto.fi/handle/123456789/40463 URN:NBN:fi:aalto-201909255484 |
op_rights |
openAccess |
_version_ |
1766140927376621568 |