North Sámi morphological segmentation with low-resource semi-supervised sequence labeling

| openaire: EC/H2020/780069/EU//MeMAD Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphi...

Bibliographic Details
Main Authors: Grönroos, Stig-Arne, Virpioja, Sami, Kurimo, Mikko
Other Authors: Centre of Excellence in Computational Inference, COIN, Dept Signal Process and Acoust, Aalto-yliopisto, Aalto University
Format: Other/Unknown Material
Language: English
Published: 2019
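Subjects: morphology, segmentation, low-resource settings, semi-supervised learning, sequence labeling, recurrent neural networks, conditional random fields, north sami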
Online Access: https://aaltodoc.aalto.fi/handle/123456789/40463
id ftaaltouniv:oai:aaltodoc.aalto.fi:123456789/40463
record_format openpolar
institution Open Polar
collection Aalto University Publication Archive (Aaltodoc)
op_collection_id ftaaltouniv
language English
topic morphology
segmentation
low-resource settings
semi-supervised learning
sequence labeling
recurrent neural networks
conditional random fields
north sami
spellingShingle morphology
segmentation
low-resource settings
semi-supervised learning
sequence labeling
recurrent neural networks
conditional random fields
north sami
Grönroos, Stig-Arne
Virpioja, Sami
Kurimo, Mikko
North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
topic_facet morphology
segmentation
low-resource settings
semi-supervised learning
sequence labeling
recurrent neural networks
conditional random fields
north sami
description | openaire: EC/H2020/780069/EU//MeMAD Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline. Peer reviewed
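The description treats segmentation as sequence labeling and reports relative gains in morph boundary F1-score. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The B/M/E/S label scheme, the micro-averaged scoring, the function names, and the English toy word are all assumptions made for illustration; the paper's own label set and evaluation protocol may differ.

```python
from typing import List, Set


def bmes_labels(morphs: List[str]) -> List[str]:
    """Map a segmented word to per-character labels (assumed B/M/E/S scheme)."""
    labels: List[str] = []
    for morph in morphs:
        if len(morph) == 1:
            labels.append("S")  # single-character morph
        else:
            labels.extend(["B"] + ["M"] * (len(morph) - 2) + ["E"])
    return labels


def boundaries(morphs: List[str]) -> Set[int]:
    """Character offsets of the word-internal morph boundaries."""
    cuts: Set[int] = set()
    position = 0
    for morph in morphs[:-1]:
        position += len(morph)
        cuts.add(position)
    return cuts


def boundary_f1(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Micro-averaged boundary F1 over paired segmentations (an assumption)."""
    tp = fp = fn = 0
    for pred_morphs, gold_morphs in zip(predicted, gold):
        pred_cuts, gold_cuts = boundaries(pred_morphs), boundaries(gold_morphs)
        tp += len(pred_cuts & gold_cuts)
        fp += len(pred_cuts - gold_cuts)
        fn += len(gold_cuts - pred_cuts)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new_score: float, old_score: float) -> float:
    """Relative (not absolute) change of new_score over old_score."""
    return (new_score - old_score) / old_score


if __name__ == "__main__":
    gold = [["un", "believ", "able"]]   # toy English example, not North Sámi data
    predicted = [["un", "believable"]]  # misses the second boundary
    print(bmes_labels(gold[0]))
    print(boundary_f1(predicted, gold))
```

A relative improvement is taken against the baseline score, so the percentages quoted in the description are ratios of F1-scores rather than differences in percentage points.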
author2 Centre of Excellence in Computational Inference, COIN
Dept Signal Process and Acoust
Aalto-yliopisto
Aalto University
format Other/Unknown Material
author Grönroos, Stig-Arne
Virpioja, Sami
Kurimo, Mikko
author_facet Grönroos, Stig-Arne
Virpioja, Sami
Kurimo, Mikko
author_sort Grönroos, Stig-Arne
title North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
title_short North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
title_full North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
title_fullStr North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
title_full_unstemmed North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
title_sort north sámi morphological segmentation with low-resource semi-supervised sequence labeling
publishDate 2019
url https://aaltodoc.aalto.fi/handle/123456789/40463
genre North Sámi
sami
Sámi
genre_facet North Sámi
sami
Sámi
op_relation info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD
International Workshop on Computational Linguistics for Uralic Languages
Fifth Workshop on Computational Linguistics for Uralic Languages
Grönroos, S-A, Virpioja, S & Kurimo, M 2019, North Sámi morphological segmentation with low-resource semi-supervised sequence labeling. in Fifth Workshop on Computational Linguistics for Uralic Languages: Proceedings of the Workshop. Association for Computational Linguistics, pp. 15-26, International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia, 07/01/2019. <https://www.aclweb.org/anthology/W19-0302/>
978-1-948087-92-6
PURE UUID: 832e50de-ac02-4e45-9a9e-af08e5049c1c
PURE ITEMURL: https://research.aalto.fi/en/publications/832e50de-ac02-4e45-9a9e-af08e5049c1c
PURE LINK: https://www.aclweb.org/anthology/W19-0302/
PURE FILEURL: https://research.aalto.fi/files/36793748/2019_iwclul.published.pdf
https://aaltodoc.aalto.fi/handle/123456789/40463
URN:NBN:fi:aalto-201909255484
op_rights openAccess
_version_ 1766140927376621568