Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp

Machine learning is the dominating paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control...

Bibliographic Details
Published in: Nordlyd
Main Authors: Linda Wiechetek, Flammie Pirinen, Børre Gaup, Chiara Argese, Thomas Omma
Format: Article in Journal/Newspaper
Language: English
Norwegian
Published: Septentrio Academic Publishing 2022
Subjects:
nlp
Online Access: https://doi.org/10.7557/12.6346
https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b
id ftdoajarticles:oai:doaj.org/article:1063afdd01b14ca0b9106b2ea873221b
record_format openpolar
spelling ftdoajarticles:oai:doaj.org/article:1063afdd01b14ca0b9106b2ea873221b 2023-05-15T18:14:47+02:00 Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp Linda Wiechetek Flammie Pirinen Børre Gaup Chiara Argese Thomas Omma 2022-08-01T00:00:00Z https://doi.org/10.7557/12.6346 https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b EN NO eng nor Septentrio Academic Publishing https://septentrio.uit.no/index.php/nordlyd/article/view/6346 https://doaj.org/toc/1503-8599 doi:10.7557/12.6346 1503-8599 https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b Nordlyd: Tromsø University Working Papers on Language & Linguistics, Vol 46, Iss 1 (2022) Sámi language grammar checking neural networks nlp rule-based agreement Language. Linguistic theory. Comparative grammar P101-410 article 2022 ftdoajarticles https://doi.org/10.7557/12.6346 2022-12-31T00:18:03Z Machine learning is the dominating paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we debunk the myth of machine learning being cheaper than a rule-based approach by showing how much work there is behind data generation, either via corpus annotation or by creating tools that automatically mark up the corpus. Earlier we have shown that the correction of grammatical errors, in particular compound errors, benefits from hybrid methods. Agreement errors, on the other hand, are to a higher degree dependent on the larger grammatical context. Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive data, cannot compete with the state-of-the-art rule-based approach. Article in Journal/Newspaper Sámi Directory of Open Access Journals: DOAJ Articles Nordlyd 46 1
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
Norwegian
topic Sámi language
grammar checking
neural networks
nlp
rule-based
agreement
Language. Linguistic theory. Comparative grammar
P101-410
description Machine learning is the dominating paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we debunk the myth of machine learning being cheaper than a rule-based approach by showing how much work there is behind data generation, either via corpus annotation or by creating tools that automatically mark up the corpus. Earlier we have shown that the correction of grammatical errors, in particular compound errors, benefits from hybrid methods. Agreement errors, on the other hand, are to a higher degree dependent on the larger grammatical context. Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive data, cannot compete with the state-of-the-art rule-based approach.
format Article in Journal/Newspaper
author Linda Wiechetek
Flammie Pirinen
Børre Gaup
Chiara Argese
Thomas Omma
title Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp
publisher Septentrio Academic Publishing
publishDate 2022
url https://doi.org/10.7557/12.6346
https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b
genre Sámi
op_source Nordlyd: Tromsø University Working Papers on Language & Linguistics, Vol 46, Iss 1 (2022)
op_relation https://septentrio.uit.no/index.php/nordlyd/article/view/6346
https://doaj.org/toc/1503-8599
doi:10.7557/12.6346
1503-8599
https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b
op_doi https://doi.org/10.7557/12.6346
container_title Nordlyd
container_volume 46
container_issue 1
_version_ 1766187783032930304