Mii eai leat gal vuollánan – Vi ha neimen ikke gitt opp

Machine learning is the dominant paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control...


Bibliographic Details
Published in:	Nordlyd
Main Authors:	Linda Wiechetek, Flammie Pirinen, Børre Gaup, Chiara Argese, Thomas Omma
Format:	Article in Journal/Newspaper
Language:	English Norwegian
Published:	Septentrio Academic Publishing 2022
Subjects:	Sámi language; grammar checking; neural networks; nlp; rule-based; agreement; Language. Linguistic theory. Comparative grammar; P101-410; Sámi
Online Access:	https://doi.org/10.7557/12.6346 https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b

id	ftdoajarticles:oai:doaj.org/article:1063afdd01b14ca0b9106b2ea873221b
record_format	openpolar
spelling	ftdoajarticles:oai:doaj.org/article:1063afdd01b14ca0b9106b2ea873221b 2023-05-15T18:14:47+02:00 Mii eai leat gal vuollánan – Vi ha neimen ikke gitt opp Linda Wiechetek Flammie Pirinen Børre Gaup Chiara Argese Thomas Omma 2022-08-01T00:00:00Z https://doi.org/10.7557/12.6346 https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b EN NO eng nor Septentrio Academic Publishing https://septentrio.uit.no/index.php/nordlyd/article/view/6346 https://doaj.org/toc/1503-8599 doi:10.7557/12.6346 1503-8599 https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b Nordlyd: Tromsø University Working Papers on Language & Linguistics, Vol 46, Iss 1 (2022) Sámi language grammar checking neural networks nlp rule-based agreement Language. Linguistic theory. Comparative grammar P101-410 article 2022 ftdoajarticles https://doi.org/10.7557/12.6346 2022-12-31T00:18:03Z Machine learning is the dominant paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we debunk the myth that machine learning is cheaper than a rule-based approach by showing how much work lies behind data generation, either via corpus annotation or by creating tools that automatically mark up the corpus. Earlier we have shown that the correction of grammatical errors, in particular compound errors, benefits from hybrid methods. Agreement errors, on the other hand, depend to a higher degree on the larger grammatical context. Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive data, cannot compete with the state-of-the-art rule-based approach. Article in Journal/Newspaper Sámi Directory of Open Access Journals: DOAJ Articles Nordlyd 46 1
institution	Open Polar
collection	Directory of Open Access Journals: DOAJ Articles
op_collection_id	ftdoajarticles
language	English Norwegian
topic	Sámi language grammar checking neural networks nlp rule-based agreement Language. Linguistic theory. Comparative grammar P101-410
description	Machine learning is the dominant paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we debunk the myth that machine learning is cheaper than a rule-based approach by showing how much work lies behind data generation, either via corpus annotation or by creating tools that automatically mark up the corpus. Earlier we have shown that the correction of grammatical errors, in particular compound errors, benefits from hybrid methods. Agreement errors, on the other hand, depend to a higher degree on the larger grammatical context. Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive data, cannot compete with the state-of-the-art rule-based approach.
format	Article in Journal/Newspaper
author	Linda Wiechetek Flammie Pirinen Børre Gaup Chiara Argese Thomas Omma
author_facet	Linda Wiechetek Flammie Pirinen Børre Gaup Chiara Argese Thomas Omma
author_sort	Linda Wiechetek
title	Mii eai leat gal vuollánan – Vi ha neimen ikke gitt opp
title_sort	mii eai leat gal vuollánan – vi ha neimen ikke gitt opp
publisher	Septentrio Academic Publishing
publishDate	2022
url	https://doi.org/10.7557/12.6346 https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b
genre	Sámi
genre_facet	Sámi
op_source	Nordlyd: Tromsø University Working Papers on Language & Linguistics, Vol 46, Iss 1 (2022)
op_relation	https://septentrio.uit.no/index.php/nordlyd/article/view/6346 https://doaj.org/toc/1503-8599 doi:10.7557/12.6346 1503-8599 https://doaj.org/article/1063afdd01b14ca0b9106b2ea873221b
op_doi	https://doi.org/10.7557/12.6346
container_title	Nordlyd
container_volume	46
container_issue	1
_version_	1766187783032930304

Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp

Similar Items

Mii eai leat gal vuollánan – Vi ha neimen ikke gitt opp