You can’t suggest that?!

In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian,...

Bibliographic Details
Published in: Nordlyd
Main Authors: Heiki-Jaan Kaalep, Flammie Pirinen, Sjur Moshagen
Format: Article in Journal/Newspaper
Language: English
Norwegian
Published: Septentrio Academic Publishing 2022
Subjects:
fsa
Online Access: https://doi.org/10.7557/12.6349
https://doaj.org/article/66fab39dfd704bbc974aa4e1b585c8fa
id ftdoajarticles:oai:doaj.org/article:66fab39dfd704bbc974aa4e1b585c8fa
record_format openpolar
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
Norwegian
topic Spell-Checking
rule-based
fsa
machine learning
sami languages
estonian
Language. Linguistic theory. Comparative grammar
P101-410
description In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; therefore we also describe in detail the actual errors we have seen. We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable.
format Article in Journal/Newspaper
author Heiki-Jaan Kaalep
Flammie Pirinen
Sjur Moshagen
title You can’t suggest that?!
publisher Septentrio Academic Publishing
publishDate 2022
url https://doi.org/10.7557/12.6349
https://doaj.org/article/66fab39dfd704bbc974aa4e1b585c8fa
genre North Sámi
sami
Sámi
South Sámi
op_source Nordlyd: Tromsø University Working Papers on Language & Linguistics, Vol 46, Iss 1 (2022)
op_relation https://septentrio.uit.no/index.php/nordlyd/article/view/6349
https://doaj.org/toc/1503-8599
doi:10.7557/12.6349
1503-8599
https://doaj.org/article/66fab39dfd704bbc974aa4e1b585c8fa
op_doi https://doi.org/10.7557/12.6349
container_title Nordlyd
container_volume 46
container_issue 1
_version_ 1766140933153226752