You can’t suggest that?! Comparisons and improvements of speller error models

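In this article, we study the correction of spelling errors: specifically, how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these rules are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; therefore, we also describe in detail the actual errors we have seen. We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable.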
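The rule-based approach summarized in the abstract has experts write rules describing the kinds of errors people make, and those rules then serve as an error model for generating suggestions. It can be illustrated with a minimal sketch. Several assumptions apply. The rules, their weights and the four-word lexicon are hypothetical examples, loosely styled on North Sámi letters such as č and á, and are not taken from the paper. The sketch also does not compile the rules into a finite-state automaton. Instead, it applies them directly as weighted string rewrites and searches for the cheapest lexicon word. For small inputs, this roughly approximates combining a misspelling with an error-model automaton and a lexicon.

```python
import heapq

# Hypothetical toy error model: each rule (wrong, right, cost) says that the
# correct substring `right` is sometimes typed as `wrong`; lower cost means
# the error is assumed to be more likely. Illustrative only, not the paper's rules.
ERROR_RULES = [
    ("c", "č", 1.0),   # č typed as plain c
    ("s", "š", 1.0),   # š typed as plain s
    ("a", "á", 1.0),   # á typed as plain a
    ("l", "ll", 2.0),  # a double consonant typed as a single one
]

# Hypothetical toy lexicon of correctly spelled words.
LEXICON = {"čállit", "sápmi", "giella", "sámegiella"}


def suggest(word, rules=ERROR_RULES, lexicon=LEXICON, max_edits=3, limit=5):
    """Return up to `limit` (suggestion, cost) pairs, cheapest first.

    Uniform-cost search: each step rewrites one occurrence of a rule's
    `wrong` side into its `right` side; candidates found in the lexicon
    are emitted as suggestions instead of being expanded further.
    """
    queue = [(0.0, 0, word)]          # (cost so far, edits used, candidate)
    best = {word: 0.0}                # cheapest known cost per candidate
    results = []
    while queue and len(results) < limit:
        cost, edits, cand = heapq.heappop(queue)
        if cost > best.get(cand, float("inf")):
            continue                  # stale entry: a cheaper path was found
        if cand in lexicon and cand != word:
            results.append((cand, cost))
            continue
        if edits == max_edits:
            continue                  # cap the number of corrections per word
        for wrong, right, weight in rules:
            start = cand.find(wrong)
            while start != -1:        # try the rule at every matching position
                new = cand[:start] + right + cand[start + len(wrong):]
                new_cost = cost + weight
                if new_cost < best.get(new, float("inf")):
                    best[new] = new_cost
                    heapq.heappush(queue, (new_cost, edits + 1, new))
                start = cand.find(wrong, start + 1)
    return results


print(suggest("calit"))  # [('čállit', 4.0)]
```

Here "calit" needs three hypothetical corrections (c→č, a→á, l→ll), so its only suggestion costs 1.0 + 1.0 + 2.0. The abstract's data-based approach would instead learn the error model from an error corpus rather than from hand-written rules.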

Bibliographic Details
Published in: Nordlyd, vol. 46, no. 1
Main Authors: Pirinen, Flammie; Moshagen, Sjur Nørstebø; Kaalep, Heiki-Jaan
Format: Article in Journal/Newspaper
Language: English
Published: Septentrio Academic Publishing, 2022
Online Access: https://hdl.handle.net/10037/28381
https://doi.org/10.7557/12.6349
id ftunivtroemsoe:oai:munin.uit.no:10037/28381
date 2022-08-30
type Journal article (Tidsskriftartikkel); peer reviewed; published version
institution Open Polar
collection University of Tromsø: Munin Open Research Archive
language English
description In this article, we study the correction of spelling errors: specifically, how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these rules are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; therefore, we also describe in detail the actual errors we have seen. We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable.
format Article in Journal/Newspaper
author Pirinen, Flammie
Moshagen, Sjur Nørstebø
Kaalep, Heiki-Jaan
title You can’t suggest that?! Comparisons and improvements of speller error models
publisher Septentrio Academic Publishing
publishDate 2022
url https://hdl.handle.net/10037/28381
https://doi.org/10.7557/12.6349
genre North Sámi
South Sámi
op_relation Nordlyd
Pirinen, Moshagen, Kaalep. You can’t suggest that?! Comparisons and improvements of speller error models. Nordlyd. 2022
FRIDAID 2114174
doi:10.7557/12.6349
0332-7531
1503-8599
https://hdl.handle.net/10037/28381
op_rights Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Copyright 2022 The Author(s)
https://creativecommons.org/licenses/by-nc/4.0
container_title Nordlyd
container_volume 46
container_issue 1