You can’t suggest that?! Comparisons and improvements of speller error models
In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian,...
Published in: | Nordlyd |
---|---|
Main Authors: | Pirinen, Flammie; Moshagen, Sjur Nørstebø; Kaalep, Heiki-Jaan |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Septentrio Academic Publishing 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10037/28381 https://doi.org/10.7557/12.6349 |
id |
ftunivtroemsoe:oai:munin.uit.no:10037/28381 |
---|---|
record_format |
openpolar |
spelling |
ftunivtroemsoe:oai:munin.uit.no:10037/28381 2023-05-15T17:40:07+02:00 You can’t suggest that?! Comparisons and improvements of speller error models Pirinen, Flammie Moshagen, Sjur Nørstebø Kaalep, Heiki-Jaan 2022-08-30 https://hdl.handle.net/10037/28381 https://doi.org/10.7557/12.6349 eng eng Septentrio Academic Publishing Nordlyd Pirinen, Moshagen, Kaalep. You can’t suggest that?! Comparisons and improvements of speller error models. Nordlyd. 2022 FRIDAID 2114174 doi:10.7557/12.6349 0332-7531 1503-8599 https://hdl.handle.net/10037/28381 Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Copyright 2022 The Author(s) https://creativecommons.org/licenses/by-nc/4.0 CC-BY-NC Journal article Tidsskriftartikkel Peer reviewed publishedVersion 2022 ftunivtroemsoe https://doi.org/10.7557/12.6349 2023-02-02T00:03:41Z In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; we therefore also describe in detail the actual errors we have seen. We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable. Article in Journal/Newspaper North Sámi South Sámi University of Tromsø: Munin Open Research Archive Speller ENVELOPE(-60.717,-60.717,-62.500,-62.500) Nordlyd 46 1 |
institution |
Open Polar |
collection |
University of Tromsø: Munin Open Research Archive |
op_collection_id |
ftunivtroemsoe |
language |
English |
description |
In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; we therefore also describe in detail the actual errors we have seen. We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable. |
format |
Article in Journal/Newspaper |
author |
Pirinen, Flammie Moshagen, Sjur Nørstebø Kaalep, Heiki-Jaan |
spellingShingle |
Pirinen, Flammie Moshagen, Sjur Nørstebø Kaalep, Heiki-Jaan You can’t suggest that?! Comparisons and improvements of speller error models |
author_facet |
Pirinen, Flammie Moshagen, Sjur Nørstebø Kaalep, Heiki-Jaan |
author_sort |
Pirinen, Flammie |
title |
You can’t suggest that?! Comparisons and improvements of speller error models |
title_sort |
you can’t suggest that?! comparisons and improvements of speller error models |
publisher |
Septentrio Academic Publishing |
publishDate |
2022 |
url |
https://hdl.handle.net/10037/28381 https://doi.org/10.7557/12.6349 |
long_lat |
ENVELOPE(-60.717,-60.717,-62.500,-62.500) |
geographic |
Speller |
geographic_facet |
Speller |
genre |
North Sámi South Sámi |
genre_facet |
North Sámi South Sámi |
op_relation |
Nordlyd Pirinen, Moshagen, Kaalep. You can’t suggest that?! Comparisons and improvements of speller error models. Nordlyd. 2022 FRIDAID 2114174 doi:10.7557/12.6349 0332-7531 1503-8599 https://hdl.handle.net/10037/28381 |
op_rights |
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Copyright 2022 The Author(s) https://creativecommons.org/licenses/by-nc/4.0 |
op_rightsnorm |
CC-BY-NC |
op_doi |
https://doi.org/10.7557/12.6349 |
container_title |
Nordlyd |
container_volume |
46 |
container_issue |
1 |
_version_ |
1766140932676124672 |