Effect of Language and Error Models on Efficiency of Finite-State Spell-Checking and Correction



Bibliographic Details
Main Authors: Tommi A Pirinen, Sam Hardwick
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2012
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.303.1476
http://www.helsinki.fi/~tapirine/publications/Pirinen-2012-fsmnlp.pdf
Description
Summary: We inspect the viability of finite-state spell-checking and contextless correction of non-word errors in morphologically different languages. After an overview of previous work, we conduct large-scale tests involving three languages that cover a broad spectrum of morphological features (English, Finnish and Greenlandic) and a variety of error models and algorithms, including improvements we propose ourselves. Special reference is made to on-line three-way composition of the input, the error model and the language model. Tests are run on real-world text acquired from freely available sources. We show that the finite-state approaches discussed are sufficiently fast for high-quality correction, even for Greenlandic, whose morphological complexity makes correction a difficult task for non-finite-state approaches.
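The idea of on-line three-way composition can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy language model (a trie built from a plain word list, standing in for a morphological automaton) and a unit-cost edit-distance error model (match 0; substitution, insertion and deletion 1 each). The names `TrieAcceptor` and `correct` are hypothetical. Rather than building the full composition input ∘ error model ∘ language model up front, the search expands only the (input position, lexicon state) pairs it actually reaches, in order of increasing cost.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


class TrieAcceptor:
    """Toy acyclic acceptor used as the language model.

    Assumption: a plain word list, not the weighted morphological
    automata used for English, Finnish and Greenlandic in the paper.
    """

    def __init__(self, words: Iterable[str]) -> None:
        self.arcs: List[Dict[str, int]] = [{}]  # state -> {symbol: next state}
        self.final: List[bool] = [False]
        for word in words:
            state = 0
            for ch in word:
                if ch not in self.arcs[state]:
                    self.arcs.append({})
                    self.final.append(False)
                    self.arcs[state][ch] = len(self.arcs) - 1
                state = self.arcs[state][ch]
            self.final[state] = True


def correct(word: str, lexicon: TrieAcceptor, max_cost: int = 2) -> List[Tuple[str, int]]:
    """Return (suggestion, cost) pairs in non-decreasing cost order.

    Dijkstra-style best-first search over the lazily expanded product of
    the input string, a unit-cost edit-distance error model (assumed) and
    the lexicon. In a trie each lexicon state fixes the output prefix, so
    a (position, state) pair is final the first time it is popped.
    """
    heap: List[Tuple[int, int, int, str]] = [(0, 0, 0, "")]  # (cost, pos, state, output)
    seen = set()
    results: List[Tuple[str, int]] = []
    while heap:
        cost, i, state, out = heapq.heappop(heap)
        if (i, state) in seen:
            continue
        seen.add((i, state))
        if i == len(word) and lexicon.final[state]:
            results.append((out, cost))
        candidates = []
        for ch, nxt in lexicon.arcs[state].items():
            if i < len(word):
                # Match (cost 0) or substitution (cost 1): consume one input symbol.
                candidates.append((cost + (0 if word[i] == ch else 1), i + 1, nxt, out + ch))
            # Insertion: emit a lexicon symbol without consuming input.
            candidates.append((cost + 1, i, nxt, out + ch))
        if i < len(word):
            # Deletion: consume an input symbol without moving in the lexicon.
            candidates.append((cost + 1, i + 1, state, out))
        for cand in candidates:
            # Prune paths whose cost already exceeds the error-model budget.
            if cand[0] <= max_cost and (cand[1], cand[2]) not in seen:
                heapq.heappush(heap, cand)
    return results
```

For example, with `lex = TrieAcceptor(["cat", "cart", "car", "dog"])`, calling `correct("cta", lex, max_cost=2)` yields `cat` and `car`, both at cost 2. In this sketch the error model's alphabet is implicitly restricted to the lexicon's outgoing arcs, which is what composing with the lexicon would do anyway. The `max_cost` bound keeps the search finite and stands in for the pruning a real weighted error model would need.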