Finite-State Spell-Checking with Weighted Language and Error Models : Building and Evaluating Spell-Checkers with Wikipedia as Corpus

Bibliographic Details
Main Authors: Pirinen, Tommi; Linden, Krister
Other Authors: Department of Modern Languages 2010-2017, Krister Linden / Research Group
Format: Conference Object
Language: English
Published: 2012
Online Access: http://hdl.handle.net/10138/29358
Description
Summary: In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology tools, and demonstrate rapid building of Northern Sámi and English spell checkers from tools and resources available from the Internet.
Peer reviewed