Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries

This article presents a novel way of combining finite-state transducers (FSTs) with electronic dictionaries, thereby creating efficient reading comprehension dictionaries. We compare a North Saami-Norwegian and a South Saami-Norwegian dictionary, both enriched with an FST, with existing, available...

Bibliographic Details
Main Authors: Ryan Johnson, Lene Antonsen, Trond Trosterud
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.658.4282
http://emmtee.net/oe/nodalida13/conference/45.pdf
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.658.4282
record_format openpolar
spelling ftciteseerx:oai:CiteSeerX.psu:10.1.1.658.4282 2023-05-15T18:08:14+02:00 Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries Ryan Johnson Lene Antonsen Trond Trosterud The Pennsylvania State University CiteSeerX Archives application/pdf http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.658.4282 http://emmtee.net/oe/nodalida13/conference/45.pdf en eng http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.658.4282 http://emmtee.net/oe/nodalida13/conference/45.pdf Metadata may be used without restrictions as long as the oai identifier remains attached to it. http://emmtee.net/oe/nodalida13/conference/45.pdf Lexicography Computational Morphology Orthographic Variation Finite-state Transducers Electronic Dictionaries text ftciteseerx 2016-01-08T16:43:32Z This article presents a novel way of combining finite-state transducers (FSTs) with electronic dictionaries, thereby creating efficient reading comprehension dictionaries. We compare a North Saami- Norwegian and a South Saami- Norwegian dictionary, both enriched with an FST, with existing, available dictionaries containing pre-generated paradigms, and show the advantages of our approach. Being more flexible, the FSTs may also adjust the dictionary to different contexts. The finite state transducer analyses the word to be looked up, and the dictionary itself conducts the actual lookup. The FST part is crucial for morphology-rich languages, where as little as 10 % of the wordforms in running text actually consists of lemma forms. If a compound or derived word, or a word with an enclitic particle is not found in the dictionary, the FST will give the stems and derivation affixes of the wordform, and each of the stems will be given a separate translation. In this way, the coverage of the FST-dictionary will be far larger than an ordinary dictionary of the same size. Text saami Unknown Lemma ENVELOPE(19.530,19.530,69.873,69.873)
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
topic Lexicography
Computational Morphology
Orthographic Variation
Finite-state Transducers
Electronic Dictionaries
spellingShingle Lexicography
Computational Morphology
Orthographic Variation
Finite-state Transducers
Electronic Dictionaries
Ryan Johnson
Lene Antonsen
Trond Trosterud
Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries
topic_facet Lexicography
Computational Morphology
Orthographic Variation
Finite-state Transducers
Electronic Dictionaries
description This article presents a novel way of combining finite-state transducers (FSTs) with electronic dictionaries, thereby creating efficient reading comprehension dictionaries. We compare a North Saami-Norwegian and a South Saami-Norwegian dictionary, both enriched with an FST, with existing, available dictionaries containing pre-generated paradigms, and show the advantages of our approach. Being more flexible, the FSTs may also adjust the dictionary to different contexts. The finite-state transducer analyses the word to be looked up, and the dictionary itself conducts the actual lookup. The FST part is crucial for morphology-rich languages, where as few as 10% of the wordforms in running text are lemma forms. If a compound or derived word, or a word with an enclitic particle, is not found in the dictionary, the FST gives the stems and derivation affixes of the wordform, and each of the stems is given a separate translation. In this way, the coverage of the FST-dictionary is far larger than that of an ordinary dictionary of the same size.
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author Ryan Johnson
Lene Antonsen
Trond Trosterud
author_facet Ryan Johnson
Lene Antonsen
Trond Trosterud
author_sort Ryan Johnson
title Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries
title_short Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries
title_full Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries
title_fullStr Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries
title_full_unstemmed Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries
title_sort using finite state transducers for making efficient reading comprehension dictionaries
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.658.4282
http://emmtee.net/oe/nodalida13/conference/45.pdf
long_lat ENVELOPE(19.530,19.530,69.873,69.873)
geographic Lemma
geographic_facet Lemma
genre saami
genre_facet saami
op_source http://emmtee.net/oe/nodalida13/conference/45.pdf
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.658.4282
http://emmtee.net/oe/nodalida13/conference/45.pdf
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_ 1766180502862036992
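
The division of labour described in the abstract (the FST analyses the wordform, the dictionary does the lookup, and unlisted compounds fall back to per-stem translations) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a plain Python table stands in for a real transducer, English wordforms stand in for Saami ones, and the names ANALYSES, DICTIONARY, analyse and lookup are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a hand-written table stands in for the FST. A real transducer
# would compute these analyses instead of listing them.
# wordform -> list of (stems, morphological tags)
ANALYSES: Dict[str, List[Tuple[List[str], str]]] = {
    "houses": [(["house"], "+N+Pl")],
    "fishhouses": [(["fish", "house"], "+N+Cmp+N+Pl")],
}

# Assumption: a tiny lemma dictionary (English -> Norwegian, illustration only).
DICTIONARY: Dict[str, str] = {"house": "hus", "fish": "fisk"}

Entry = Tuple[str, str, List[Tuple[str, Optional[str]]]]


def analyse(wordform: str) -> List[Tuple[List[str], str]]:
    """Return the analyses of a wordform; unknown forms are passed through as-is."""
    return ANALYSES.get(wordform, [([wordform], "")])


def lookup(wordform: str) -> List[Entry]:
    """Analyse first, then look up the lemma; fall back to the individual stems."""
    results: List[Entry] = []
    for stems, tags in analyse(wordform):
        lemma = "".join(stems)
        if lemma in DICTIONARY:
            # The whole lemma is listed: return its translation directly.
            results.append((lemma, tags, [(lemma, DICTIONARY[lemma])]))
        else:
            # Not listed: translate each stem separately (None if missing).
            parts = [(stem, DICTIONARY.get(stem)) for stem in stems]
            results.append((lemma, tags, parts))
    return results


if __name__ == "__main__":
    print(lookup("houses"))
    print(lookup("fishhouses"))
```

In this sketch the inflected form "houses" resolves to the listed lemma "house", while the unlisted compound "fishhouses" is still covered because each of its stems has its own entry. That is the sense in which the abstract says coverage exceeds that of an ordinary dictionary of the same size.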