IceNLP: A Natural Language Processing Toolkit for Icelandic
Icelandic is a morphologically complex language, for which language technology resources are scarce. Only a few years ago, it could be stated that language technology was practically non-existent in Iceland. In this paper, we describe the development of an NLP toolkit for processing the language, th...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Text |
Language: | English |
Subjects: | |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.330.5629 http://www.ru.is/faculty/hrafn/Papers/IceNLP_final2.pdf |
Summary: | Icelandic is a morphologically complex language, for which language technology resources are scarce. Only a few years ago, it could be stated that language technology was practically non-existent in Iceland. In this paper, we describe the development of an NLP toolkit for processing the language, the challenges faced and the decisions made during development. The current version of the toolkit consists of a tokeniser/sentence segmentiser, a morphological analyser, a linguistic rule-based tagger, and a finite-state parser. The development of our toolkit is a step towards building a Basic Language Resource Toolkit (BLARK) for the Icelandic language. Index Terms: morphological analysis, part-of-speech tagging, finite-state parsing |
---|