Synchronized Mediawiki based analyzer dictionary development


Bibliographic Details
Published in: Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages
Main Authors: Rueter, Jack; Hämäläinen, Mika
Other Authors: Tyers, Francis M.; Rießler, Michael; Pirinen, Tommi A.; Trosterud, Trond; Department of Modern Languages 2010-2017; Language Technology; Department of Computer Science
Format: Conference Object
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10138/232470
Description
Summary: Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure, home of the Finnish Kielipankki ‘Language Bank’ and Termipankki ‘Term Bank’. The proximity of minority-language corpora in need of annotation and the multiple usage of controlled wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. The open-source FST development of Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reuse of FSTs, only augmented by open-source work in OmorFi, Apertium and Universal Dependencies <http://universaldependencies.org/#language-urj>. The initial idea is to allow synchronized editing of Giellatekno XML and CSC wiki structures via GitHub. In addition to allowing for simple lexc LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" line exports, the parallel dictionaries will provide for documentation of derivation, morpho-syntactic information on valency and government, semantics and etymology.
Peer reviewed
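The lexc line export named in the summary can be illustrated with a minimal Python sketch. It is not taken from the paper: the `Entry` record, the function names, the sample Finnish entry and the continuation lexicon name `N_KALA` are hypothetical, and the set of escaped characters is a simplified assumption about lexc special characters. The trailing ` ;` follows ordinary lexc entry syntax and is not part of the pattern as quoted in the summary.

```python
from dataclasses import dataclass

# Simplified, assumed subset of characters that lexc treats as special;
# each is escaped with a preceding '%'.
LEXC_SPECIAL = set("%:;!<># ")


@dataclass
class Entry:
    """Hypothetical dictionary record holding the four exported fields."""
    lemma: str
    stem: str
    continuation: str  # name of the continuation lexicon
    translation: str   # gloss placed in the quoted info string


def escape_lexc(text: str) -> str:
    """Prefix each special character with '%'."""
    return "".join("%" + ch if ch in LEXC_SPECIAL else ch for ch in text)


def to_lexc_line(entry: Entry) -> str:
    """Format one entry as: LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ;"""
    if '"' in entry.translation:
        # This sketch does not handle quote escaping; it refuses instead.
        raise ValueError("translation must not contain a double quote")
    lemma = escape_lexc(entry.lemma)
    stem = escape_lexc(entry.stem)
    return f'{lemma}:{stem} {entry.continuation} "{entry.translation}" ;'


if __name__ == "__main__":
    # Illustrative Finnish entry; 'N_KALA' is an invented lexicon name.
    print(to_lexc_line(Entry("kala", "kala", "N_KALA", "fish")))
```

Keeping the four fields in one record mirrors the idea that a single dictionary entry can feed both the wiki view and the transducer source; the richer information mentioned in the summary (derivation, valency, etymology) would live in further fields that this export simply ignores.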