Synchronized Mediawiki based analyzer dictionary development

Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure; home of the Finnish Kielipankki ’Language Bank’ and Termipankki ’Term Bank’. The proximity of minority-language corpora in need of annotation and the mul...

Bibliographic Details
Published in: Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages
Main Authors: Rueter, Jack; Hämäläinen, Mika
Other Authors: Tyers, Francis M.; Rießler, Michael; Pirinen, Tommi A.; Trosterud, Trond; Department of Modern Languages 2010-2017; Language Technology; Department of Computer Science
Format: Conference Object
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10138/232470
id ftunivhelsihelda:oai:helda.helsinki.fi:10138/232470
record_format openpolar
spelling ftunivhelsihelda:oai:helda.helsinki.fi:10138/232470 2024-01-07T09:46:25+01:00 Synchronized Mediawiki based analyzer dictionary development Rueter, Jack Hämäläinen, Mika Tyers, Francis M. Rießler, Michael Pirinen , Tommi A. Trosterud , Trond Department of Modern Languages 2010-2017 Language Technology Department of Computer Science 2018-02-15T09:32:00Z 7 application/pdf http://hdl.handle.net/10138/232470 eng eng 10.18653/v1/w17-0601 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017) 978-1-5108-3665-5 Rueter , J & Hämäläinen , M 2017 , Synchronized Mediawiki based analyzer dictionary development . in F M Tyers , M Rießler , T A Pirinen & T Trosterud (eds) , 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017) : St. Petersburg, Russia 23 – 24 January 2017 . , 2 , The Association for Computational Linguistics , Stroudsburg , pp. 1-7 , International Workshop for Computational Linguistics of Uralic Languages , St. Petersburg , Russian Federation , 23/01/2017 . https://doi.org/10.18653/v1/w17-0601 conference ORCID: /0000-0001-9315-1278/work/41848420 ORCID: /0000-0002-3076-7929/work/61899960 263590f5-c061-4385-a3fe-982ba43fd84e http://hdl.handle.net/10138/232470 cc_by openAccess info:eu-repo/semantics/openAccess 6121 Languages Open-source Analyzer dictionary development Wiki-based dictionary Synchronized dictionary editing Uralic Languages Semantics Morphology Morpho-syntactic data Etymology Conference contribution publishedVersion 2018 ftunivhelsihelda 2023-12-14T00:12:57Z Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure; home of the Finnish Kielipankki ’Language Bank’ and Termipankki ’Term Bank’. 
The proximity of minority-language corpora in need of annotation and the multiple usage of controlled wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. The open-source FST development of Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reusage of FST-s, only augmented by open-source work in OmorFi, Apertium and Universal Dependency <http://universaldependencies.org/#language-urj>. The initial idea is to allow synchronized editing of Giellatekno xml and CSC wiki structures via github. In addition to allowing for simple lexc LEMMA:STEM CONTINUATION_LEXICON ”TRANSLATION” line exports, the parallel dictionaries will provide for documentation of derivation, morpho-syntactic information on valency and government, semantics and etymology. Peer reviewed Conference Object sami Tromsø HELDA – University of Helsinki Open Repository Tromsø Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages 1 7
institution Open Polar
collection HELDA – University of Helsinki Open Repository
op_collection_id ftunivhelsihelda
language English
topic 6121 Languages
Open-source
Analyzer dictionary development
Wiki-based dictionary
Synchronized dictionary editing
Uralic Languages
Semantics
Morphology
Morpho-syntactic data
Etymology
spellingShingle 6121 Languages
Open-source
Analyzer dictionary development
Wiki-based dictionary
Synchronized dictionary editing
Uralic Languages
Semantics
Morphology
Morpho-syntactic data
Etymology
Rueter, Jack
Hämäläinen, Mika
Synchronized Mediawiki based analyzer dictionary development
topic_facet 6121 Languages
Open-source
Analyzer dictionary development
Wiki-based dictionary
Synchronized dictionary editing
Uralic Languages
Semantics
Morphology
Morpho-syntactic data
Etymology
description Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure; home of the Finnish Kielipankki ’Language Bank’ and Termipankki ’Term Bank’. The proximity of minority-language corpora in need of annotation and the multiple usage of controlled wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. The open-source FST development of Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reusage of FST-s, only augmented by open-source work in OmorFi, Apertium and Universal Dependency <http://universaldependencies.org/#language-urj>. The initial idea is to allow synchronized editing of Giellatekno xml and CSC wiki structures via github. In addition to allowing for simple lexc LEMMA:STEM CONTINUATION_LEXICON ”TRANSLATION” line exports, the parallel dictionaries will provide for documentation of derivation, morpho-syntactic information on valency and government, semantics and etymology. Peer reviewed
author2 Tyers, Francis M.
Rießler, Michael
Pirinen, Tommi A.
Trosterud, Trond
Department of Modern Languages 2010-2017
Language Technology
Department of Computer Science
format Conference Object
author Rueter, Jack
Hämäläinen, Mika
author_facet Rueter, Jack
Hämäläinen, Mika
author_sort Rueter, Jack
title Synchronized Mediawiki based analyzer dictionary development
title_short Synchronized Mediawiki based analyzer dictionary development
title_full Synchronized Mediawiki based analyzer dictionary development
title_fullStr Synchronized Mediawiki based analyzer dictionary development
title_full_unstemmed Synchronized Mediawiki based analyzer dictionary development
title_sort synchronized mediawiki based analyzer dictionary development
publishDate 2018
url http://hdl.handle.net/10138/232470
geographic Tromsø
geographic_facet Tromsø
genre sami
Tromsø
genre_facet sami
Tromsø
op_relation 10.18653/v1/w17-0601
3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017)
978-1-5108-3665-5
Rueter, J & Hämäläinen, M 2017, Synchronized Mediawiki based analyzer dictionary development. in F M Tyers, M Rießler, T A Pirinen & T Trosterud (eds), 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017): St. Petersburg, Russia 23 – 24 January 2017., 2, The Association for Computational Linguistics, Stroudsburg, pp. 1-7, International Workshop for Computational Linguistics of Uralic Languages, St. Petersburg, Russian Federation, 23/01/2017. https://doi.org/10.18653/v1/w17-0601
conference
ORCID: /0000-0001-9315-1278/work/41848420
ORCID: /0000-0002-3076-7929/work/61899960
263590f5-c061-4385-a3fe-982ba43fd84e
http://hdl.handle.net/10138/232470
op_rights cc_by
openAccess
info:eu-repo/semantics/openAccess
container_title Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages
container_start_page 1
op_container_end_page 7
_version_ 1787428202664165376