Synchronized Mediawiki based analyzer dictionary development
Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure; home of the Finnish Kielipankki ’Language Bank’ and Termipankki ’Term Bank’. The proximity of minority-language corpora in need of annotation and the mul...
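Published in: | Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages |
---|---|
Main Authors: | Rueter, Jack; Hämäläinen, Mika |
Other Authors: | Tyers, Francis M.; Rießler, Michael; Pirinen, Tommi A.; Trosterud, Trond; Department of Modern Languages 2010-2017; Language Technology; Department of Computer Science |
Format: | Conference Object |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://hdl.handle.net/10138/232470 |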
id |
ftunivhelsihelda:oai:helda.helsinki.fi:10138/232470 |
---|---|
record_format |
openpolar |
spelling |
ftunivhelsihelda:oai:helda.helsinki.fi:10138/232470 2024-01-07T09:46:25+01:00 Synchronized Mediawiki based analyzer dictionary development Rueter, Jack Hämäläinen, Mika Tyers, Francis M. Rießler, Michael Pirinen, Tommi A. Trosterud, Trond Department of Modern Languages 2010-2017 Language Technology Department of Computer Science 2018-02-15T09:32:00Z 7 application/pdf http://hdl.handle.net/10138/232470 eng eng 10.18653/v1/w17-0601 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017) 978-1-5108-3665-5 Rueter, J & Hämäläinen, M 2017, Synchronized Mediawiki based analyzer dictionary development. in F M Tyers, M Rießler, T A Pirinen & T Trosterud (eds), 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017): St. Petersburg, Russia 23 – 24 January 2017., 2, The Association for Computational Linguistics, Stroudsburg, pp. 1-7, International Workshop for Computational Linguistics of Uralic Languages, St. Petersburg, Russian Federation, 23/01/2017. https://doi.org/10.18653/v1/w17-0601 conference ORCID: /0000-0001-9315-1278/work/41848420 ORCID: /0000-0002-3076-7929/work/61899960 263590f5-c061-4385-a3fe-982ba43fd84e http://hdl.handle.net/10138/232470 cc_by openAccess info:eu-repo/semantics/openAccess 6121 Languages Open-source Analyzer dictionary development Wiki-based dictionary Synchronized dictionary editing Uralic Languages Semantics Morphology Morpho-syntactic data Etymology Conference contribution publishedVersion 2018 ftunivhelsihelda 2023-12-14T00:12:57Z Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure; home of the Finnish Kielipankki ’Language Bank’ and Termipankki ’Term Bank’.
The proximity of minority-language corpora in need of annotation and the multiple usage of controlled wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. The open-source FST development of Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reusage of FST-s, only augmented by open-source work in OmorFi, Apertium and Universal Dependency <http://universaldependencies.org/#language-urj>. The initial idea is to allow synchronized editing of Giellatekno xml and CSC wiki structures via github. In addition to allowing for simple lexc LEMMA:STEM CONTINUATION_LEXICON ”TRANSLATION” line exports, the parallel dictionaries will provide for documentation of derivation, morpho-syntactic information on valency and government, semantics and etymology. Peer reviewed Conference Object sami Tromsø HELDA – University of Helsinki Open Repository Tromsø Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages 1 7 |
institution |
Open Polar |
collection |
HELDA – University of Helsinki Open Repository |
op_collection_id |
ftunivhelsihelda |
language |
English |
topic |
6121 Languages Open-source Analyzer dictionary development Wiki-based dictionary Synchronized dictionary editing Uralic Languages Semantics Morphology Morpho-syntactic data Etymology |
description |
Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure; home of the Finnish Kielipankki ’Language Bank’ and Termipankki ’Term Bank’. The proximity of minority-language corpora in need of annotation and the multiple usage of controlled wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. The open-source FST development of Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reusage of FST-s, only augmented by open-source work in OmorFi, Apertium and Universal Dependency <http://universaldependencies.org/#language-urj>. The initial idea is to allow synchronized editing of Giellatekno xml and CSC wiki structures via github. In addition to allowing for simple lexc LEMMA:STEM CONTINUATION_LEXICON ”TRANSLATION” line exports, the parallel dictionaries will provide for documentation of derivation, morpho-syntactic information on valency and government, semantics and etymology. Peer reviewed |
author2 |
Tyers, Francis M. Rießler, Michael Pirinen, Tommi A. Trosterud, Trond Department of Modern Languages 2010-2017 Language Technology Department of Computer Science |
format |
Conference Object |
author |
Rueter, Jack Hämäläinen, Mika |
author_sort |
Rueter, Jack |
title |
Synchronized Mediawiki based analyzer dictionary development |
title_sort |
synchronized mediawiki based analyzer dictionary development |
publishDate |
2018 |
url |
http://hdl.handle.net/10138/232470 |
geographic |
Tromsø |
genre |
sami Tromsø |
op_relation |
10.18653/v1/w17-0601 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017) 978-1-5108-3665-5 Rueter, J & Hämäläinen, M 2017, Synchronized Mediawiki based analyzer dictionary development. in F M Tyers, M Rießler, T A Pirinen & T Trosterud (eds), 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017): St. Petersburg, Russia 23 – 24 January 2017., 2, The Association for Computational Linguistics, Stroudsburg, pp. 1-7, International Workshop for Computational Linguistics of Uralic Languages, St. Petersburg, Russian Federation, 23/01/2017. https://doi.org/10.18653/v1/w17-0601 conference ORCID: /0000-0001-9315-1278/work/41848420 ORCID: /0000-0002-3076-7929/work/61899960 263590f5-c061-4385-a3fe-982ba43fd84e http://hdl.handle.net/10138/232470 |
op_rights |
cc_by openAccess info:eu-repo/semantics/openAccess |
container_title |
Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages |
container_start_page |
1 |
op_container_end_page |
7 |
_version_ |
1787428202664165376 |
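
The description field in this record mentions simple lexc `LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION"` line exports. As a rough illustration of what such an export could look like, here is a minimal Python sketch. It is not the authors' implementation: the function names, the continuation lexicon name `N_EXAMPLE`, and the sample entry are hypothetical, and the set of escaped characters is a non-exhaustive assumption based on the general lexc convention of quoting special characters with `%`.

```python
import re

# Assumption: a small, non-exhaustive set of characters that lexc treats
# specially and that are made literal by a preceding '%'.
_LEXC_SPECIAL = re.compile(r'([ %<>:;!"0])')


def escape_lexc(text: str) -> str:
    """Prefix each special character with '%' so lexc reads it literally."""
    return _LEXC_SPECIAL.sub(r"%\1", text)


def lexc_line(lemma: str, stem: str, contlex: str, translation: str) -> str:
    """Build one entry line: LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ;"""
    # The gloss sits inside double quotes, so inner double quotes are
    # replaced with single quotes (a simplification for this sketch).
    gloss = translation.replace('"', "'")
    return f'{escape_lexc(lemma)}:{escape_lexc(stem)} {contlex} "{gloss}" ;'


if __name__ == "__main__":
    # Hypothetical entry; 'N_EXAMPLE' is a placeholder continuation lexicon.
    print(lexc_line("talo", "talo", "N_EXAMPLE", "house"))
```

Keeping the escaping in one helper means the same dictionary entry could be exported from either the XML or the wiki side without the two exports drifting apart, which is the kind of synchronization the description is concerned with.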