CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings

Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/)...

Bibliographic Details
Main Authors: Ginter, Filip; Hajič, Jan; Luotolahti, Juhani; Straka, Milan; Zeman, Daniel
Format: Book
Language: unknown
Published: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) 2017
Subjects: CoNLL 2017, word embeddings, automatic annotation, Multiple languages
Online Access: http://hdl.handle.net/11234/1-1989
id ftolac:oai:lindat.mff.cuni.cz:11234/1-1989
record_format openpolar
spelling ftolac:oai:lindat.mff.cuni.cz:11234/1-1989 2023-05-15T18:12:08+02:00 CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings Ginter, Filip Hajič, Jan Luotolahti, Juhani Straka, Milan Zeman, Daniel 2017-03-16T11:57:32Z http://hdl.handle.net/11234/1-1989 mul Multiple languages unknown Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) http://hdl.handle.net/11234/1-1989 Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) http://creativecommons.org/licenses/by-nc-sa/4.0/ CC-BY-NC-SA CoNLL 2017 word embeddings automatic annotation Multiple languages languageDescription Text Linguistic type: language description 2017 ftolac 2021-07-01T14:43:08Z Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/). For each language, automatic annotations in CoNLL-U format are provided in a separate archive. The word embeddings for all languages are distributed in one archive. Note that the CC BY-NC-SA 4.0 license applies to the automatically generated annotations and word embeddings, not to the underlying data, which may have a different license and impose additional restrictions. Update 2018-09-03: Added data in the 4 “surprise languages” from the 2017 ST: Buryat, Kurmanji, North Sami and Upper Sorbian. This had been promised before: during CoNLL-ST 2018 we gave the participants a link to this record, saying the data was here. It wasn't, sorry. But now it is. Book sami OLAC: Open Language Archives Community
institution Open Polar
collection OLAC: Open Language Archives Community
op_collection_id ftolac
language unknown
topic CoNLL 2017
word embeddings
automatic annotation
Multiple languages
spellingShingle CoNLL 2017
word embeddings
automatic annotation
Multiple languages
Ginter, Filip
Hajič, Jan
Luotolahti, Juhani
Straka, Milan
Zeman, Daniel
CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
topic_facet CoNLL 2017
word embeddings
automatic annotation
Multiple languages
description Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/). For each language, automatic annotations in CoNLL-U format are provided in a separate archive. The word embeddings for all languages are distributed in one archive. Note that the CC BY-NC-SA 4.0 license applies to the automatically generated annotations and word embeddings, not to the underlying data, which may have a different license and impose additional restrictions. Update 2018-09-03: Added data in the 4 “surprise languages” from the 2017 ST: Buryat, Kurmanji, North Sami and Upper Sorbian. This had been promised before: during CoNLL-ST 2018 we gave the participants a link to this record, saying the data was here. It wasn't, sorry. But now it is.
format Book
author Ginter, Filip
Hajič, Jan
Luotolahti, Juhani
Straka, Milan
Zeman, Daniel
author_facet Ginter, Filip
Hajič, Jan
Luotolahti, Juhani
Straka, Milan
Zeman, Daniel
author_sort Ginter, Filip
title CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
title_short CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
title_full CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
title_fullStr CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
title_full_unstemmed CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
title_sort conll 2017 shared task - automatically annotated raw texts and word embeddings
publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
publishDate 2017
url http://hdl.handle.net/11234/1-1989
genre sami
genre_facet sami
op_relation http://hdl.handle.net/11234/1-1989
op_rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
http://creativecommons.org/licenses/by-nc-sa/4.0/
op_rightsnorm CC-BY-NC-SA
_version_ 1766184687709978624
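
The description in this record names two kinds of files: annotations in CoNLL-U format and word2vec embeddings computed from lowercased text. The following is a minimal Python sketch of how such files might be read, not the authors' own tooling. It assumes the embeddings are stored in word2vec's plain-text layout (a "count dimension" header followed by one word per line), which this record does not state. The function names `read_conllu` and `read_word2vec_text` and all sample data are hypothetical, and the toy vectors have dimension 3 rather than the 100 reported above.

```python
import io
from typing import Dict, Iterator, List, TextIO

# The ten tab-separated CoNLL-U columns, in their standard order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(stream: TextIO) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dicts."""
    sentence: List[Dict[str, str]] = []
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():            # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):      # sentence-level comment, skipped here
            continue
        else:
            sentence.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    if sentence:                        # input may lack a trailing blank line
        yield sentence


def read_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse word2vec text format (ASSUMED layout, see the note above)."""
    _count, dim = (int(x) for x in stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:       # skip rows that do not match the header
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Made-up sample data standing in for the real archives.
sample_conllu = io.StringIO(
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
sample_vectors = io.StringIO("2 3\ndogs 0.1 0.2 0.3\nbark 0.4 0.5 0.6\n")

sentences = list(read_conllu(sample_conllu))
vectors = read_word2vec_text(sample_vectors)

# Lowercase each form before lookup, since the embeddings were
# computed from lowercased text.
forms = [token["form"].lower() for token in sentences[0]]
covered = [form for form in forms if form in vectors]
print(forms, covered)
```

For real data, the same functions could be given any opened text stream, for example one returned by `open(path, encoding="utf-8")`. If the archives are compressed, they would need to be decompressed or opened with a matching reader first.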