CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/)...
Main Authors: | Ginter, Filip; Hajič, Jan; Luotolahti, Juhani; Straka, Milan; Zeman, Daniel |
---|---|
Format: | Book |
Language: | unknown |
Published: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) 2017 |
Subjects: | CoNLL 2017; word embeddings; automatic annotation; Multiple languages |
Online Access: | http://hdl.handle.net/11234/1-1989 |
id |
ftolac:oai:lindat.mff.cuni.cz:11234/1-1989 |
---|---|
record_format |
openpolar |
spelling |
ftolac:oai:lindat.mff.cuni.cz:11234/1-1989 2023-05-15T18:12:08+02:00 CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings Ginter, Filip Hajič, Jan Luotolahti, Juhani Straka, Milan Zeman, Daniel 2017-03-16T11:57:32Z http://hdl.handle.net/11234/1-1989 mul Multiple languages unknown Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) http://hdl.handle.net/11234/1-1989 Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) http://creativecommons.org/licenses/by-nc-sa/4.0/ CC-BY-NC-SA CoNLL 2017 word embeddings automatic annotation Multiple languages languageDescription Text Linguistic type: language description 2017 ftolac 2021-07-01T14:43:08Z Automatic segmentation, tokenization, and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/). For each language, automatic annotations in CoNLL-U format are provided in a separate archive. The word embeddings for all languages are distributed in one archive. Note that the CC BY-NC-SA 4.0 license applies to the automatically generated annotations and word embeddings, not to the underlying data, which may have a different license and impose additional restrictions. Update 2018-09-03: Added data in the 4 “surprise languages” from the 2017 ST: Buryat, Kurmanji, North Sami and Upper Sorbian. This had been promised before; during CoNLL-ST 2018 we gave the participants a link to this record, saying the data was here. It wasn't, sorry. But now it is. Book sami OLAC: Open Language Archives Community |
institution |
Open Polar |
collection |
OLAC: Open Language Archives Community |
op_collection_id |
ftolac |
language |
unknown |
topic |
CoNLL 2017 word embeddings automatic annotation Multiple languages |
spellingShingle |
CoNLL 2017 word embeddings automatic annotation Multiple languages Ginter, Filip Hajič, Jan Luotolahti, Juhani Straka, Milan Zeman, Daniel CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings |
topic_facet |
CoNLL 2017 word embeddings automatic annotation Multiple languages |
description |
Automatic segmentation, tokenization, and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/). For each language, automatic annotations in CoNLL-U format are provided in a separate archive. The word embeddings for all languages are distributed in one archive. Note that the CC BY-NC-SA 4.0 license applies to the automatically generated annotations and word embeddings, not to the underlying data, which may have a different license and impose additional restrictions. Update 2018-09-03: Added data in the 4 “surprise languages” from the 2017 ST: Buryat, Kurmanji, North Sami and Upper Sorbian. This had been promised before; during CoNLL-ST 2018 we gave the participants a link to this record, saying the data was here. It wasn't, sorry. But now it is. |
format |
Book |
author |
Ginter, Filip Hajič, Jan Luotolahti, Juhani Straka, Milan Zeman, Daniel |
author_facet |
Ginter, Filip Hajič, Jan Luotolahti, Juhani Straka, Milan Zeman, Daniel |
author_sort |
Ginter, Filip |
title |
CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings |
title_short |
CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings |
title_full |
CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings |
title_fullStr |
CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings |
title_full_unstemmed |
CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings |
title_sort |
conll 2017 shared task - automatically annotated raw texts and word embeddings |
publisher |
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
publishDate |
2017 |
url |
http://hdl.handle.net/11234/1-1989 |
genre |
sami |
genre_facet |
sami |
op_relation |
http://hdl.handle.net/11234/1-1989 |
op_rights |
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) http://creativecommons.org/licenses/by-nc-sa/4.0/ |
op_rightsnorm |
CC-BY-NC-SA |
_version_ |
1766184687709978624 |
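
The description states that the per-language annotations are provided in CoNLL-U format. As a rough illustration of what reading such a file might look like, here is a minimal Python sketch. It assumes the standard ten tab-separated CoNLL-U columns defined by Universal Dependencies; the function name `read_conllu` and the sample sentence are hypothetical and hand-written for this example, not taken from the dataset or from UDPipe.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in order, as defined by Universal Dependencies.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dicts (hypothetical helper)."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines are skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


# Hand-written sample, NOT taken from the dataset.
sample = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
sentences = list(read_conllu(sample.splitlines()))
```

For a real archive, the same function could be fed an open file handle instead of `sample.splitlines()`. All fields are kept as strings here, so multiword-token ranges such as `1-2` pass through unchanged; a caller that needs integer heads would convert them separately.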