Data for "SuperSim: a test set for word similarity and relatedness in Swedish"

This repository contains the data described in SuperSim: a test set for word similarity and relatedness in Swedish (Hengchen and Tahmasebi, 2021), available at https://aclanthology.org/2021.nodalida-main.27/ . If you use part or whole of this resource, please cite that work, or alternatively use the BibTeX entry given below.

Full description

Bibliographic Details
Main Authors: Simon Hengchen, Nina Tahmasebi
Format: Dataset
Language: Swedish
Published: 2021
Online Access: https://zenodo.org/record/4660084
https://doi.org/10.5281/zenodo.4660084
Description
Summary: This repository contains the data described in SuperSim: a test set for word similarity and relatedness in Swedish (Hengchen and Tahmasebi, 2021), available at https://aclanthology.org/2021.nodalida-main.27/ .

If you use part or whole of this resource, please cite the following work:

Hengchen, Simon and Tahmasebi, Nina, 2021. SuperSim: a test set for word similarity and relatedness in Swedish. In The 23rd Nordic Conference on Computational Linguistics (NoDaLiDa'21).

or alternatively use the BibTeX entry:

@inproceedings{hengchen-tahmasebi-2021-supersim,
    title = "{SuperSim:} a test set for word similarity and relatedness in {Swedish}",
    author = "Hengchen, Simon and Tahmasebi, Nina",
    booktitle = "Proceedings of the 23rd Nordic Conference on Computational Linguistics",
    month = may # "{--}" # jun,
    year = "2021",
    address = "Reykjavik, Iceland, and Online",
    publisher = {Link{\"o}ping University Electronic Press},
}

The data contained in this repository is as follows.

The code folder contains:
- main.py
- utils.py
- train_base_models.py
- perl-clean.pl
- requirements.txt

The data folder contains:
- gold_relatedness.tsv: all relatedness judgments from all annotators, as well as the mean
- gold_similarity.tsv: all similarity judgments from all annotators, as well as the mean
- models, containing baseline models:
  - Trained on the Swedish Gigaword:
    - FastText: gigaword_sv.ft (and gigaword_sv.ft.trainables.syn1neg.npy, gigaword_sv.ft.trainables.vectors_ngrams_lockf.npy, gigaword_sv.ft.trainables.vectors_vocab_lockf.npy, gigaword_sv.ft.wv.vectors_ngrams.npy, gigaword_sv.ft.wv.vectors_vocab.npy, gigaword_sv.ft.wv.vectors.npy)
    - Word2Vec: gigaword_sv.w2v (and gigaword_sv.w2v.trainables.syn1neg.npy, gigaword_sv.w2v.wv.vectors.npy)
    - GloVe: glove_vectors_giga.txt and glove_vocab_giga.txt
  - Trained on Swedish Wikipedia:
    - FastText: wiki_sv.ft (and wiki_sv.ft.trainables.syn1neg.npy, wiki_sv.ft.trainables.vectors_ngrams_lockf.npy, wiki_sv.ft.trainables.vectors_vocab_lockf.npy, wiki_sv.ft.wv.vectors_ngrams.npy, wiki_sv.ft.wv.vectors.npy, ...
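The two gold files pair each word pair with per-annotator judgments and their mean, and a test set like this is typically scored by comparing those means against embedding similarities. A minimal, self-contained sketch of the reading and similarity steps; note that the column names (word1, word2, mean) and the tiny sample are assumptions for illustration only, so check the actual files for their layout:

```python
import csv
import io
import math

def read_gold(fh):
    """Parse a gold TSV into (word1, word2, mean score) tuples.

    Assumes a header row with 'word1', 'word2', and 'mean' columns;
    the real files may name their columns differently.
    """
    reader = csv.DictReader(fh, delimiter="\t")
    return [(row["word1"], row["word2"], float(row["mean"])) for row in reader]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical two-pair sample mimicking the assumed layout of the gold TSVs.
sample = (
    "word1\tword2\tannotator_1\tannotator_2\tmean\n"
    "bil\tmotor\t7\t9\t8.0\n"
    "bil\thund\t1\t3\t2.0\n"
)

pairs = read_gold(io.StringIO(sample))
print(pairs[0])                                   # ('bil', 'motor', 8.0)
print(round(cosine([1.0, 0.0], [1.0, 1.0]), 3))   # 0.707
```

Evaluating one of the baseline models would then amount to looking up each pair's vectors (the .ft/.w2v files and their companion .npy arrays appear to be gensim save files, loadable with gensim's FastText.load / Word2Vec.load) and correlating the per-pair cosine similarities with the gold means.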