WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration

This article considers the problem of word sense disambiguation (WSD). Given a set of synonyms (synsets) and sentences containing these synonyms, the task is to select the meaning of the word in each sentence automatically. 1285 sentences were tagged by experts, namely, one of the dictionary mean...

Bibliographic Details
Published in: Proceedings of the Karelian Research Centre of the Russian Academy of Sciences
Main Authors: Kirillov, Alexander, Krizhanovsky, Natalia, Krizhanovsky, Andrew
Format: Text
Language: unknown
Published: 2018
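Subjects: Computer Science - Information Retrieval; Computer Science - Computation and Language; 68T50; I.5.3; H.3.1; H.3.3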
Online Access: http://arxiv.org/abs/1805.09559
https://doi.org/10.17076/mat829
id ftarxivpreprints:oai:arXiv.org:1805.09559
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:1805.09559 2023-09-05T13:20:48+02:00 WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration Kirillov, Alexander Krizhanovsky, Natalia Krizhanovsky, Andrew 2018-05-24 http://arxiv.org/abs/1805.09559 https://doi.org/10.17076/mat829 unknown http://arxiv.org/abs/1805.09559 doi:10.17076/mat829 Computer Science - Information Retrieval Computer Science - Computation and Language 68T50 I.5.3 H.3.1 H.3.3 text 2018 ftarxivpreprints https://doi.org/10.17076/mat829 2023-08-16T14:51:32Z The problem of word sense disambiguation (WSD) is considered in the article. Given a set of synonyms (synsets) and sentences with these synonyms. It is necessary to select the meaning of the word in the sentence automatically. 1285 sentences were tagged by experts, namely, one of the dictionary meanings was selected by experts for target words. To solve the WSD-problem, an algorithm based on a new method of vector-word contexts proximity calculation is proposed. In order to achieve higher accuracy, a preliminary epsilon-filtering of words is performed, both in the sentence and in the set of synonyms. An extensive program of experiments was carried out. Four algorithms are implemented, including a new algorithm. Experiments have shown that in a number of cases the new algorithm shows better results. The developed software and the tagged corpus have an open license and are available online. Wiktionary and Wikisource are used. A brief description of this work can be viewed in slides (https://goo.gl/9ak6Gt). Video lecture in Russian on this research is available online (https://youtu.be/-DLmRkepf58). Comment: 15 pages, 1 table, 15 figures, accepted in the journal Transactions of Karelian Research Centre of the Russian Academy of Sciences Text karelian ArXiv.org (Cornell University Library) Proceedings of the Karelian Research Centre of the Russian Academy of Sciences 7 149
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Information Retrieval
Computer Science - Computation and Language
68T50
I.5.3
H.3.1
H.3.3
description This article considers the problem of word sense disambiguation (WSD). Given a set of synonyms (synsets) and sentences containing these synonyms, the task is to select the meaning of the word in each sentence automatically. Experts tagged 1285 sentences, selecting one of the dictionary meanings for each target word. To solve the WSD problem, the article proposes an algorithm based on a new method for calculating the proximity of vector-word contexts. To achieve higher accuracy, words are first epsilon-filtered, both in the sentence and in the set of synonyms. An extensive program of experiments was carried out, with four algorithms implemented, including the new one. The experiments show that in a number of cases the new algorithm gives better results. The developed software and the tagged corpus are available online under an open license. The work uses Wiktionary and Wikisource. A brief description of this work is available in slides (https://goo.gl/9ak6Gt). A video lecture in Russian on this research is available online (https://youtu.be/-DLmRkepf58). Comment: 15 pages, 1 table, 15 figures, accepted for publication in the journal Transactions of Karelian Research Centre of the Russian Academy of Sciences
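The abstract names two steps, a preliminary epsilon-filtering of words and a proximity measure between vector contexts, but does not define either here. The Python sketch below illustrates the general shape of such a pipeline under loudly stated assumptions: cosine similarity of mean word vectors stands in for the authors' proximity measure, and the filtering rule (keep a word if it is at least `eps`-similar to some word on the other side) is hypothetical, not the rule from the paper. The function names, the toy vectors, and the `eps` value are all illustrative.

```python
import math
from typing import Dict, List, Sequence

Vectors = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(words: List[str], vecs: Vectors) -> List[float]:
    """Component-wise average of the vectors of a non-empty word list."""
    total = [0.0] * len(vecs[words[0]])
    for w in words:
        for i, x in enumerate(vecs[w]):
            total[i] += x
    return [x / len(words) for x in total]


def eps_filter(words: List[str], others: List[str], vecs: Vectors, eps: float) -> List[str]:
    """HYPOTHETICAL filter rule (an assumption, not the paper's definition):
    keep a word if its cosine similarity to at least one word in `others` is >= eps."""
    return [w for w in words if any(cosine(vecs[w], vecs[o]) >= eps for o in others)]


def choose_sense(context: List[str], synsets: List[List[str]], vecs: Vectors, eps: float) -> int:
    """Return the index of the best-matching synset, or -1 if no synset
    keeps any words after filtering."""
    # Words without a vector cannot be compared, so drop them up front.
    context = [w for w in context if w in vecs]
    best_idx, best_score = -1, -math.inf
    for idx, synset in enumerate(synsets):
        syn_words = [w for w in synset if w in vecs]
        # Filter both sides: the sentence words and the synonyms.
        ctx = eps_filter(context, syn_words, vecs, eps)
        syn = eps_filter(syn_words, context, vecs, eps)
        if not ctx or not syn:
            continue  # nothing survived the filter for this sense
        # Stand-in proximity measure: cosine of the two mean vectors.
        score = cosine(mean_vector(ctx, vecs), mean_vector(syn, vecs))
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


# Made-up 2-D vectors and two senses of "bank", for illustration only.
VECS: Vectors = {
    "river": [1.0, 0.1], "water": [0.9, 0.2], "fish": [0.8, 0.0],
    "loan": [0.1, 1.0], "money": [0.0, 0.95],
    "shore": [1.0, 0.0], "riverside": [0.95, 0.1],
    "lender": [0.0, 1.0], "finance": [0.1, 0.9],
}
SYNSETS = [["shore", "riverside"], ["lender", "finance"]]

print(choose_sense(["river", "water", "fish"], SYNSETS, VECS, eps=0.8))  # → 0
print(choose_sense(["loan", "money"], SYNSETS, VECS, eps=0.8))  # → 1
```

Because the sketch filters the sentence against each candidate synset separately, a context word can survive for one sense and be dropped for another; the published algorithm may organize the filtering differently.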
format Text
author Kirillov, Alexander
Krizhanovsky, Natalia
Krizhanovsky, Andrew
author_sort Kirillov, Alexander
title WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration
publishDate 2018
url http://arxiv.org/abs/1805.09559
https://doi.org/10.17076/mat829
genre karelian
op_relation http://arxiv.org/abs/1805.09559
doi:10.17076/mat829
op_doi https://doi.org/10.17076/mat829
container_title Proceedings of the Karelian Research Centre of the Russian Academy of Sciences
container_issue 7
container_start_page 149