WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration
The article considers the problem of word sense disambiguation (WSD). Given a set of synonyms (synsets) and sentences containing these synonyms, the task is to select the meaning of a word in a sentence automatically. 1285 sentences were tagged by experts, namely, one of the dictionary mean...
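Main Authors: | Kirillov, Alexander; Krizhanovsky, Natalia; Krizhanovsky, Andrew |
---|---|
Format: | Text |
Language: | unknown |
Published: | arXiv 2018 |
Subjects: | Information Retrieval (cs.IR); Computation and Language (cs.CL) |
Online Access: | https://dx.doi.org/10.48550/arxiv.1805.09559 https://arxiv.org/abs/1805.09559 |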
id |
ftdatacite:10.48550/arxiv.1805.09559 |
---|---|
record_format |
openpolar |
spelling |
ftdatacite:10.48550/arxiv.1805.09559 2023-05-15T17:01:35+02:00 WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration Kirillov, Alexander Krizhanovsky, Natalia Krizhanovsky, Andrew 2018 https://dx.doi.org/10.48550/arxiv.1805.09559 https://arxiv.org/abs/1805.09559 unknown arXiv https://dx.doi.org/10.17076/mat829 Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode cc-by-4.0 CC-BY Information Retrieval cs.IR Computation and Language cs.CL FOS Computer and information sciences I.5.3; H.3.1; H.3.3 68T50 article-journal Article ScholarlyArticle Text 2018 ftdatacite https://doi.org/10.48550/arxiv.1805.09559 https://doi.org/10.17076/mat829 2022-04-01T09:35:41Z The article considers the problem of word sense disambiguation (WSD). Given a set of synonyms (synsets) and sentences containing these synonyms, the task is to select the meaning of a word in a sentence automatically. Experts tagged 1285 sentences, selecting one of the dictionary meanings for each target word. To solve the WSD problem, an algorithm based on a new method of calculating the proximity of vector-word contexts is proposed. To achieve higher accuracy, a preliminary epsilon-filtering of words is performed, both in the sentence and in the set of synonyms. An extensive program of experiments was carried out. Four algorithms were implemented, including the new one. The experiments showed that in a number of cases the new algorithm gives better results. The developed software and the tagged corpus have an open license and are available online. Wiktionary and Wikisource are used. A brief description of this work is available in slides (https://goo.gl/9ak6Gt). A video lecture in Russian on this research is available online (https://youtu.be/-DLmRkepf58). 15 pages, 1 table, 15 figures; accepted in the journal Transactions of Karelian Research Centre of the Russian Academy of Sciences. Text karelian DataCite Metadata Store (German National Library of Science and Technology) |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
unknown |
topic |
Information Retrieval cs.IR Computation and Language cs.CL FOS Computer and information sciences I.5.3; H.3.1; H.3.3 68T50 |
description |
The article considers the problem of word sense disambiguation (WSD). Given a set of synonyms (synsets) and sentences containing these synonyms, the task is to select the meaning of a word in a sentence automatically. Experts tagged 1285 sentences, selecting one of the dictionary meanings for each target word. To solve the WSD problem, an algorithm based on a new method of calculating the proximity of vector-word contexts is proposed. To achieve higher accuracy, a preliminary epsilon-filtering of words is performed, both in the sentence and in the set of synonyms. An extensive program of experiments was carried out. Four algorithms were implemented, including the new one. The experiments showed that in a number of cases the new algorithm gives better results. The developed software and the tagged corpus have an open license and are available online. Wiktionary and Wikisource are used. A brief description of this work is available in slides (https://goo.gl/9ak6Gt). A video lecture in Russian on this research is available online (https://youtu.be/-DLmRkepf58). 15 pages, 1 table, 15 figures; accepted in the journal Transactions of Karelian Research Centre of the Russian Academy of Sciences. |
format |
Text |
author |
Kirillov, Alexander Krizhanovsky, Natalia Krizhanovsky, Andrew |
author_sort |
Kirillov, Alexander |
title |
WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration |
publisher |
arXiv |
publishDate |
2018 |
url |
https://dx.doi.org/10.48550/arxiv.1805.09559 https://arxiv.org/abs/1805.09559 |
genre |
karelian |
op_relation |
https://dx.doi.org/10.17076/mat829 |
op_rights |
Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode cc-by-4.0 |
op_rightsnorm |
CC-BY |
op_doi |
https://doi.org/10.48550/arxiv.1805.09559 https://doi.org/10.17076/mat829 |
_version_ |
1766054695697121280 |