Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation
Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that i...
Main Authors: | Shvets, Alexander; Wanner, Leo |
---|---|
Format: | Conference Object |
Language: | English |
Published: | Zenodo 2021 |
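Subjects: | Concept extraction; Open-domain discourse texts; Pointer-generator neural network; Distant supervision |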
Online Access: | https://dx.doi.org/10.5281/zenodo.4529274 https://zenodo.org/record/4529274 |
id |
ftdatacite:10.5281/zenodo.4529274 |
---|---|
record_format |
openpolar |
spelling |
ftdatacite:10.5281/zenodo.4529274 2023-05-15T18:32:41+02:00 Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation Shvets, Alexander Wanner, Leo 2021 https://dx.doi.org/10.5281/zenodo.4529274 https://zenodo.org/record/4529274 en eng Zenodo https://zenodo.org/communities/connexions-h2020 https://dx.doi.org/10.5281/zenodo.4529273 https://zenodo.org/communities/connexions-h2020 Open Access Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode cc-by-4.0 info:eu-repo/semantics/openAccess CC-BY Concept extraction Open-domain discourse texts Pointer-generator neural network Distant supervision Text Conference paper article-journal ScholarlyArticle 2021 ftdatacite https://doi.org/10.5281/zenodo.4529274 https://doi.org/10.5281/zenodo.4529273 2021-11-05T12:55:41Z Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that is based on distant supervision of a pointer–generator network leveraging bidirectional LSTMs and a copy mechanism and that is able to cope with the out-of-vocabulary phenomenon. The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the pointers to other pages are considered as ground truth concepts. The outcome of the experiments shows that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. The experiments furthermore show that the model can be readily ported to other datasets on which it equally achieves a state-of-the-art performance. Conference Object The Pointers DataCite Metadata Store (German National Library of Science and Technology) |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
English |
topic |
Concept extraction Open-domain discourse texts Pointer-generator neural network Distant supervision |
description |
Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that is based on distant supervision of a pointer–generator network leveraging bidirectional LSTMs and a copy mechanism and that is able to cope with the out-of-vocabulary phenomenon. The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the pointers to other pages are considered as ground truth concepts. The outcome of the experiments shows that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. The experiments furthermore show that the model can be readily ported to other datasets on which it equally achieves a state-of-the-art performance. |
format |
Conference Object |
author |
Shvets, Alexander Wanner, Leo |
title |
Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation |
publisher |
Zenodo |
publishDate |
2021 |
url |
https://dx.doi.org/10.5281/zenodo.4529274 https://zenodo.org/record/4529274 |
genre |
The Pointers |
op_relation |
https://zenodo.org/communities/connexions-h2020 https://dx.doi.org/10.5281/zenodo.4529273 |
op_rights |
Open Access Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode cc-by-4.0 info:eu-repo/semantics/openAccess |
op_rightsnorm |
CC-BY |
op_doi |
https://doi.org/10.5281/zenodo.4529274 https://doi.org/10.5281/zenodo.4529273 |
_version_ |
1766216874241032192 |