Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation

Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that i...

Bibliographic Details
Main Authors: Shvets, Alexander; Wanner, Leo
Format: Conference Object
Language: English
Published: Zenodo 2021
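Subjects: Concept extraction; Open-domain discourse texts; Pointer-generator neural network; Distant supervision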
Online Access: https://dx.doi.org/10.5281/zenodo.4529273
https://zenodo.org/record/4529273
id ftdatacite:10.5281/zenodo.4529273
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language English
topic Concept extraction
Open-domain discourse texts
Pointer-generator neural network
Distant supervision
description Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that is based on distant supervision of a pointer–generator network leveraging bidirectional LSTMs and a copy mechanism and that is able to cope with the out-of-vocabulary phenomenon. The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the pointers to other pages are considered as ground truth concepts. The outcome of the experiments shows that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. The experiments furthermore show that the model can be readily ported to other datasets on which it equally achieves a state-of-the-art performance.
format Conference Object
author Shvets, Alexander
Wanner, Leo
title Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation
publisher Zenodo
publishDate 2021
url https://dx.doi.org/10.5281/zenodo.4529273
https://zenodo.org/record/4529273
genre The Pointers
op_relation https://zenodo.org/communities/connexions-h2020
https://dx.doi.org/10.5281/zenodo.4529274
op_rights Open Access
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
cc-by-4.0
info:eu-repo/semantics/openAccess
op_rightsnorm CC-BY
op_doi https://doi.org/10.5281/zenodo.4529273
https://doi.org/10.5281/zenodo.4529274
_version_ 1766216874033414144