Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation

Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that i...


Bibliographic Details
Main Authors: Alexander Shvets, Leo Wanner
Format: Conference Object
Language: English
Published: Zenodo 2021
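Subjects: Concept extraction; Open-domain discourse texts; Pointer-generator neural network; Distant supervision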
Online Access: https://doi.org/10.5281/zenodo.4529274
id ftzenodo:oai:zenodo.org:4529274
record_format openpolar
spelling ftzenodo:oai:zenodo.org:4529274 2024-09-15T18:39:02+00:00 Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation Alexander Shvets Leo Wanner 2021-02-10 https://doi.org/10.5281/zenodo.4529274 eng eng Zenodo https://zenodo.org/communities/connexions-h2020 https://doi.org/10.5281/zenodo.4529273 https://doi.org/10.5281/zenodo.4529274 oai:zenodo.org:4529274 info:eu-repo/semantics/openAccess Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode EKAW, International Conference on Knowledge Engineering and Knowledge Management, Bolzano, Italy, 2020 Concept extraction Open-domain discourse texts Pointer-generator neural network Distant supervision info:eu-repo/semantics/conferencePaper 2021 ftzenodo https://doi.org/10.5281/zenodo.4529274 10.5281/zenodo.4529273 2024-07-25T20:41:58Z Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that is based on distant supervision of a pointer–generator network leveraging bidirectional LSTMs and a copy mechanism and that is able to cope with the out-of-vocabulary phenomenon. The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the pointers to other pages are considered as ground truth concepts. The outcome of the experiments shows that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. The experiments furthermore show that the model can be readily ported to other datasets on which it equally achieves a state-of-the-art performance. Conference Object The Pointers Zenodo
institution Open Polar
collection Zenodo
op_collection_id ftzenodo
language English
topic Concept extraction
Open-domain discourse texts
Pointer-generator neural network
Distant supervision
description Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that is based on distant supervision of a pointer–generator network leveraging bidirectional LSTMs and a copy mechanism and that is able to cope with the out-of-vocabulary phenomenon. The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the pointers to other pages are considered as ground truth concepts. The outcome of the experiments shows that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. The experiments furthermore show that the model can be readily ported to other datasets on which it equally achieves a state-of-the-art performance.
format Conference Object
author Alexander Shvets
Leo Wanner
title Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation
publisher Zenodo
publishDate 2021
url https://doi.org/10.5281/zenodo.4529274
genre The Pointers
op_source EKAW, International Conference on Knowledge Engineering and Knowledge Management, Bolzano, Italy, 2020
op_relation https://zenodo.org/communities/connexions-h2020
https://doi.org/10.5281/zenodo.4529273
https://doi.org/10.5281/zenodo.4529274
oai:zenodo.org:4529274
op_rights info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
op_doi https://doi.org/10.5281/zenodo.4529274
10.5281/zenodo.4529273
_version_ 1810483427574546432