Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation
Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open domain-oriented extractive model that i...
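
The abstract contrasts the proposed model with dictionary lookup techniques such as DBpedia Spotlight. To make that term concrete, here is a minimal sketch of plain greedy longest-match dictionary lookup over a token sequence. It is an illustrative assumption, not DBpedia Spotlight's actual algorithm (which also ranks and disambiguates candidate entities), and the function name `dictionary_lookup` and the toy lexicon are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon type: lower-cased token n-gram -> concept identifier.
Lexicon = Dict[Tuple[str, ...], str]


def dictionary_lookup(tokens: List[str], lexicon: Lexicon,
                      max_len: int = 5) -> List[Tuple[int, int, str]]:
    """Greedy left-to-right longest-match spotting of lexicon entries.

    Returns (start, end, concept) triples; `end` is exclusive.
    """
    lowered = [t.lower() for t in tokens]
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in lexicon:
                match = (i, i + n, lexicon[key])
                break
        if match is not None:
            spans.append(match)
            i = match[1]  # continue after the matched span (no overlaps)
        else:
            i += 1  # no entry starts here; move one token to the right
    return spans


if __name__ == "__main__":
    toy_lexicon: Lexicon = {
        ("neural", "network"): "Artificial_neural_network",
        ("network",): "Network",
        ("wikipedia",): "Wikipedia",
    }
    sentence = "We train a neural network on Wikipedia pages".split()
    print(dictionary_lookup(sentence, toy_lexicon))
    # [(3, 5, 'Artificial_neural_network'), (6, 7, 'Wikipedia')]
```

Because matching is purely lexical, any concept absent from the lexicon is never found, which is the kind of limitation a model that copes with out-of-vocabulary input is meant to address.
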
Main Authors: | Alexander Shvets, Leo Wanner |
---|---|
Format: | Conference Object |
Language: | English |
Published: | Zenodo, 2021-02-10 |
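Subjects: | Concept extraction; Open-domain discourse texts; Pointer-generator neural network; Distant supervision |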
Online Access: | https://doi.org/10.5281/zenodo.4529274 |
id | ftzenodo:oai:zenodo.org:4529274 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | Zenodo |
op_collection_id | ftzenodo |
language | English |
topic | Concept extraction; Open-domain discourse texts; Pointer-generator neural network; Distant supervision |
description | Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic, open-domain-oriented extractive model that is based on distant supervision of a pointer–generator network leveraging bidirectional LSTMs and a copy mechanism, and that is able to cope with out-of-vocabulary words. The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the links to other pages are treated as ground-truth concepts. The experiments show that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. They furthermore show that the model can be readily ported to other datasets, on which it also achieves state-of-the-art performance. |
format | Conference Object |
author | Alexander Shvets; Leo Wanner |
title | Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation |
publisher | Zenodo |
publishDate | 2021 |
url | https://doi.org/10.5281/zenodo.4529274 |
genre | The Pointers |
op_source | EKAW, International Conference on Knowledge Engineering and Knowledge Management, Bolzano, Italy, 2020 |
op_relation | https://zenodo.org/communities/connexions-h2020; https://doi.org/10.5281/zenodo.4529273; https://doi.org/10.5281/zenodo.4529274; oai:zenodo.org:4529274 |
op_rights | info:eu-repo/semantics/openAccess; Creative Commons Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/legalcode |
op_doi | https://doi.org/10.5281/zenodo.4529274; https://doi.org/10.5281/zenodo.4529273 |
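
The description above names two ideas without spelling them out: distant supervision from Wikipedia, where links to other pages serve as ground-truth concepts, and the copy mechanism of a pointer–generator network, which lets the model handle out-of-vocabulary words. The sketch below illustrates both under explicit assumptions. It assumes gold concepts are the anchor texts of internal links written as `[[Target]]` or `[[Target|anchor]]` wikitext, and it writes the output distribution in the standard pointer–generator formulation (See et al., 2017), P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ attention over source positions holding w. The names `strip_links`, `source_to_extended_ids` and `final_distribution`, the regular expression, and all numbers are hypothetical; the bidirectional LSTM encoder/decoder that would produce `p_vocab`, `attention` and `p_gen` is omitted. This is not the authors' code.

```python
import re
from typing import Dict, List, Tuple

# Assumed wikitext link syntax: [[Target]] or [[Target|anchor text]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def strip_links(wikitext: str) -> Tuple[str, List[str]]:
    """Return plain text plus the link anchors, used here as gold concepts."""
    concepts: List[str] = []

    def keep_anchor(m: "re.Match[str]") -> str:
        surface = m.group(2) or m.group(1)  # anchor text if present, else target
        concepts.append(surface)
        return surface

    return WIKILINK.sub(keep_anchor, wikitext), concepts


def source_to_extended_ids(tokens: List[str],
                           vocab: Dict[str, int]) -> Tuple[List[int], List[str]]:
    """Map source tokens to ids; unknown tokens get temporary ids >= len(vocab)."""
    ids: List[int] = []
    oovs: List[str] = []
    for tok in tokens:
        if tok in vocab:
            ids.append(vocab[tok])
        else:
            if tok not in oovs:
                oovs.append(tok)
            ids.append(len(vocab) + oovs.index(tok))
    return ids, oovs


def final_distribution(p_vocab: List[float], attention: List[float],
                       source_ids: List[int], p_gen: float,
                       n_oov: int) -> List[float]:
    """Mix generation and copying over the vocabulary extended by source OOVs."""
    assert len(attention) == len(source_ids)
    # Generation term: scaled vocabulary distribution, zero for OOV slots.
    dist = [p_gen * p for p in p_vocab] + [0.0] * n_oov
    # Copy term: attention mass flows to the (extended) id of each source token.
    for weight, token_id in zip(attention, source_ids):
        dist[token_id] += (1.0 - p_gen) * weight
    return dist


if __name__ == "__main__":
    plain, gold = strip_links("A [[pointer network|pointer]] copies [[Token]]s.")
    print(plain, gold)  # A pointer copies Tokens. ['pointer', 'Token']

    vocab = {"the": 0, "copy": 1, "model": 2}
    ids, oovs = source_to_extended_ids(["copy", "Spotlight", "copy"], vocab)
    # Toy numbers standing in for decoder outputs at a single time step.
    dist = final_distribution([0.2, 0.5, 0.3], [0.5, 0.3, 0.2], ids, 0.6, len(oovs))
    print(ids, oovs, [round(p, 2) for p in dist])
    # [1, 3, 1] ['Spotlight'] [0.12, 0.58, 0.18, 0.12]
```

In the toy run, the unknown token `Spotlight` receives probability 0.12 only through the copy term, which is how a pointer–generator network can emit words outside its fixed vocabulary. Real wikitext (templates, nested links, redirects) needs more careful parsing than this single regular expression.
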