ORCA: a Benchmark for Data Web Crawlers
The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no...
Main Authors: | Röder, Michael; de Souza, Geraldo; Kuchelev, Denis; Desouki, Abdelmoneim Amer; Ngomo, Axel-Cyrille Ngonga |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | arXiv 2019 |
Subjects: | Databases (cs.DB); Performance (cs.PF); FOS Computer and information sciences |
Online Access: | https://dx.doi.org/10.48550/arxiv.1912.08026 https://arxiv.org/abs/1912.08026 |
id |
ftdatacite:10.48550/arxiv.1912.08026 |
---|---|
record_format |
openpolar |
spelling |
ftdatacite:10.48550/arxiv.1912.08026 2023-05-15T17:52:58+02:00 ORCA: a Benchmark for Data Web Crawlers Röder, Michael de Souza, Geraldo Kuchelev, Denis Desouki, Abdelmoneim Amer Ngomo, Axel-Cyrille Ngonga 2019 https://dx.doi.org/10.48550/arxiv.1912.08026 https://arxiv.org/abs/1912.08026 unknown arXiv arXiv.org perpetual, non-exclusive license http://arxiv.org/licenses/nonexclusive-distrib/1.0/ Databases cs.DB Performance cs.PF FOS Computer and information sciences Article CreativeWork article Preprint 2019 ftdatacite https://doi.org/10.48550/arxiv.1912.08026 2022-03-10T16:34:35Z The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Our work closes this gap by presenting the Orca benchmark. Orca generates a synthetic Data Web, which is decoupled from the original Web and enables a fair and repeatable comparison of Data Web crawlers. Our evaluations show that Orca can be used to reveal the different advantages and disadvantages of existing crawlers. The benchmark is open-source and available at https://github.com/dice-group/orca. : 8 pages, submitted to a conference Article in Journal/Newspaper Orca DataCite Metadata Store (German National Library of Science and Technology) |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
unknown |
topic |
Databases cs.DB Performance cs.PF FOS Computer and information sciences |
description |
The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Our work closes this gap by presenting the Orca benchmark. Orca generates a synthetic Data Web, which is decoupled from the original Web and enables a fair and repeatable comparison of Data Web crawlers. Our evaluations show that Orca can be used to reveal the different advantages and disadvantages of existing crawlers. The benchmark is open source and available at https://github.com/dice-group/orca. (8 pages, submitted to a conference) |
format |
Article in Journal/Newspaper |
author |
Röder, Michael; de Souza, Geraldo; Kuchelev, Denis; Desouki, Abdelmoneim Amer; Ngomo, Axel-Cyrille Ngonga |
author_sort |
Röder, Michael |
title |
ORCA: a Benchmark for Data Web Crawlers |
publisher |
arXiv |
publishDate |
2019 |
url |
https://dx.doi.org/10.48550/arxiv.1912.08026 https://arxiv.org/abs/1912.08026 |
genre |
Orca |
op_rights |
arXiv.org perpetual, non-exclusive license http://arxiv.org/licenses/nonexclusive-distrib/1.0/ |
op_doi |
https://doi.org/10.48550/arxiv.1912.08026 |
_version_ |
1766160726306586624 |