ORCA: a Benchmark for Data Web Crawlers

The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no...

Bibliographic Details
Main Authors: Röder, Michael; de Souza, Geraldo; Kuchelev, Denis; Desouki, Abdelmoneim Amer; Ngomo, Axel-Cyrille Ngonga
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2019
Subjects: Databases cs.DB; Performance cs.PF; FOS Computer and information sciences
Online Access: https://dx.doi.org/10.48550/arxiv.1912.08026
https://arxiv.org/abs/1912.08026
id ftdatacite:10.48550/arxiv.1912.08026
record_format openpolar
spelling ftdatacite:10.48550/arxiv.1912.08026 2023-05-15T17:52:58+02:00 ORCA: a Benchmark for Data Web Crawlers Röder, Michael de Souza, Geraldo Kuchelev, Denis Desouki, Abdelmoneim Amer Ngomo, Axel-Cyrille Ngonga 2019 https://dx.doi.org/10.48550/arxiv.1912.08026 https://arxiv.org/abs/1912.08026 unknown arXiv arXiv.org perpetual, non-exclusive license http://arxiv.org/licenses/nonexclusive-distrib/1.0/ Databases cs.DB Performance cs.PF FOS Computer and information sciences Article CreativeWork article Preprint 2019 ftdatacite https://doi.org/10.48550/arxiv.1912.08026 2022-03-10T16:34:35Z The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Our work closes this gap by presenting the Orca benchmark. Orca generates a synthetic Data Web, which is decoupled from the original Web and enables a fair and repeatable comparison of Data Web crawlers. Our evaluations show that Orca can be used to reveal the different advantages and disadvantages of existing crawlers. The benchmark is open-source and available at https://github.com/dice-group/orca. : 8 pages, submitted to a conference Article in Journal/Newspaper Orca DataCite Metadata Store (German National Library of Science and Technology)
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Databases cs.DB
Performance cs.PF
FOS Computer and information sciences
description The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Our work closes this gap by presenting the Orca benchmark. Orca generates a synthetic Data Web, which is decoupled from the original Web and enables a fair and repeatable comparison of Data Web crawlers. Our evaluations show that Orca can be used to reveal the different advantages and disadvantages of existing crawlers. The benchmark is open-source and available at https://github.com/dice-group/orca. Comment: 8 pages, submitted to a conference
format Article in Journal/Newspaper
author Röder, Michael
de Souza, Geraldo
Kuchelev, Denis
Desouki, Abdelmoneim Amer
Ngomo, Axel-Cyrille Ngonga
author_sort Röder, Michael
title ORCA: a Benchmark for Data Web Crawlers
publisher arXiv
publishDate 2019
url https://dx.doi.org/10.48550/arxiv.1912.08026
https://arxiv.org/abs/1912.08026
genre Orca
op_rights arXiv.org perpetual, non-exclusive license
http://arxiv.org/licenses/nonexclusive-distrib/1.0/
op_doi https://doi.org/10.48550/arxiv.1912.08026
_version_ 1766160726306586624