Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016
Climate change is amplified in the Polar Regions. Polar amplification is captured via space and airborne remote sensing, in-situ measurement, and climate modeling. Beyond the rich literature that documents changing Polar regions, each method of Polar-data collection produces a diverse set of data ty...
Main Author: | Christian Mattmann |
---|---|
Format: | Dataset |
Language: | unknown |
Published: | Arctic Data Center, 2017 |
Subjects: | Cryosphere |
Online Access: | https://doi.org/10.18739/A2J678X27 |
id |
dataone:doi:10.18739/A2J678X27 |
---|---|
record_format |
openpolar |
spelling |
dataone:doi:10.18739/A2J678X27 2024-11-03T19:44:46+00:00 Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016 Christian Mattmann Global ENVELOPE(-180.0,180.0,90.0,-90.0) BEGINDATE: 2015-01-01T00:00:00Z ENDDATE: 2016-01-01T00:00:00Z 2017-07-27T00:00:00Z https://doi.org/10.18739/A2J678X27 unknown Arctic Data Center Cryosphere Dataset 2017 dataone:urn:node:ARCTIC https://doi.org/10.18739/A2J678X27 2024-11-03T19:15:44Z Climate change is amplified in the Polar Regions. Polar amplification is captured via space and airborne remote sensing, in-situ measurement, and climate modeling. Beyond the rich literature that documents changing Polar regions, each method of Polar-data collection produces a diverse set of data types, ranging from text-based metadata to more complex data structures (e.g. HDF, NetCDF, GRIB). Because finding these data is often a primary challenge in scientific discovery, inclusion of the Polar dataset in TREC-DD would help advance science through data discovery and provide TREC-DD a new challenge in the realm of search relevancy. Dataset Description: This dataset is a collection of web crawls from three primary sources: (1) Dr. Chris Mattmann's crawl of ADE, performed at the Open Science Codefest and at the [NSF DataViz Hackathon for Polar CyberInfrastructure](http://nsf-polar-cyberinfrastructure.github.io/datavis-hackathon/); (2) three datasets contributed by Dr. Mattmann's student Angela Wang: two crawls of ACADIS and one of NASA AMD; and (3) 13 individual crawls of NSF ACADIS, NSIDC ADE, and NASA AMD submitted by students in Dr. Mattmann's CSCI 572 course at USC. Each web crawl used Apache Nutch as the core framework for web crawling and Apache Tika as the main content detection and extraction framework. Nutch is a distributed search engine that runs on top of Apache Hadoop. Apache Tika is an open source framework for metadata exploration, automatic text mining, and information retrieval.
Web crawls were focused on three polar data repositories: the National Science Foundation Advanced Cooperative Arctic Data and Information System (ACADIS), the National Snow and Ice Data Center (NSIDC) Arctic Data Explorer (ADE), and the National Aeronautics and Space Administration Antarctic Master Directory (AMD). The finished Polar dataset is composed of 17 distinct web crawls, containing 1,741,530 records (158 GB) across the three Polar science data repositories, which themselves are largely uncoordinated. Dataset Antarc* Antarctic Arctic Climate change National Snow and Ice Data Center Arctic Data Center (via DataONE) Arctic Antarctic Tika ENVELOPE(7.590,7.590,63.223,63.223) |
institution |
Open Polar |
collection |
Arctic Data Center (via DataONE) |
op_collection_id |
dataone:urn:node:ARCTIC |
language |
unknown |
topic |
Cryosphere |
spellingShingle |
Cryosphere Christian Mattmann Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016 |
topic_facet |
Cryosphere |
description |
Climate change is amplified in the Polar Regions. Polar amplification is captured via space and airborne remote sensing, in-situ measurement, and climate modeling. Beyond the rich literature that documents changing Polar regions, each method of Polar-data collection produces a diverse set of data types, ranging from text-based metadata to more complex data structures (e.g. HDF, NetCDF, GRIB). Because finding these data is often a primary challenge in scientific discovery, inclusion of the Polar dataset in TREC-DD would help advance science through data discovery and provide TREC-DD a new challenge in the realm of search relevancy. Dataset Description: This dataset is a collection of web crawls from three primary sources: (1) Dr. Chris Mattmann's crawl of ADE, performed at the Open Science Codefest and at the [NSF DataViz Hackathon for Polar CyberInfrastructure](http://nsf-polar-cyberinfrastructure.github.io/datavis-hackathon/); (2) three datasets contributed by Dr. Mattmann's student Angela Wang: two crawls of ACADIS and one of NASA AMD; and (3) 13 individual crawls of NSF ACADIS, NSIDC ADE, and NASA AMD submitted by students in Dr. Mattmann's CSCI 572 course at USC. Each web crawl used Apache Nutch as the core framework for web crawling and Apache Tika as the main content detection and extraction framework. Nutch is a distributed search engine that runs on top of Apache Hadoop. Apache Tika is an open source framework for metadata exploration, automatic text mining, and information retrieval. Web crawls were focused on three polar data repositories: the National Science Foundation Advanced Cooperative Arctic Data and Information System (ACADIS), the National Snow and Ice Data Center (NSIDC) Arctic Data Explorer (ADE), and the National Aeronautics and Space Administration Antarctic Master Directory (AMD).
The finished Polar dataset is composed of 17 distinct web crawls, containing 1,741,530 records (158 GB) across the three Polar science data repositories, which themselves are largely uncoordinated. |
format |
Dataset |
author |
Christian Mattmann |
author_facet |
Christian Mattmann |
author_sort |
Christian Mattmann |
title |
Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016 |
title_short |
Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016 |
title_full |
Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016 |
title_fullStr |
Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016 |
title_full_unstemmed |
Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016 |
title_sort |
text retrieval conference (trec) dynamic domain polar dataset code, 2015-2016 |
publisher |
Arctic Data Center |
publishDate |
2017 |
url |
https://doi.org/10.18739/A2J678X27 |
op_coverage |
Global ENVELOPE(-180.0,180.0,90.0,-90.0) BEGINDATE: 2015-01-01T00:00:00Z ENDDATE: 2016-01-01T00:00:00Z |
long_lat |
ENVELOPE(7.590,7.590,63.223,63.223) |
geographic |
Arctic Antarctic Tika |
geographic_facet |
Arctic Antarctic Tika |
genre |
Antarc* Antarctic Arctic Climate change National Snow and Ice Data Center |
genre_facet |
Antarc* Antarctic Arctic Climate change National Snow and Ice Data Center |
op_doi |
https://doi.org/10.18739/A2J678X27 |
_version_ |
1814736443125268480 |
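
The `op_coverage` value in the record above packs a bounding box and a date range into one string. The following is a minimal sketch of how such a string might be split into usable fields. It assumes the `ENVELOPE(...)` arguments follow the order min longitude, max longitude, max latitude, min latitude (the order used by Solr-style spatial fields); the record itself does not state the order, and the function name `parse_coverage` and the returned keys are hypothetical.

```python
import re
from datetime import datetime, timezone

# Coverage string copied from the op_coverage field of this record.
COVERAGE = (
    "Global ENVELOPE(-180.0,180.0,90.0,-90.0) "
    "BEGINDATE: 2015-01-01T00:00:00Z ENDDATE: 2016-01-01T00:00:00Z"
)


def parse_coverage(text):
    """Split a coverage string into bounding-box and date-range fields.

    Assumption: ENVELOPE arguments are ordered
    (min longitude, max longitude, max latitude, min latitude).
    """
    env = re.search(r"ENVELOPE\(([^)]*)\)", text)
    if env is None:
        raise ValueError("no ENVELOPE(...) found in coverage string")
    west, east, north, south = (float(v) for v in env.group(1).split(","))

    def stamp(label):
        # Timestamps in this record use a trailing 'Z', i.e. UTC.
        match = re.search(label + r":\s*(\S+)", text)
        if match is None:
            raise ValueError("no " + label + " found in coverage string")
        parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)

    return {
        "west": west,
        "east": east,
        "north": north,
        "south": south,
        "begin": stamp("BEGINDATE"),
        "end": stamp("ENDDATE"),
    }


coverage = parse_coverage(COVERAGE)
print(coverage["west"], coverage["east"], coverage["south"], coverage["north"])
print(coverage["begin"].isoformat(), coverage["end"].isoformat())
```

Under the same assumed argument order, the `long_lat` value `ENVELOPE(7.590,7.590,63.223,63.223)` would parse to a single point, since its minimum and maximum coordinates coincide.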