Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016

Climate change is amplified in the Polar Regions. Polar amplification is captured via space and airborne remote sensing, in-situ measurement, and climate modeling. Beyond the rich literature that documents changing Polar regions, each method of Polar-data collection produces a diverse set of data ty...


Bibliographic Details
Main Author: Christian Mattmann
Format: Dataset
Language: unknown
Published: Arctic Data Center 2017
Subjects:
Online Access: https://doi.org/10.18739/A2FJ7C
id dataone:doi:10.18739/A2FJ7C
record_format openpolar
spelling dataone:doi:10.18739/A2FJ7C 2024-06-03T18:46:23+00:00 Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016 Christian Mattmann Global ENVELOPE(-180.0,180.0,90.0,-90.0) BEGINDATE: 2015-01-01T00:00:00Z ENDDATE: 2016-01-01T00:00:00Z 2017-07-27T00:00:00Z https://doi.org/10.18739/A2FJ7C unknown Arctic Data Center Cryosphere Dataset 2017 dataone:urn:node:ARCTIC https://doi.org/10.18739/A2FJ7C 2024-06-03T18:10:07Z Climate change is amplified in the Polar Regions. Polar amplification is captured via space and airborne remote sensing, in-situ measurement, and climate modeling. Beyond the rich literature that documents changing Polar regions, each method of Polar-data collection produces a diverse set of data types, ranging from text-based metadata to more complex data structures (e.g. HDF, NetCDF, GRIB). Because finding these data is often a primary challenge in scientific discovery, inclusion of the Polar dataset in TREC-DD would help advance science through data discovery and provide TREC-DD a new challenge in the realm of search relevancy. Dataset Description: This dataset is a collection of web crawls from three primary sources: (1) Dr. Chris Mattmann's crawl of ADE, performed at the Open Science Codefest and at the NSF DataViz Hackathon for Polar CyberInfrastructure (http://nsf-polar-cyberinfrastructure.github.io/datavis-hackathon/); (2) three datasets contributed by Dr. Mattmann's student Angela Wang: two crawls of ACADIS and one of NASA AMD; and (3) 13 individual crawls of NSF ACADIS, NSIDC ADE, and NASA AMD submitted by students in Dr. Mattmann's CSCI 572 course at USC. Each web crawl used Apache Nutch as the core framework for web crawling and Apache Tika as the main content detection and extraction framework. Nutch is a distributed search engine that runs on top of Apache Hadoop. Apache Tika is an open source framework for metadata exploration, automatic text mining, and information retrieval.
Web crawls were focused on three polar data repositories: the National Science Foundation Advanced Cooperative Arctic Data and Information System (ACADIS), the National Snow and Ice Data Center (NSIDC) Arctic Data Explorer (ADE), and the National Aeronautics and Space Administration Antarctic Master Directory (AMD). The finished Polar dataset is composed of 17 distinct web crawls, containing 1,741,530 records (158 GB) across the three Polar science data repositories, which themselves are largely uncoordinated. Dataset Antarc* Antarctic Arctic Climate change National Snow and Ice Data Center Arctic Data Center (via DataONE) Arctic Antarctic Tika ENVELOPE(7.590,7.590,63.223,63.223)
institution Open Polar
collection Arctic Data Center (via DataONE)
op_collection_id dataone:urn:node:ARCTIC
language unknown
topic Cryosphere
spellingShingle Cryosphere
Christian Mattmann
Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016
topic_facet Cryosphere
description Climate change is amplified in the Polar Regions. Polar amplification is captured via space and airborne remote sensing, in-situ measurement, and climate modeling. Beyond the rich literature that documents changing Polar regions, each method of Polar-data collection produces a diverse set of data types, ranging from text-based metadata to more complex data structures (e.g. HDF, NetCDF, GRIB). Because finding these data is often a primary challenge in scientific discovery, inclusion of the Polar dataset in TREC-DD would help advance science through data discovery and provide TREC-DD a new challenge in the realm of search relevancy.
Dataset Description: This dataset is a collection of web crawls from three primary sources: (1) Dr. Chris Mattmann's crawl of ADE, performed at the Open Science Codefest and at the NSF DataViz Hackathon for Polar CyberInfrastructure (http://nsf-polar-cyberinfrastructure.github.io/datavis-hackathon/); (2) three datasets contributed by Dr. Mattmann's student Angela Wang: two crawls of ACADIS and one of NASA AMD; and (3) 13 individual crawls of NSF ACADIS, NSIDC ADE, and NASA AMD submitted by students in Dr. Mattmann's CSCI 572 course at USC.
Each web crawl used Apache Nutch as the core framework for web crawling and Apache Tika as the main content detection and extraction framework. Nutch is a distributed search engine that runs on top of Apache Hadoop. Apache Tika is an open source framework for metadata exploration, automatic text mining, and information retrieval. Web crawls were focused on three polar data repositories: the National Science Foundation Advanced Cooperative Arctic Data and Information System (ACADIS), the National Snow and Ice Data Center (NSIDC) Arctic Data Explorer (ADE), and the National Aeronautics and Space Administration Antarctic Master Directory (AMD).
The finished Polar dataset is composed of 17 distinct web crawls, containing 1,741,530 records (158 GB) across the three Polar science data repositories, which themselves are largely uncoordinated.
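The crawl counts in the description can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes Dr. Mattmann's ADE crawl counts as a single crawl (the record does not say so explicitly) and that "GB" means 10^9 bytes. The dictionary keys and variable names are hypothetical labels, not fields of the dataset.

```python
# Crawl counts per contributor, as read from the description.
# ASSUMPTION: the ADE crawl by Dr. Mattmann is counted as one crawl.
crawls_by_source = {
    "Mattmann crawl of ADE": 1,
    "Angela Wang (2 ACADIS + 1 AMD)": 3,
    "USC CSCI 572 student crawls": 13,
}

TOTAL_RECORDS = 1_741_530   # records, as stated in the description
TOTAL_SIZE_GB = 158         # ASSUMPTION: decimal gigabytes (10**9 bytes)

# The three sources should add up to the 17 crawls the record reports.
total_crawls = sum(crawls_by_source.values())

# Rough averages; real crawls almost certainly vary widely in size.
mean_records_per_crawl = TOTAL_RECORDS / total_crawls
mean_kb_per_record = TOTAL_SIZE_GB * 1e9 / TOTAL_RECORDS / 1e3

if __name__ == "__main__":
    print(f"total crawls: {total_crawls}")
    print(f"mean records per crawl: {mean_records_per_crawl:,.0f}")
    print(f"mean size per record: {mean_kb_per_record:.1f} KB")
```

Under these assumptions the per-source counts sum to the 17 crawls reported. The averages are order-of-magnitude figures only, since the record gives no per-crawl breakdown.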
format Dataset
author Christian Mattmann
author_facet Christian Mattmann
author_sort Christian Mattmann
title Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016
title_short Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016
title_full Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016
title_fullStr Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016
title_full_unstemmed Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016
title_sort text retrieval conference (trec) dynamic domain polar dataset code, 2015-2016
publisher Arctic Data Center
publishDate 2017
url https://doi.org/10.18739/A2FJ7C
op_coverage Global
ENVELOPE(-180.0,180.0,90.0,-90.0)
BEGINDATE: 2015-01-01T00:00:00Z ENDDATE: 2016-01-01T00:00:00Z
long_lat ENVELOPE(7.590,7.590,63.223,63.223)
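The ENVELOPE(...) values in op_coverage and long_lat can be read programmatically. The following is a minimal sketch that assumes the argument order (min longitude, max longitude, max latitude, min latitude), which is the convention used by Solr and CQL for ENVELOPE. The record itself does not state the order, although the global value (-180.0,180.0,90.0,-90.0) is consistent with it. The names `Envelope` and `parse_envelope` are hypothetical helpers, not part of any catalog API.

```python
import re
from typing import NamedTuple


class Envelope(NamedTuple):
    """Bounding box in the ASSUMED order: minX, maxX, maxY, minY."""
    min_lon: float
    max_lon: float
    max_lat: float
    min_lat: float

    @property
    def is_point(self) -> bool:
        # A degenerate box (zero width and height) marks a single location.
        return self.min_lon == self.max_lon and self.min_lat == self.max_lat


# Matches the text between "ENVELOPE(" and the closing parenthesis.
_ENVELOPE_RE = re.compile(r"ENVELOPE\(\s*([^)]*?)\s*\)")


def parse_envelope(text: str) -> Envelope:
    """Extract the first ENVELOPE(...) expression found in a field value."""
    match = _ENVELOPE_RE.search(text)
    if match is None:
        raise ValueError(f"no ENVELOPE(...) found in {text!r}")
    parts = [float(p) for p in match.group(1).split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(parts)}")
    return Envelope(*parts)


if __name__ == "__main__":
    coverage = parse_envelope("ENVELOPE(-180.0,180.0,90.0,-90.0)")
    location = parse_envelope("long_lat ENVELOPE(7.590,7.590,63.223,63.223)")
    print(coverage, coverage.is_point)
    print(location, location.is_point)
```

Read this way, the op_coverage value spans the whole globe, while the long_lat value collapses to a single point at longitude 7.590, latitude 63.223.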
geographic Arctic
Antarctic
Tika
geographic_facet Arctic
Antarctic
Tika
genre Antarc*
Antarctic
Arctic
Climate change
National Snow and Ice Data Center
genre_facet Antarc*
Antarctic
Arctic
Climate change
National Snow and Ice Data Center
op_doi https://doi.org/10.18739/A2FJ7C
_version_ 1800872151407919104