Text REtrieval Conference (TREC) Dynamic Domain polar dataset code, 2015-2016

Bibliographic Details
Main Author: Christian Mattmann
Format: Dataset
Language: unknown
Published: Arctic Data Center 2017
Online Access: https://doi.org/10.18739/A2J678X27
Description
Summary: Climate change is amplified in the Polar Regions. Polar amplification is captured via spaceborne and airborne remote sensing, in-situ measurement, and climate modeling. Beyond the rich literature that documents changing Polar regions, each method of Polar-data collection produces a diverse set of data types, ranging from text-based metadata to more complex data structures (e.g. HDF, NetCDF, GRIB). Because finding these data is often a primary challenge in scientific discovery, including the Polar dataset in TREC-DD would help advance science through data discovery and give TREC-DD a new challenge in search relevancy.

Dataset Description: The web crawls focused on three polar data repositories: the National Science Foundation Advanced Cooperative Arctic Data and Information System (ACADIS), the National Snow and Ice Data Center (NSIDC) Arctic Data Explorer (ADE), and the National Aeronautics and Space Administration (NASA) Antarctic Master Directory (AMD). The dataset is a collection of these crawls from three primary sources: (1) Dr. Chris Mattmann's crawl of ADE, performed at the Open Science Codefest and at the NSF DataViz Hackathon for Polar CyberInfrastructure (http://nsf-polar-cyberinfrastructure.github.io/datavis-hackathon/); (2) three datasets contributed by Dr. Mattmann's student Angela Wang: two crawls of ACADIS and one of AMD; and (3) 13 individual crawls of ACADIS, ADE, and AMD submitted by students in Dr. Mattmann's CSCI 572 course at USC.

Each web crawl used Apache Nutch as the core web-crawling framework and Apache Tika as the main content detection and extraction framework. Nutch is a distributed search engine that runs on top of Apache Hadoop; Tika is an open source framework for metadata exploration, automatic text mining, and information retrieval.
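Content detection of the kind Tika performs starts, in its simplest form, by matching a file's leading "magic" bytes against known format signatures. The sketch below illustrates that idea for the formats named above (HDF, NetCDF, GRIB) plus a crude plain-text fallback for text-based metadata. It is a hypothetical simplification written for this note, not Tika's implementation or API; the names `SIGNATURES`, `sniff_format` and `sniff_file` are invented here, and only offset 0 of the file is examined.

```python
# Hypothetical illustration of magic-byte ("signature") sniffing. This is NOT
# Apache Tika's code or API; Tika's detectors are far more complete.

# (leading bytes, label) pairs, checked in order. Only offset 0 is checked,
# although HDF5 also allows its signature at offsets 512, 1024, 2048, ...
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "HDF5"),      # NetCDF-4 files are HDF5 underneath
    (b"\x0e\x03\x13\x01", "HDF4"),
    (b"CDF\x01", "NetCDF classic"),
    (b"CDF\x02", "NetCDF 64-bit offset"),
    (b"CDF\x05", "NetCDF CDF-5"),
    (b"GRIB", "GRIB"),                    # GRIB editions 1 and 2
]

# Bytes accepted by the plain-text fallback: printable ASCII plus tab/LF/CR
_TEXT_BYTES = set(range(32, 127)) | {9, 10, 13}


def sniff_format(header: bytes) -> str:
    """Classify a file from its first bytes (8 suffice for these signatures)."""
    if not header:
        return "empty"
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    # Crude heuristic for text-based metadata (XML, HTML, JSON, ...)
    if all(b in _TEXT_BYTES for b in header):
        return "text"
    return "unknown binary"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file on disk and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

For example, `sniff_format(b"GRIB\x00\x00\x00\x02")` returns `"GRIB"`. A real detector also weighs file names, declared MIME types and deeper structure, which is why frameworks like Tika combine several detection strategies rather than relying on magic bytes alone.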
The finished Polar dataset is composed of 17 distinct web crawls, containing 1,741,530 records (158 GB) across the three Polar science data repositories, which themselves are largely uncoordinated.
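The record gives totals but not how the 17 crawls break down or what an average record weighs. The sketch below tallies the crawls listed in the description and derives a mean size per record. It assumes that Dr. Mattmann's ADE crawl counts as a single crawl, and, because the record does not say whether "158 GB" means decimal gigabytes or binary gibibytes, it computes both readings.

```python
# Back-of-the-envelope checks on the figures quoted in this record.
# Assumptions (not stated in the record):
#   - Dr. Mattmann's ADE crawl counts as one crawl;
#   - "158 GB" is read both as decimal GB (10**9 B) and binary GiB (2**30 B).

crawls_by_source = {
    "Chris Mattmann (ADE)": 1,
    "Angela Wang (2 x ACADIS, 1 x AMD)": 3,
    "CSCI 572 students": 13,
}
total_crawls = sum(crawls_by_source.values())  # 1 + 3 + 13

records = 1_741_530
size_decimal = 158 * 10**9   # bytes if GB = 10^9 bytes
size_binary = 158 * 2**30    # bytes if "GB" actually means GiB

avg_decimal = size_decimal / records
avg_binary = size_binary / records

print(f"crawls: {total_crawls}")
print(f"mean record size: {avg_decimal:,.0f} B (decimal) / {avg_binary:,.0f} B (binary)")
```

Running it prints `crawls: 17` and a mean of roughly 90,725 bytes (decimal) or 97,415 bytes (binary) per record, i.e. about 91 to 97 kB. Since the crawls mix small HTML and metadata pages with large binary data files, this mean likely hides a very wide spread of individual record sizes.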