An open source framework for metadata exploration and discovery of Polar Data

Bibliographic Details
Main Author: Mattmann, Christian
Format: Dataset
Language: English
Published: NSF Arctic Data Center, 2014
Subjects:
PCI
Online Access: https://dx.doi.org/10.18739/a2r49g96h
https://arcticdata.io/catalog/view/doi:10.18739/A2R49G96H
Description: This project will deliver an open source framework for metadata exploration, automatic text mining, and information retrieval of polar data that uses the Apache Tika technology. Apache Tika is currently the de facto "babel fish", aiding in the automatic MIME detection, text extraction, and metadata classification of over 1200 data formats. The PI will expand Tika to handle polar data and scientific data formats, making polar data more easily available, searchable, and retrievable by all major content management systems. The proposed activity will lay the groundwork for a thorough, automatically generated inventory of polar metadata and data. Expanding Tika to handle polar data will also naturally invite the technology and open source communities to engage with polar use cases, helping to increase understanding of the Arctic. The software produced through this effort will be disseminated to the software and polar communities through the Apache Software Foundation. A computer science graduate student and a postdoc will be exposed to cryosphere and Arctic data, helping to train the next generation of cross-disciplinary data scientists in the domain. The PI's graduate courses at USC, Search Engines (20-40 students annual enrollment) and Software Architecture (30-50 students annual enrollment), will benefit from the Arctic cyberinfrastructure use cases disseminated through course projects and lecture material.

The PI will also work collaboratively with NSF-funded projects focusing on the archiving, discovery, and access of polar data, such as ACADIS and the Antarctic Master Directory.