A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue

Access to retrospective data is essential for understanding environmental variability and change, for initializing and validating models of all kinds, and for illuminating the relationships between ecosystems and the societies that depend on them. A major barrier to effective use of hist...


Bibliographic Details
Main Author: Mahoney, Andy
Format: Dataset
Language: English
Published: Axiom Data Science 2020
Online Access: https://dx.doi.org/10.24431/rw1k479
https://search.dataone.org/#view/10.24431/rw1k479
id ftdatacite:10.24431/rw1k479
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language English
description Access to retrospective data is essential for understanding environmental variability and change, for initializing and validating models of all kinds, and for illuminating the relationships between ecosystems and the societies that depend on them. A major barrier to effective use of historical data in any discipline is the need to transform large quantities of manuscript or printed text, especially complex data tables, into formats that can be collated and analyzed by computers. However, no current Optical Character Recognition (OCR) engine can convert scanned document images into digital text accurately enough to make human intervention unnecessary. This is especially true of scientific data presented in tables or other matrix formats. The goal of this project was to build an open-source, citizen science mediated OCR module to facilitate transcription of complex data tables and other typescript or printed material (e.g., Arctic and worldwide weather observations recorded in ships' logs), and to integrate it into the Zooniverse transcription software bundle. This module will be available to the public via Zooniverse: https://www.zooniverse.org/projects/zooniverse/oldweather-ocr.
format Dataset
author Mahoney, Andy
title A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue
publisher Axiom Data Science
publishDate 2020
url https://dx.doi.org/10.24431/rw1k479
https://search.dataone.org/#view/10.24431/rw1k479
geographic Arctic
geographic_facet Arctic
genre Arctic
genre_facet Arctic
op_doi https://doi.org/10.24431/rw1k479
_version_ 1766336835330506752