A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue

Access to retrospective data is essential for understanding environmental variability and change, for initializing and validating models of all kinds, and for illuminating the relationships between ecosystems and the societies that depend on them. A major barrier to effective use of hist...

Bibliographic Details
Main Author: Andy Mahoney
Format: Dataset
Language: unknown
Published: Research Workspace 2015
Subjects: North Pacific Research Board
Online Access: https://search.dataone.org/view/10.24431_rw1k479_20201116T200748Z
id dataone:10.24431_rw1k479_20201116T200748Z
record_format openpolar
spelling dataone:10.24431_rw1k479_20201116T200748Z 2024-10-03T18:45:55+00:00 Andy Mahoney 2015-06-01T00:00:00Z unknown Research Workspace North Pacific Research Board Dataset 2015 dataone:urn:node:RW 2024-10-03T18:16:41Z Arctic Research Workspace (via DataONE) Arctic Pacific
institution Open Polar
collection Research Workspace (via DataONE)
op_collection_id dataone:urn:node:RW
language unknown
topic North Pacific Research Board
spellingShingle North Pacific Research Board
Andy Mahoney
A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue
topic_facet North Pacific Research Board
description Access to retrospective data is essential for understanding environmental variability and change, for initializing and validating models of all kinds, and for illuminating the relationships between ecosystems and the societies that depend on them. A major barrier to effective use of historical data in any discipline is the need to transform large quantities of manuscript or printed text, especially complex data tables, into formats that can be collated and analyzed by computers. However, there is currently no Optical Character Recognition (OCR) engine that can convert scanned images of documents into digital text with a level of accuracy that makes human intervention unnecessary. This is especially true of scientific data presented in tables or other matrix formats. The goal of this project was to build an open-source, citizen-science-mediated OCR module to facilitate transcription of complex data tables and other typescript or printed material (e.g., Arctic and worldwide weather observations recorded in ships' logs) and to integrate it into the Zooniverse transcription software bundle. This module will be available to the public via Zooniverse: https://www.zooniverse.org/projects/zooniverse/oldweather-ocr.
format Dataset
author Andy Mahoney
title A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue
title_sort citizen science mediated optical character recognition (ocr) module for large-scale data rescue
publisher Research Workspace
publishDate 2015
url https://search.dataone.org/view/10.24431_rw1k479_20201116T200748Z
geographic Arctic
Pacific
geographic_facet Arctic
Pacific
genre Arctic
genre_facet Arctic
_version_ 1811922158791688192