A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue
Main Author: | Andy Mahoney |
---|---|
Format: | Dataset |
Language: | unknown |
Published: | Research Workspace, 2015 |
Subjects: | North Pacific Research Board |
Online Access: | https://search.dataone.org/view/10.24431_rw1k479_20201116T200748Z |
id |
dataone:10.24431_rw1k479_20201116T200748Z |
---|---|
record_format |
openpolar |
spelling |
dataone:10.24431_rw1k479_20201116T200748Z 2024-10-03T18:45:55+00:00 A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue Andy Mahoney 2015-06-01T00:00:00Z https://search.dataone.org/view/10.24431_rw1k479_20201116T200748Z unknown Research Workspace North Pacific Research Board Dataset 2015 dataone:urn:node:RW 2024-10-03T18:16:41Z Access to retrospective data is essential for understanding environmental variability and change, it is important for initializing and validating models of all kinds, and for illuminating the relationships between ecosystems and societies that depend on them. A major barrier to effective use of historical data in any discipline is the need to transform large quantities of manuscript or printed text, especially complex data tables, into formats that can be collated and analyzed by computers. However, there is currently no Optical Character Recognition (OCR) engine that can render scanned images of documents into digital text with a level of accuracy that renders human intervention unnecessary. This is especially true with respect to scientific data presented in tables or other matrix formats. The goal of this project was to build an open source citizen science mediated OCR module to facilitate transcription of complex data tables and other typescript or printed material (e.g Arctic and worldwide weather observations recorded in ship's logs), and integrate it into the Zooniverse transcription software bundle. This module will be available to the public via Zooniverse: https://www.zooniverse.org/projects/zooniverse/oldweather-ocr. Dataset Arctic Research Workspace (via DataONE) Arctic Pacific |
institution |
Open Polar |
collection |
Research Workspace (via DataONE) |
op_collection_id |
dataone:urn:node:RW |
language |
unknown |
topic |
North Pacific Research Board |
description |
Access to retrospective data is essential for understanding environmental variability and change, for initializing and validating models of all kinds, and for illuminating the relationships between ecosystems and the societies that depend on them. A major barrier to the effective use of historical data in any discipline is the need to transform large quantities of manuscript or printed text, especially complex data tables, into formats that computers can collate and analyze. However, there is currently no Optical Character Recognition (OCR) engine that can convert scanned images of documents into digital text accurately enough to make human intervention unnecessary. This is especially true of scientific data presented in tables or other matrix formats. The goal of this project was to build an open-source, citizen-science-mediated OCR module to facilitate transcription of complex data tables and other typescript or printed material (e.g., Arctic and worldwide weather observations recorded in ship's logs) and to integrate it into the Zooniverse transcription software bundle. This module will be available to the public via Zooniverse: https://www.zooniverse.org/projects/zooniverse/oldweather-ocr. |
format |
Dataset |
author |
Andy Mahoney |
title |
A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue |
publisher |
Research Workspace |
publishDate |
2015 |
url |
https://search.dataone.org/view/10.24431_rw1k479_20201116T200748Z |
geographic |
Arctic Pacific |
genre |
Arctic |