A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue
Access to retrospective data is essential for understanding environmental variability and change; it is important for initializing and validating models of all kinds and for illuminating the relationships between ecosystems and the societies that depend on them. A major barrier to effective use of hist...
Main Author: | Mahoney, Andy |
---|---|
Format: | Dataset |
Language: | English |
Published: | Axiom Data Science 2020 |
Subjects: | |
Online Access: | https://dx.doi.org/10.24431/rw1k479 https://search.dataone.org/#view/10.24431/rw1k479 |
id | ftdatacite:10.24431/rw1k479 |
---|---|
record_format | openpolar |
spelling | ftdatacite:10.24431/rw1k479 2023-05-15T15:05:05+02:00 A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue Mahoney, Andy 2020 https://dx.doi.org/10.24431/rw1k479 https://search.dataone.org/#view/10.24431/rw1k479 en eng Axiom Data Science dataset Dataset 2020 ftdatacite https://doi.org/10.24431/rw1k479 2021-11-05T12:55:41Z Access to retrospective data is essential for understanding environmental variability and change; it is important for initializing and validating models of all kinds and for illuminating the relationships between ecosystems and the societies that depend on them. A major barrier to effective use of historical data in any discipline is the need to transform large quantities of manuscript or printed text, especially complex data tables, into formats that can be collated and analyzed by computers. However, there is currently no Optical Character Recognition (OCR) engine that can convert scanned images of documents into digital text accurately enough to make human intervention unnecessary. This is especially true of scientific data presented in tables or other matrix formats. The goal of this project was to build an open-source, citizen science mediated OCR module to facilitate transcription of complex data tables and other typescript or printed material (e.g., Arctic and worldwide weather observations recorded in ships' logs), and to integrate it into the Zooniverse transcription software bundle. This module will be available to the public via Zooniverse: https://www.zooniverse.org/projects/zooniverse/oldweather-ocr. Dataset Arctic DataCite Metadata Store (German National Library of Science and Technology) Arctic |
institution | Open Polar |
collection | DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id | ftdatacite |
language | English |
description | Access to retrospective data is essential for understanding environmental variability and change; it is important for initializing and validating models of all kinds and for illuminating the relationships between ecosystems and the societies that depend on them. A major barrier to effective use of historical data in any discipline is the need to transform large quantities of manuscript or printed text, especially complex data tables, into formats that can be collated and analyzed by computers. However, there is currently no Optical Character Recognition (OCR) engine that can convert scanned images of documents into digital text accurately enough to make human intervention unnecessary. This is especially true of scientific data presented in tables or other matrix formats. The goal of this project was to build an open-source, citizen science mediated OCR module to facilitate transcription of complex data tables and other typescript or printed material (e.g., Arctic and worldwide weather observations recorded in ships' logs), and to integrate it into the Zooniverse transcription software bundle. This module will be available to the public via Zooniverse: https://www.zooniverse.org/projects/zooniverse/oldweather-ocr. |
format | Dataset |
author | Mahoney, Andy |
spellingShingle | Mahoney, Andy A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue |
author_facet | Mahoney, Andy |
author_sort | Mahoney, Andy |
title | A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue |
title_short | A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue |
title_full | A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue |
title_fullStr | A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue |
title_full_unstemmed | A citizen science mediated Optical Character Recognition (OCR) module for large-scale data rescue |
title_sort | citizen science mediated optical character recognition (ocr) module for large-scale data rescue |
publisher | Axiom Data Science |
publishDate | 2020 |
url | https://dx.doi.org/10.24431/rw1k479 https://search.dataone.org/#view/10.24431/rw1k479 |
geographic | Arctic |
geographic_facet | Arctic |
genre | Arctic |
genre_facet | Arctic |
op_doi | https://doi.org/10.24431/rw1k479 |
_version_ | 1766336835330506752 |