Automating Historical Source Transcription

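Transcribing the 1950 Norwegian census, with its 3.3 million person records, and linking it to the Central Population Register (CPR) provides longitudinal information about significant population groups during the understudied mid-20th century. Since this source is closed to the public, we receive no help from genealogists and instead use machine learning techniques to semi-automate the transcription. First, the scanned manuscripts are split into individual cells and multiple names are divided. After the birthdates have been transcribed manually in India, a lookup routine searches for families with matching sets of birthdates in the 1960 census and the CPR. After manual checks with GUI routines, the names are copied to the text version of the 1950 census, and the links to the CPR are stored as well. Other fields, such as occupation or gender, contain numeric or letter codes and are transcribed wholesale with routines that interpret the layout of the graphical images. Work employing these methods has also started on the 1930 census, which is the last of the Norwegian censuses to be transcribed.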
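
The abstract mentions a lookup routine that searches for families with matching sets of birthdates in the 1960 census and the CPR. The following is only a minimal sketch of how such a lookup could work, not the author's implementation: the function names, the `min_shared` threshold, the tie handling, and all sample names and dates are hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def build_index(register_families):
    """Map each birthdate to the ids of the register families that contain it."""
    index = defaultdict(set)
    for family_id, members in register_families.items():
        for _name, birthdate in members:
            index[birthdate].add(family_id)
    return index


def match_household(birthdates, index, min_shared=2):
    """Return the id of the register family sharing the most birthdates.

    Returns None when fewer than `min_shared` dates agree or when the two
    best candidates tie, so that such cases can be left for manual checking.
    """
    counts = Counter()
    for birthdate in set(birthdates):
        for family_id in index.get(birthdate, ()):
            counts[family_id] += 1
    ranked = counts.most_common(2)
    if not ranked or ranked[0][1] < min_shared:
        return None
    if len(ranked) == 2 and ranked[1][1] == ranked[0][1]:
        return None
    return ranked[0][0]


# Hypothetical register extract: family id -> list of (name, birthdate).
register_families = {
    "F1": [
        ("Ola Hansen", date(1910, 3, 4)),
        ("Kari Hansen", date(1912, 7, 19)),
        ("Per Hansen", date(1938, 1, 2)),
    ],
    "F2": [
        ("Nils Berg", date(1910, 3, 4)),
        ("Anna Berg", date(1915, 5, 5)),
    ],
}
index = build_index(register_families)

# Birthdates transcribed from one hypothetical 1950 household.
household_1950 = [date(1910, 3, 4), date(1912, 7, 19), date(1938, 1, 2)]
matched_family = match_household(household_1950, index)

# A single shared birthdate is too weak to link on its own.
single_person = match_household([date(1910, 3, 4)], index)
```

In this sketch the household is linked to family `F1`, whose names could then be offered for manual confirmation, while the single-birthdate case returns `None`. Requiring several agreeing birthdates is one plausible way to keep false links rare; the threshold actually used in the project is not stated in this record.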


Bibliographic Details
Published in: Historical Life Course Studies
Main Author: Thorvaldsen, G.
Format: Article in Journal/Newspaper
Language: English
Published: European Historical Population Samples Network (EHPS-Net) 2022
Subjects: CENSUS; MACHINE LEARNING; POPULATION REGISTER; TRANSCRIPTION
Online Access: http://elar.urfu.ru/handle/10995/131208
https://hlcs.nl/article/download/9568/10093
https://doi.org/10.51964/hlcs9568
id fturalfuniv:oai:elar.urfu.ru:10995/131208
record_format openpolar
institution Open Polar
collection Ural Federal University (URFU): ELAR
op_collection_id fturalfuniv
language English
topic CENSUS
MACHINE LEARNING
POPULATION REGISTER
TRANSCRIPTION
description Transcribing the 1950 Norwegian census, with its 3.3 million person records, and linking it to the Central Population Register (CPR) provides longitudinal information about significant population groups during the understudied mid-20th century. Since this source is closed to the public, we receive no help from genealogists and instead use machine learning techniques to semi-automate the transcription. First, the scanned manuscripts are split into individual cells and multiple names are divided. After the birthdates have been transcribed manually in India, a lookup routine searches for families with matching sets of birthdates in the 1960 census and the CPR. After manual checks with GUI routines, the names are copied to the text version of the 1950 census, and the links to the CPR are stored as well. Other fields, such as occupation or gender, contain numeric or letter codes and are transcribed wholesale with routines that interpret the layout of the graphical images. Work employing these methods has also started on the 1930 census, which is the last of the Norwegian censuses to be transcribed. © 2021, Thorvaldsen.
International Institute of Social History Amsterdam; Lars Ailo Ballo; Scientific Research Network of Historical Demography; National Institutes of Health, NIH; Universitetet i Tromsø, UiT; European Science Foundation, ESF; International Institute of Social History, IISH; Fonds Wetenschappelijk Onderzoek, FWO; Norges Forskningsråd (225950)
Funding text 1: Historical Life Course Studies is a no-fee, double-blind, peer-reviewed open-access journal supported by the European Science Foundation (ESF, http://www.esf.org), the Scientific Research Network of Historical Demography (FWO Flanders, http://www.historicaldemography.be) and the International Institute of Social History Amsterdam (IISH, ...
Funding text 2: This paper is written with input from Kåre Bævre (National Institute of Health), Lars Holden (Norwegian Computing Center), Trygve Andersen (UiT) and Lars Ailo Ballo (UiT). Supported financially by the ...
format Article in Journal/Newspaper
author Thorvaldsen, G.
title Automating Historical Source Transcription
publisher European Historical Population Samples Network (EHPS-Net)
publishDate 2022
url http://elar.urfu.ru/handle/10995/131208
https://hlcs.nl/article/download/9568/10093
https://doi.org/10.51964/hlcs9568
genre Tromsø
Universitetet i Tromsø
genre_facet Tromsø
Universitetet i Tromsø
op_source Historical Life Course Studies
op_relation Thorvaldsen, G 2021, 'Automating Historical Source Transcription', Historical Life Course Studies, vol. 10, pp. 59-63. https://doi.org/10.51964/hlcs9568
Thorvaldsen, G. (2021). Automating Historical Source Transcription. Historical Life Course Studies, 10, 59-63. https://doi.org/10.51964/hlcs9568
2352-6343
Final
All Open Access; Gold Open Access; Green Open Access
https://hlcs.nl/article/download/9568/10093
http://elar.urfu.ru/handle/10995/131208
doi:10.51964/hlcs9568
85170375792
op_rights info:eu-repo/semantics/openAccess
cc-by
https://creativecommons.org/licenses/by/4.0/
op_doi https://doi.org/10.51964/hlcs9568
container_title Historical Life Course Studies
container_volume 10
container_start_page 59
op_container_end_page 63
_version_ 1810483834694664192