Principles and Practicalities of Corpus Design in Language Retrieval: Issues in the Digitization of the Beynon Corpus of Early Twentieth-Century Sm’algyax Materials

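This paper describes a pilot project to develop a machine-readable corpus of early twentieth-century Sm’algyax texts from a large collection of handwritten manuscripts collected by the Tsimshian ethnographer and chief William Beynon. The project seeks to ensure that the materials produced are maximally accessible to the Tsimshian community. It relates established principles for corpus design to practical issues in language retrieval, recognizing that the corpus will likely function as an intermediate stage between the original manuscripts and any language materials developed by the community. The paper is addressed primarily to linguists working on language retrieval projects but may also be of use to communities who are working with linguists, as it provides insight into the concerns and preoccupations that linguists bring to such tasks.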

Bibliographic Details
Main Authors: Stebbins, Tonya N., Hellwig, Birgit
Format: Article in Journal/Newspaper
Language: English
Published: University of Hawai'i Press 2010
Subjects: corpus, Sm'algyax, William Beynon, Tsimshian
Online Access: http://hdl.handle.net/10125/4466
id ftunivhawaiimano:oai:scholarspace.manoa.hawaii.edu:10125/4466
record_format openpolar
institution Open Polar
collection ScholarSpace at University of Hawaii at Manoa
op_collection_id ftunivhawaiimano
language English
topic corpus
Sm'algyax
William Beynon
Tsimshian
description This paper describes a pilot project to develop a machine-readable corpus of early twentieth-century Sm’algyax texts from a large collection of handwritten manuscripts collected by the Tsimshian ethnographer and chief William Beynon. The project seeks to ensure that the materials produced are maximally accessible to the Tsimshian community. It relates established principles for corpus design to practical issues in language retrieval, recognizing that the corpus will likely function as an intermediate stage between the original manuscripts and any language materials developed by the community. The paper is addressed primarily to linguists working on language retrieval projects but may also be of use to communities who are working with linguists, as it provides insight into the concerns and preoccupations that linguists bring to such tasks. National Foreign Language Resource Center
format Article in Journal/Newspaper
author Stebbins, Tonya N.
Hellwig, Birgit
title Principles and Practicalities of Corpus Design in Language Retrieval: Issues in the Digitization of the Beynon Corpus of Early Twentieth-Century Sm’algyax Materials
publisher University of Hawai'i Press
publishDate 2010
url http://hdl.handle.net/10125/4466
genre Tsimshian
Tsimshian*
op_relation Stebbins, Tonya N. and Birgit Hellwig. 2010. Principles and Practicalities of Corpus Design in Language Retrieval: Issues in the Digitization of the Beynon Corpus of Early Twentieth-Century Sm’algyax Materials. Language Documentation & Conservation 4. 34-59.
1934-5275
http://hdl.handle.net/10125/4466
op_rights Creative Commons Attribution Non-Commercial No Derivatives License
Attribution Non-Commercial No Derivatives
by-nc-nd-nsa
op_rightsnorm CC-BY-NC-ND