Approaches to digitization and annotation: A survey of language documentation materials in the Alaska Native Language Center Archive


Bibliographic Details
Main Author: Gary Holton
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2003
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2879
http://emeld.org/workshop/2003/paper-holton.pdf
Description
Summary: This paper describes the structure of existing documentary texts and recordings, based on an informal survey of items in the Alaska Native Language Center (ANLC) archive. Knowledge of data in existing language archives is important in at least two respects. First, any markup or encoding schemes will need to be robust enough to handle existing data, so prior knowledge of some of the legacy data will assist with the eventual encoding of those data. Second, and perhaps more important, knowledge of existing data provides indirect but empirically valid insight into the way in which linguists approach language documentation. Theorizing about the structure of language documentation materials may unwittingly lead us to examine idealized models of language documentation. An example of such a model is the oft-cited three-line gloss, a text-encoding format which is in practice much more diverse than might at first appear. Examining existing data permits the development of best practice to be grounded in what field linguists actually do, rather than what we think they do. The ANLC archive contains approximately 10,000 paper documents and 5,000 recordings comprising nearly everything written in or about Alaska Native languages (cf. Krauss & McGary 1980). The archive also contains substantial holdings of materials on related languages spoken outside Alaska. Admittedly, the archive still lacks geographic breadth, in the sense that it does not represent a typologically broad sample of the world's languages. The majority of Alaska's Native languages fall into one of two families: Eskimo-Aleut and Athabaskan-Eyak-Tlingit. However, the time depth of the materials and the comprehensive nature of the coverage ensure that the archive is representative of a broad range of linguistic tradition.