Approaches to digitization and annotation: A survey of language documentation materials in the Alaska Native Language Center Archive
Main Authors: | |
---|---|
Other Authors: | |
Format: | Text |
Language: | English |
Published: | 2003 |
Subjects: | |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2879 http://emeld.org/workshop/2003/paper-holton.pdf |
Summary: | This paper describes the structure of existing documentary texts and recordings, based on an informal survey of items in the Alaska Native Language Center (ANLC) archive. Knowledge of data in existing language archives is important in at least two respects. First, any markup or encoding schemes will need to be robust enough to handle existing data, so prior knowledge of some of the legacy data will assist with the eventual encoding of those data. Second, and perhaps more important, knowledge of existing data provides indirect but empirically valid insight into the way in which linguists approach language documentation. Theorizing about the structure of language documentation materials may unwittingly lead us to examine idealized models of language documentation. An example of such a model is the oft-cited three-line gloss, a text-encoding format which is in practice much more diverse than might at first appear. Examining existing data permits the development of best practice to be grounded in what field linguists actually do, rather than what we think they do. The ANLC archive contains approximately 10,000 paper documents and 5,000 recordings comprising nearly everything written in or about Alaska Native languages (cf. Krauss & McGary 1980). The archive also contains substantial holdings of materials on related languages spoken outside Alaska. Admittedly, the archive still lacks geographic breadth, in the sense that it does not represent a typologically broad sample of the world's languages. The majority of Alaska's Native languages fall into one of two families: Eskimo-Aleut and Athabaskan-Eyak-Tlingit. However, the time depth of the materials and the comprehensive nature of the coverage ensure that the archive is representative of a broad range of linguistic tradition. |