Approaches to digitization and annotation: A survey of language documentation materials in the Alaska Native Language Center Archive
This paper describes the structure of existing documentary texts and recordings, based on an informal survey of items in the Alaska Native Language Center (ANLC) archive. Knowledge of data in existing language archives is important in at least two respects. First, any markup or encoding schemes will...
Main Authors: | Gary Holton |
---|---|
Other Authors: | The Pennsylvania State University CiteSeerX Archives |
Format: | Text |
Language: | English |
Published: | 2003 |
Subjects: | |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2879 http://emeld.org/workshop/2003/paper-holton.pdf |
id |
ftciteseerx:oai:CiteSeerX.psu:10.1.1.1.2879 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Unknown |
op_collection_id |
ftciteseerx |
language |
English |
description |
This paper describes the structure of existing documentary texts and recordings, based on an informal survey of items in the Alaska Native Language Center (ANLC) archive. Knowledge of data in existing language archives is important in at least two respects. First, any markup or encoding schemes will need to be robust enough to handle existing data, so prior knowledge of some of the legacy data will assist with the eventual encoding of those data. Second, and perhaps more important, knowledge of existing data provides indirect but empirically valid insight into the way in which linguists approach language documentation. Theorizing about the structure of language documentation materials may unwittingly lead us to examine idealized models of language documentation. An example of such a model is the oft-cited three-line gloss, a text-encoding format which is in practice much more diverse than might at first appear. Examining existing data permits the development of best practice to be grounded in what field linguists actually do, rather than what we think they do. The ANLC archive contains approximately 10,000 paper documents and 5,000 recordings comprising nearly everything written in or about Alaska Native languages (cf. Krauss & McGary 1980). The archive also contains substantial holdings of materials on related languages spoken outside Alaska. Admittedly, the archive still lacks geographic breadth, in the sense that it does not represent a typologically broad sample of the world's languages. The majority of Alaska's Native languages fall into one of two families: Eskimo-Aleut and Athabaskan-Eyak-Tlingit. However, the time depth of the materials and the comprehensive nature of the coverage ensure that the archive is representative of a broad range of linguistic tradition. |
author2 |
The Pennsylvania State University CiteSeerX Archives |
format |
Text |
author |
Gary Holton |
title |
Approaches to digitization and annotation: A survey of language documentation materials in the Alaska Native Language Center Archive |
publishDate |
2003 |
url |
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2879 http://emeld.org/workshop/2003/paper-holton.pdf |
genre |
aleut eskimo* Eskimo–Aleut eyak tlingit Alaska |
op_source |
http://emeld.org/workshop/2003/paper-holton.pdf |
op_relation |
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2879 http://emeld.org/workshop/2003/paper-holton.pdf |
op_rights |
Metadata may be used without restrictions as long as the oai identifier remains attached to it. |
_version_ |
1766264059584315392 |