Community-based corpus-building: Three case studies
We describe three ongoing projects involving different First Peoples’ languages of Canada (Cree/nehiyawewin, Dene Sųłiné, and Nakoda/Stoney) that centre around the recording, transcription, compilation, and analysis of spontaneous oral language use (some narrative, some conversation) using freely av...
Main Authors: | Rice, Sally; Thunder, Dorothy |
---|---|
Format: | Text |
Language: | unknown |
Published: | 2017 |
Subjects: | |
Online Access: | http://hdl.handle.net/10125/42052 |
id |
ftunivhawaiimano:oai:scholarspace.manoa.hawaii.edu:10125/42052 |
---|---|
record_format |
openpolar |
spelling |
ftunivhawaiimano:oai:scholarspace.manoa.hawaii.edu:10125/42052 2024-09-15T18:19:02+00:00 Community-based corpus-building: Three case studies Rice, Sally Thunder, Dorothy 2017-03-03 application/pdf audio/mpeg http://hdl.handle.net/10125/42052 unknown Text Sound 2017 ftunivhawaiimano 2024-08-06T23:39:42Z Nakoda ScholarSpace at University of Hawaii at Manoa |
institution |
Open Polar |
collection |
ScholarSpace at University of Hawaii at Manoa |
op_collection_id |
ftunivhawaiimano |
language |
unknown |
description |
We describe three ongoing projects involving different First Peoples’ languages of Canada (Cree/nehiyawewin, Dene Sųłiné, and Nakoda/Stoney) that centre around the recording, transcription, compilation, and analysis of spontaneous oral language use (some narrative, some conversation) using freely available, Unicode-savvy corpus software (in this case, AntConc [Anthony 2014]) and little to no up-front annotation or translation into English. Because these languages are all polysynthetic, lemmatization and POS tagging are either unachievable or excessively time-draining and indeterminate activities. Nevertheless, corpus creation can still continue apace and reap huge benefits using the most basic of corpus tools. These projects are consonant with a growing ethos in language documentation circles that advocates for the value of corpus development alongside more traditional documentary activities (cf. McEnery & Ostler 2000, Woodbury 2003, Crowley 2007, Cox 2011, Mosel 2014, Vinogradov 2016). Each corpus is at a different stage of development, yet we hope to persuade community-based colleagues of the enormous benefits that ensue from the deliberate creation and use of a corpus of naturally occurring language data for language analysis and teaching. Direct benefits include ready-to-hand word lists; authentic sample utterances for exemplifying dictionaries, phrasebooks, and grammatical sketches; and a conscientious focus on recording many speakers across different demographic categories, discursive situations, and registers in order to achieve a broad range of usage conditions. A focus on wide and balanced sampling clearly strengthens the data pool from which analyses can follow. But it also results in a closer connection by speakers/learners to important and recurring phenomena in their language rather than to descriptions of phenomena that may have emerged through bilingual situations with a handful of speakers under the direct control of non-speaking linguists (who may have been guided by theoretical concerns ... |
author2 |
Rice, Sally Thunder, Dorothy |
format |
Text |
author |
Rice, Sally Thunder, Dorothy |
author_sort |
Rice, Sally |
title |
Community-based corpus-building: Three case studies |
publishDate |
2017 |
url |
http://hdl.handle.net/10125/42052 |
genre |
Nakoda |
genre_facet |
Nakoda |
op_relation |
http://hdl.handle.net/10125/42052 |
_version_ |
1810457138552635392 |
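
The abstract names word lists and sample utterances as direct benefits of a corpus built with little or no up-front annotation. As a rough illustration of what "the most basic of corpus tools" can mean, the sketch below builds a frequency word list and a keyword-in-context (KWIC) view from plain Unicode text. It is not AntConc's implementation and not the authors' workflow; the function names (`tokenize`, `word_list`, `kwic`) and the sample string are hypothetical, and the sample consists of invented placeholder strings, not words from any of the three languages. Tokens are split on whitespace rather than matched with a letter pattern, on the assumption that transcriptions may contain combining diacritics that a simple word regex would split apart.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """Split on whitespace and trim edge punctuation.

    No lemmatization or POS tagging is attempted: each orthographic
    word form is kept whole, which suits unannotated transcripts.
    """
    text = unicodedata.normalize("NFC", text)
    tokens = []
    for raw in text.split():
        start, end = 0, len(raw)
        # Trim punctuation (Unicode categories P*) from both edges only,
        # so word-internal apostrophes or hyphens are preserved.
        while start < end and unicodedata.category(raw[start]).startswith("P"):
            start += 1
        while end > start and unicodedata.category(raw[end - 1]).startswith("P"):
            end -= 1
        if start < end:
            tokens.append(raw[start:end].lower())
    return tokens


def word_list(tokens):
    """Return (word form, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    keyword = unicodedata.normalize("NFC", keyword).lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Invented placeholder strings, NOT real words from any language.
sample = "tata mų́ła tata, kiso! Tata"
tokens = tokenize(sample)
frequencies = word_list(tokens)
contexts = kwic(tokens, "kiso", width=1)
```

Here `frequencies` holds the word list and `contexts` the concordance lines for one search term. Matching on whole surface forms is a deliberate simplification: for polysynthetic languages a practical search would more likely use substring or wildcard queries over word forms, which this sketch does not attempt.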