A Digital Corpus of St. Lawrence Island Yupik
St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family indigenous to Alaska and Chukotka. This work presents a step-by-step pipeline for the digitization of written texts, and the first publicly available digital corpus for St. Lawrence...
Published in: | Proceedings of the Workshop on Computational Methods for Endangered Languages |
---|---|
Main Authors: | Schwartz, Lane; Chen, Emily M.; Park, Hyunji Hayley; Jahn, Edward; Schreiner, Sylvia L.R. |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Proceedings of the Workshop on Computational Methods for Endangered Languages, 2021 |
Online Access: | https://journals.colorado.edu/index.php/computel/article/view/985 https://doi.org/10.33011/computel.v2i.985 |
id |
ftucoloradobould:oai:journals.colorado.edu:article/985 |
---|---|
record_format |
openpolar |
spelling |
ftucoloradobould:oai:journals.colorado.edu:article/985 2023-05-15T15:54:49+02:00 A Digital Corpus of St. Lawrence Island Yupik Schwartz, Lane Chen, Emily M. Park, Hyunji Hayley Jahn, Edward Schreiner, Sylvia L.R. 2021-03-02 application/pdf https://journals.colorado.edu/index.php/computel/article/view/985 https://doi.org/10.33011/computel.v2i.985 eng eng Proceedings of the Workshop on Computational Methods for Endangered Languages https://journals.colorado.edu/index.php/computel/article/view/985/911 https://journals.colorado.edu/index.php/computel/article/view/985 doi:10.33011/computel.v2i.985 Proceedings of the Workshop on Computational Methods for Endangered Languages; Vol. 2 (2021): Proceedings of the 4th Workshop on Computational Methods for Endangered Languages (Resource Papers and Extended Abstracts); 31-40 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion Extended abstract and resource paper 2021 ftucoloradobould https://doi.org/10.33011/computel.v2i.985 2022-10-18T09:18:49Z St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family indigenous to Alaska and Chukotka. This work presents a step-by-step pipeline for the digitization of written texts, and the first publicly available digital corpus for St. Lawrence Island Yupik, created using that pipeline. This corpus has great potential for future linguistic inquiry and research in NLP. It was also developed for use in Yupik language education and revitalization, with a primary goal of enabling easy access to Yupik texts by educators and by members of the Yupik community. A secondary goal is to support development of language technology such as spell-checkers, text-completion systems, interactive e-books, and language learning apps for use by the Yupik community. Article in Journal/Newspaper Chukotka inuit Inuit–Yupik St Lawrence Island St. Lawrence Island Yupik Yupik Alaska University of Colorado Boulder Open Journals Lawrence Island ENVELOPE(-103.718,-103.718,56.967,56.967) Proceedings of the Workshop on Computational Methods for Endangered Languages 2 2 |
institution |
Open Polar |
collection |
University of Colorado Boulder Open Journals |
op_collection_id |
ftucoloradobould |
language |
English |
description |
St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family indigenous to Alaska and Chukotka. This work presents a step-by-step pipeline for the digitization of written texts, and the first publicly available digital corpus for St. Lawrence Island Yupik, created using that pipeline. This corpus has great potential for future linguistic inquiry and research in NLP. It was also developed for use in Yupik language education and revitalization, with a primary goal of enabling easy access to Yupik texts by educators and by members of the Yupik community. A secondary goal is to support development of language technology such as spell-checkers, text-completion systems, interactive e-books, and language learning apps for use by the Yupik community. |
format |
Article in Journal/Newspaper |
author |
Schwartz, Lane; Chen, Emily M.; Park, Hyunji Hayley; Jahn, Edward; Schreiner, Sylvia L.R. |
spellingShingle |
Schwartz, Lane Chen, Emily M. Park, Hyunji Hayley Jahn, Edward Schreiner, Sylvia L.R. A Digital Corpus of St. Lawrence Island Yupik |
author_facet |
Schwartz, Lane; Chen, Emily M.; Park, Hyunji Hayley; Jahn, Edward; Schreiner, Sylvia L.R. |
author_sort |
Schwartz, Lane |
title |
A Digital Corpus of St. Lawrence Island Yupik |
title_short |
A Digital Corpus of St. Lawrence Island Yupik |
title_full |
A Digital Corpus of St. Lawrence Island Yupik |
title_fullStr |
A Digital Corpus of St. Lawrence Island Yupik |
title_full_unstemmed |
A Digital Corpus of St. Lawrence Island Yupik |
title_sort |
digital corpus of st. lawrence island yupik |
publisher |
Proceedings of the Workshop on Computational Methods for Endangered Languages |
publishDate |
2021 |
url |
https://journals.colorado.edu/index.php/computel/article/view/985 https://doi.org/10.33011/computel.v2i.985 |
long_lat |
ENVELOPE(-103.718,-103.718,56.967,56.967) |
geographic |
Lawrence Island |
geographic_facet |
Lawrence Island |
genre |
Chukotka; inuit; Inuit–Yupik; St Lawrence Island; St. Lawrence Island Yupik; Yupik; Alaska |
genre_facet |
Chukotka; inuit; Inuit–Yupik; St Lawrence Island; St. Lawrence Island Yupik; Yupik; Alaska |
op_source |
Proceedings of the Workshop on Computational Methods for Endangered Languages; Vol. 2 (2021): Proceedings of the 4th Workshop on Computational Methods for Endangered Languages (Resource Papers and Extended Abstracts); 31-40 |
op_relation |
https://journals.colorado.edu/index.php/computel/article/view/985/911 https://journals.colorado.edu/index.php/computel/article/view/985 doi:10.33011/computel.v2i.985 |
op_doi |
https://doi.org/10.33011/computel.v2i.985 |
container_title |
Proceedings of the Workshop on Computational Methods for Endangered Languages |
container_volume |
2 |
container_issue |
2 |
_version_ |
1766390060846940160 |