Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...
This study investigates the clustering of words into Part-of-Speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. Discursively, however, these classes share a fair amount of morphology. In this study,...
Main Authors: | Association for Computational Linguistics 2023; Roll, Nathan; Todd, Simon; Ventayol-Boada, Albert |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | Underline Science Inc. 2023 |
Subjects: | Natural Language Processing; Language Models; Machine Learning |
Online Access: | https://dx.doi.org/10.48448/awrh-mn93 https://underline.io/lecture/72066-unsupervised-part-of-speech-induction-for-language-description-modeling-documentation-materials-in-kolyma-yukaghir |
id | ftdatacite:10.48448/awrh-mn93 |
---|---|
record_format | openpolar |
spelling | ftdatacite:10.48448/awrh-mn93 2023-10-01T03:57:12+02:00 Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ... Association for Computational Linguistics 2023 Roll, Nathan Todd, Simon Ventayol-Boada, Albert 2023 https://dx.doi.org/10.48448/awrh-mn93 https://underline.io/lecture/72066-unsupervised-part-of-speech-induction-for-language-description-modeling-documentation-materials-in-kolyma-yukaghir unknown Underline Science Inc. Natural Language Processing Language Models Machine Learning article MediaObject Conference talk Audiovisual 2023 ftdatacite https://doi.org/10.48448/awrh-mn93 2023-09-04T14:47:50Z This study investigates the clustering of words into Part-of-Speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. Discursively, however, these classes share a fair amount of morphology. In this study, we turn to POS induction to evaluate if classes based on quantification of the distributions in which roots and affixes are used can be useful for language description purposes, and, if so, what those classes might be. We qualitatively compare clusters of roots and affixes based on four different definitions of their distributions. The results show that clustering is more reliable for words that typically bear more morphology. Additionally, the results suggest that the number of POS classes in Kolyma Yukaghir might be smaller than stated in current descriptions. This study thus demonstrates how unsupervised learning methods can provide insights for language description, particularly for highly inflectional ... Article in Journal/Newspaper Kolyma Yukaghir Yukaghir DataCite Metadata Store (German National Library of Science and Technology) Kolyma ENVELOPE(161.000,161.000,69.500,69.500) |
institution | Open Polar |
collection | DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id | ftdatacite |
language | unknown |
topic | Natural Language Processing Language Models Machine Learning |
spellingShingle | Natural Language Processing Language Models Machine Learning Association for Computational Linguistics 2023 Roll, Nathan Todd, Simon Ventayol-Boada, Albert Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ... |
topic_facet | Natural Language Processing Language Models Machine Learning |
description | This study investigates the clustering of words into Part-of-Speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. Discursively, however, these classes share a fair amount of morphology. In this study, we turn to POS induction to evaluate if classes based on quantification of the distributions in which roots and affixes are used can be useful for language description purposes, and, if so, what those classes might be. We qualitatively compare clusters of roots and affixes based on four different definitions of their distributions. The results show that clustering is more reliable for words that typically bear more morphology. Additionally, the results suggest that the number of POS classes in Kolyma Yukaghir might be smaller than stated in current descriptions. This study thus demonstrates how unsupervised learning methods can provide insights for language description, particularly for highly inflectional ... |
format | Article in Journal/Newspaper |
author | Association for Computational Linguistics 2023 Roll, Nathan Todd, Simon Ventayol-Boada, Albert |
author_facet | Association for Computational Linguistics 2023 Roll, Nathan Todd, Simon Ventayol-Boada, Albert |
author_sort | Association for Computational Linguistics 2023 |
title | Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ... |
title_short | Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ... |
title_full | Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ... |
title_fullStr | Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ... |
title_full_unstemmed | Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ... |
title_sort | unsupervised part-of-speech induction for language description: modeling documentation materials in kolyma yukaghir ... |
publisher | Underline Science Inc. |
publishDate | 2023 |
url | https://dx.doi.org/10.48448/awrh-mn93 https://underline.io/lecture/72066-unsupervised-part-of-speech-induction-for-language-description-modeling-documentation-materials-in-kolyma-yukaghir |
long_lat | ENVELOPE(161.000,161.000,69.500,69.500) |
geographic | Kolyma |
geographic_facet | Kolyma |
genre | Kolyma Yukaghir Yukaghir |
genre_facet | Kolyma Yukaghir Yukaghir |
op_doi | https://doi.org/10.48448/awrh-mn93 |
_version_ | 1778528285288824832 |