Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...

Bibliographic Details
Main Authors: Association for Computational Linguistics 2023; Roll, Nathan; Todd, Simon; Ventayol-Boada, Albert
Format: Article in Journal/Newspaper
Language: unknown
Published: Underline Science Inc. 2023
Subjects:
Online Access: https://dx.doi.org/10.48448/awrh-mn93
https://underline.io/lecture/72066-unsupervised-part-of-speech-induction-for-language-description-modeling-documentation-materials-in-kolyma-yukaghir
Description
Summary: This study investigates the clustering of words into part-of-speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. In actual discourse, however, these classes share a fair amount of morphology. In this study, we turn to POS induction to evaluate whether classes based on quantifying the distributions in which roots and affixes are used can be useful for language description and, if so, what those classes might be. We qualitatively compare clusters of roots and affixes based on four different definitions of their distributions. The results show that clustering is more reliable for words that typically bear more morphology. Additionally, the results suggest that Kolyma Yukaghir may have fewer POS classes than current descriptions state. This study thus demonstrates how unsupervised learning methods can provide insights for language description, particularly for highly inflectional ...
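The summary does not specify the four distribution definitions or the clustering algorithm the authors used. As a minimal sketch under stated assumptions, the Python below represents each root by counts of the affixes it co-occurs with (one conceivable distribution definition) and groups roots with average-linkage agglomerative clustering over cosine similarity. The toy corpus, the morpheme labels r1 to r4 and A to D, and the function names are hypothetical placeholders, not Kolyma Yukaghir data or the study's code.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math

# Hypothetical toy corpus: each token is (root, [affixes]).
# Labels are placeholders, NOT real Kolyma Yukaghir morphemes.
TOKENS = [
    ("r1", ["A", "B"]), ("r1", ["A"]), ("r2", ["A", "B"]), ("r2", ["B"]),
    ("r3", ["C"]), ("r3", ["C", "D"]), ("r4", ["D"]), ("r4", ["C", "D"]),
]


def affix_profiles(tokens):
    """Assumed distribution definition: how often each affix attaches to each root."""
    profiles = defaultdict(Counter)
    for root, affixes in tokens:
        profiles[root].update(affixes)
    return profiles


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(profiles, k):
    """Average-linkage agglomerative clustering until k clusters remain."""
    clusters = [[r] for r in sorted(profiles)]

    def link(a, b):
        # Mean pairwise similarity between members of two clusters.
        sims = [cosine(profiles[x], profiles[y]) for x in a for y in b]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        # Merge the most similar pair; i < j, so deleting j leaves i valid.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


print(cluster(affix_profiles(TOKENS), k=2))
```

With this toy input, roots that share affix profiles end up together (r1 with r2, r3 with r4). Swapping the roles of roots and affixes when building the profiles would instead cluster affixes by the roots they attach to, which mirrors the summary's mention of clustering both roots and affixes.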