Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...

This study investigates the clustering of words into Part-of-Speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. Discursively, however, these classes share a fair amount of morphology. In this study,...


Bibliographic Details
Main Authors: Association for Computational Linguistics 2023; Roll, Nathan; Todd, Simon; Ventayol-Boada, Albert
Format: Article in Journal/Newspaper
Language: unknown
Published: Underline Science Inc. 2023
Subjects: Natural Language Processing; Language Models; Machine Learning
Online Access: https://dx.doi.org/10.48448/awrh-mn93
https://underline.io/lecture/72066-unsupervised-part-of-speech-induction-for-language-description-modeling-documentation-materials-in-kolyma-yukaghir
id ftdatacite:10.48448/awrh-mn93
record_format openpolar
spelling ftdatacite:10.48448/awrh-mn93 2023-10-01T03:57:12+02:00 Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ... Association for Computational Linguistics 2023 Roll, Nathan Todd, Simon Ventayol-Boada, Albert 2023 https://dx.doi.org/10.48448/awrh-mn93 https://underline.io/lecture/72066-unsupervised-part-of-speech-induction-for-language-description-modeling-documentation-materials-in-kolyma-yukaghir unknown Underline Science Inc. Natural Language Processing Language Models Machine Learning article MediaObject Conference talk Audiovisual 2023 ftdatacite https://doi.org/10.48448/awrh-mn93 2023-09-04T14:47:50Z This study investigates the clustering of words into Part-of-Speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. Discursively, however, these classes share a fair amount of morphology. In this study, we turn to POS induction to evaluate if classes based on quantification of the distributions in which roots and affixes are used can be useful for language description purposes, and, if so, what those classes might be. We qualitatively compare clusters of roots and affixes based on four different definitions of their distributions. The results show that clustering is more reliable for words that typically bear more morphology. Additionally, the results suggest that the number of POS classes in Kolyma Yukaghir might be smaller than stated in current descriptions. This study thus demonstrates how unsupervised learning methods can provide insights for language description, particularly for highly inflectional ... Article in Journal/Newspaper Kolyma Yukaghir Yukaghir DataCite Metadata Store (German National Library of Science and Technology) Kolyma ENVELOPE(161.000,161.000,69.500,69.500)
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Natural Language Processing
Language Models
Machine Learning
spellingShingle Natural Language Processing
Language Models
Machine Learning
Association for Computational Linguistics 2023
Roll, Nathan
Todd, Simon
Ventayol-Boada, Albert
Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...
topic_facet Natural Language Processing
Language Models
Machine Learning
description This study investigates the clustering of words into Part-of-Speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. Discursively, however, these classes share a fair amount of morphology. In this study, we turn to POS induction to evaluate if classes based on quantification of the distributions in which roots and affixes are used can be useful for language description purposes, and, if so, what those classes might be. We qualitatively compare clusters of roots and affixes based on four different definitions of their distributions. The results show that clustering is more reliable for words that typically bear more morphology. Additionally, the results suggest that the number of POS classes in Kolyma Yukaghir might be smaller than stated in current descriptions. This study thus demonstrates how unsupervised learning methods can provide insights for language description, particularly for highly inflectional ...
format Article in Journal/Newspaper
author Association for Computational Linguistics 2023
Roll, Nathan
Todd, Simon
Ventayol-Boada, Albert
author_facet Association for Computational Linguistics 2023
Roll, Nathan
Todd, Simon
Ventayol-Boada, Albert
author_sort Association for Computational Linguistics 2023
title Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...
title_short Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...
title_full Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...
title_fullStr Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...
title_full_unstemmed Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir ...
title_sort unsupervised part-of-speech induction for language description: modeling documentation materials in kolyma yukaghir ...
publisher Underline Science Inc.
publishDate 2023
url https://dx.doi.org/10.48448/awrh-mn93
https://underline.io/lecture/72066-unsupervised-part-of-speech-induction-for-language-description-modeling-documentation-materials-in-kolyma-yukaghir
long_lat ENVELOPE(161.000,161.000,69.500,69.500)
geographic Kolyma
geographic_facet Kolyma
genre Kolyma Yukaghir
Yukaghir
genre_facet Kolyma Yukaghir
Yukaghir
op_doi https://doi.org/10.48448/awrh-mn93
_version_ 1778528285288824832