Nonparametric Model for Inupiaq Morphology Tokenization

We present how to use English translation for unsupervised word segmentation of low resource languages. The inference uses a dynamic programming algorithm for efficient blocked Gibbs sampling. We apply the model to Inupiaq morphology analysis and get better results than monolingual model as well as...

Full description

Bibliographic Details
Main Authors: Thuy Linh, N Stephan Vogel
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language:English
Subjects:
Online Access:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.368.1099
http://aclweb.org/anthology/C/C12/C12-3042.pdf
Description
Summary:We present how to use English translation for unsupervised word segmentation of low resource languages. The inference uses a dynamic programming algorithm for efficient blocked Gibbs sampling. We apply the model to Inupiaq morphology analysis and get better results than monolingual model as well as Morfessor output.