The Segmentation Problem in Morphology Learning

Bibliographic Details
Main Author: Christopher D. Manning
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 1998
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5054
http://acl.ldc.upenn.edu/W/W98/W98-1240.pdf
Description
Summary: In this paper, I briefly discuss some experiments on learning morphological forms in languages with much richer morphological paradigms. Such languages are common throughout much of the globe (from Latin and Greek to Inuit and Cashinahua or Anmajere and Kayardild - to finish with some Australian examples). Attempting to learn morphology in languages with rich morphology raises quite different problems from those discussed in the work above, issues discussed - if rather naively and unsatisfactorily from a computational viewpoint - in earlier work such as Pinker (1984), MacWhinney (1978) and Peters (1983). Foremost among these is the segmentation problem: how one cuts complex morphological forms into bits with identified meanings. Note that I assume here that the child has already figured out the meanings of words. This is a big assumption, but it is reasonable for a model to focus on one aspect of the learning problem - and at any rate the learning task is still much broader and more realistic than that attempted by the recent English past tense literature. It may not even be unrealistic; see Pinker (1984:29-30) for a general defense of assuming some form of "semantic bootstrapping" and MacWhinney (1978:70-71) for arguments for the learning of word meanings before gaining a productive understanding of them ("it appears that the use of inflections in amalgams is stabilized semantically before these amalgams are analyzed morphologically"). Thus the learning task which I am attempting to address could be stated as follows: Given a set of words and a representation of their meanings, determine an internalized representation that will allow heard and (regular) unheard forms to be successfully predicted and parsed.
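
To make the stated learning task concrete, here is a minimal toy sketch in Python. It is not the method used in the paper: the longest-common-prefix heuristic, the function names (`learn`, `predict`, `parse`) and the small Latin data set are all assumptions chosen for illustration. The input is a set of words paired with a representation of their meanings; the output is a stem and suffix inventory that can predict an unheard regular form and parse it back.

```python
from collections import defaultdict

# Toy input: (surface form, lexical meaning, inflectional meaning).
# Illustrative Latin present-tense forms; this is NOT data from the paper.
OBSERVED = [
    ("amo", "love", "1sg"),
    ("amas", "love", "2sg"),
    ("amat", "love", "3sg"),
    ("laudo", "praise", "1sg"),
    ("laudas", "praise", "2sg"),
]


def longest_common_prefix(words):
    """Return the longest string that every word in `words` starts with."""
    prefix = words[0]
    for word in words[1:]:
        while not word.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def learn(observed):
    """Segment each form into stem + suffix (hypothetical, naive heuristic).

    The stem of a lexeme is taken to be the longest common prefix of its
    observed forms; whatever remains is credited to the inflection.
    """
    forms_by_lexeme = defaultdict(list)
    for form, lexeme, _ in observed:
        forms_by_lexeme[lexeme].append(form)
    stems = {lex: longest_common_prefix(forms)
             for lex, forms in forms_by_lexeme.items()}

    suffixes = {}
    for form, lexeme, inflection in observed:
        suffix = form[len(stems[lexeme]):]
        # Keep the first suffix seen per inflection; allomorphy is ignored.
        suffixes.setdefault(inflection, suffix)
    return stems, suffixes


def predict(stems, suffixes, lexeme, inflection):
    """Produce a (possibly unheard) form from its meaning."""
    return stems[lexeme] + suffixes[inflection]


def parse(stems, suffixes, form):
    """Recover (lexeme, inflection) for a form, or None if it cannot be cut."""
    for lexeme, stem in stems.items():
        for inflection, suffix in suffixes.items():
            if form == stem + suffix:
                return lexeme, inflection
    return None


stems, suffixes = learn(OBSERVED)
print(predict(stems, suffixes, "praise", "3sg"))
print(parse(stems, suffixes, "laudat"))
```

The form "laudat" never appears in the input, yet the learned segmentation predicts and parses it. The heuristic only handles purely suffixing paradigms with a single allomorph per meaning, which is exactly where the richer paradigms discussed in the summary become difficult.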