Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái [Modelling the Saami languages: building and adapting to real-world language]

Bibliographic Details
Main Author: Antonsen, Lene
Format: Doctoral or Postdoctoral Thesis
Language: Sami languages
Published: UiT Norges arktiske universitet 2018
Online Access: https://hdl.handle.net/10037/12884
Description
Summary: In this thesis I investigate whether grammatical language modelling is an appropriate response to the need for language technology for the Saami languages and other circumpolar indigenous languages. The grammatical language models under investigation are built as finite-state transducers (FSTs). I examine the challenges that arise when building such grammatical models for the Saami languages and adapting them to real-world language use. An FST makes it possible to model morphophonological alternations and to analyse the texts in a corpus, even when they contain considerable linguistic variation. It can also generate word forms that are not found in the corpus but do exist in the language. A syntactic analyser based on Constraint Grammar does not require a large text corpus, and it disambiguates robustly between multiple morphological analyses. I show how Saami grammatical language models can be used in various user programs. Grammatical tags make it possible to provide both metalinguistic information and immediate feedback on the user's input. The FSTs must be adapted so that they better reflect real language usage. A problem with the FST approach is that it overgenerates word forms, which is why it is important to limit the language model. For language learning tools based on language technology, it is also useful to include the learner's interlanguage. I have examined a number of ways both to limit and to expand the language models so that they more closely resemble real-world language. I show that the construction of an FST is the key answer to the need for language technology tools for circumpolar indigenous languages. Together with an appropriate infrastructure, which makes it possible to port the groundwork done for the Saami languages to other languages, the FST approach places advanced user applications within the reach of minority languages with complex morphology, meagre corpus resources, and few speakers.