Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái

In this thesis I investigate whether grammatical language modeling is an appropriate response to the language technology needs of the Saami languages and other circumpolar indigenous languages. The grammatical language model is built as a finite-state transducer (FST). I examine the challenges that arise when building and adapting ...


Bibliographic Details
Main Author: Antonsen, Lene
Format: Doctoral or Postdoctoral Thesis
Language: Sami languages
Published: UiT Norges arktiske universitet 2018
Subjects:
Online Access: https://hdl.handle.net/10037/12884
id ftunivtroemsoe:oai:munin.uit.no:10037/12884
record_format openpolar
institution Open Polar
collection University of Tromsø: Munin Open Research Archive
op_collection_id ftunivtroemsoe
language Sami languages
topic VDP::Humaniora: 000::Språkvitenskapelige fag: 010
VDP::Humanities: 000::Linguistics: 010
DOKTOR-001
spellingShingle VDP::Humaniora: 000::Språkvitenskapelige fag: 010
VDP::Humanities: 000::Linguistics: 010
DOKTOR-001
Antonsen, Lene
Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái
topic_facet VDP::Humaniora: 000::Språkvitenskapelige fag: 010
VDP::Humanities: 000::Linguistics: 010
DOKTOR-001
description In this thesis I investigate whether grammatical language modeling is an appropriate response to the need for language technology for the Saami languages and other circumpolar indigenous languages. The grammatical language models under investigation are built as finite-state transducers (FSTs). I examine the challenges of building such grammatical models for Saami languages and adapting them to real-world language use. An FST makes it possible to model morphophonological alternations and to analyse text in a corpus, even when the text contains considerable linguistic variation. It can also generate word forms that are not found in the corpus but do exist in the language. A syntactic analyser based on Constraint Grammar does not require a large text corpus, and it robustly disambiguates between multiple morphological analyses. I show how Saami grammatical language models can be used in various user programs. Grammatical tags make it possible to provide both metalinguistic information and immediate feedback on the user's input. The FSTs must be adapted so that they better reflect real language usage. A weakness of the FST approach is overgeneration of word forms, which is why it is important to constrain the language model. When building language technology tools for language learning, it is also useful to include the learner's interlanguage in the model. I examine a number of ways both to constrain and to expand the language models so that they better resemble real-world language. I show that constructing an FST is the key answer to the need for language technology tools for circumpolar indigenous languages. Together with an appropriate infrastructure, which makes it possible to reuse the groundwork done for the Saami languages for other languages as well, the FST approach places advanced user applications within the reach of minority languages with complex morphology, meagre corpus resources, and few speakers.
format Doctoral or Postdoctoral Thesis
author Antonsen, Lene
author_facet Antonsen, Lene
author_sort Antonsen, Lene
title Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái
title_short Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái
title_full Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái
title_fullStr Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái
title_full_unstemmed Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái
title_sort sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái
publisher UiT Norges arktiske universitet
publishDate 2018
url https://hdl.handle.net/10037/12884
long_lat ENVELOPE(28.673,28.673,70.314,70.314)
ENVELOPE(25.866,25.866,70.561,70.561)
geographic Čuolbma
Gielas
geographic_facet Čuolbma
Gielas
genre saami
genre_facet saami
op_relation https://hdl.handle.net/10037/12884
op_rights openAccess
Copyright 2018 The Author(s)
_version_ 1766180510664491008
spelling ftunivtroemsoe:oai:munin.uit.no:10037/12884 2023-05-15T18:08:14+02:00 Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái Antonsen, Lene 2018-06-26 https://hdl.handle.net/10037/12884 smi smi UiT Norges arktiske universitet UiT The Arctic University of Norway https://hdl.handle.net/10037/12884 openAccess Copyright 2018 The Author(s) VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010 DOKTOR-001 Doctoral thesis Doktorgradsavhandling 2018 ftunivtroemsoe 2021-06-25T17:55:56Z Doctoral or Postdoctoral Thesis saami University of Tromsø: Munin Open Research Archive Čuolbma ENVELOPE(28.673,28.673,70.314,70.314) Gielas ENVELOPE(25.866,25.866,70.561,70.561)