Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái
In this thesis I investigate whether grammatical language modelling is an appropriate response to the language technology needs of the Saami languages and other circumpolar indigenous languages. The grammatical language model is built as a finite-state transducer (FST). I examine the challenges that arise in building and adapting ...
Main Author: | Antonsen, Lene |
---|---|
Format: | Doctoral or Postdoctoral Thesis |
Language: | Sami languages |
Published: | UiT Norges arktiske universitet, 2018 |
Subjects: | VDP::Humaniora: 000::Språkvitenskapelige fag: 010; VDP::Humanities: 000::Linguistics: 010 |
Online Access: | https://hdl.handle.net/10037/12884 |
id | ftunivtroemsoe:oai:munin.uit.no:10037/12884 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | University of Tromsø: Munin Open Research Archive |
op_collection_id | ftunivtroemsoe |
language | Sami languages |
topic | VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010 DOKTOR-001 |
spellingShingle | VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010 DOKTOR-001 Antonsen, Lene Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái |
topic_facet | VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010 DOKTOR-001 |
description | In this thesis I investigate whether grammatical language modelling is an appropriate response to the need for language technology for the Saami languages and other circumpolar indigenous languages. The grammatical language models under investigation are built as finite-state transducers (FSTs). I examine the challenges of building such grammatical models for the Saami languages and adapting them to real-world language use. An FST makes it possible to model morphophonological alternations and to analyse the text in a corpus, even when there is considerable linguistic variation in the text. It can also generate word forms that are not found in the corpus but exist in the language. A syntactic analyser based on Constraint Grammar does not require a large text corpus, and it robustly disambiguates between multiple morphological analyses. I show how Saami grammatical language models can be put to use in various user programs. Grammatical tags make it possible to provide both metalinguistic information and immediate feedback on the user's input. The FSTs must be adapted so that they better reflect real language usage. The FST approach overgenerates word forms, which is why it is important to restrict the language model. For language learning tools built on language technology, it is also useful to include the learner's interlanguage. I examine a number of ways both to restrict and to expand the language models so that they better resemble real language use. I show that building FSTs answers the need for language technology tools for several circumpolar indigenous languages. Together with a suitable infrastructure, which makes it possible to reuse the groundwork done for the Saami languages, the FST approach places advanced user applications within the reach of minority languages with rich morphology, meagre corpus resources, and few speakers. |
format | Doctoral or Postdoctoral Thesis |
author | Antonsen, Lene |
author_facet | Antonsen, Lene |
author_sort | Antonsen, Lene |
title | Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái |
title_short | Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái |
title_full | Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái |
title_fullStr | Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái |
title_full_unstemmed | Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái |
title_sort | sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái |
publisher | UiT Norges arktiske universitet |
publishDate | 2018 |
url | https://hdl.handle.net/10037/12884 |
long_lat | ENVELOPE(28.673,28.673,70.314,70.314) ENVELOPE(25.866,25.866,70.561,70.561) |
geographic | Čuolbma Gielas |
geographic_facet | Čuolbma Gielas |
genre | saami |
genre_facet | saami |
op_relation | https://hdl.handle.net/10037/12884 |
op_rights | openAccess Copyright 2018 The Author(s) |
_version_ | 1766180510664491008 |
spelling | ftunivtroemsoe:oai:munin.uit.no:10037/12884 2023-05-15T18:08:14+02:00 Sámegielaid modelleren – huksen ja heiveheapmi duohta giellamáilbmái Antonsen, Lene 2018-06-26 https://hdl.handle.net/10037/12884 smi smi UiT Norges arktiske universitet UiT The Arctic University of Norway https://hdl.handle.net/10037/12884 openAccess Copyright 2018 The Author(s) VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010 DOKTOR-001 Doctoral thesis Doktorgradsavhandling 2018 ftunivtroemsoe 2021-06-25T17:55:56Z Doctoral or Postdoctoral Thesis saami University of Tromsø: Munin Open Research Archive Čuolbma ENVELOPE(28.673,28.673,70.314,70.314) Gielas ENVELOPE(25.866,25.866,70.561,70.561) |