When grammar can't be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection
In this dissertation, I investigate valencies and syntactically relevant semantic categories in North Sámi. In addition, I develop three machine-readable grammars for the North Sámi grammar checker GoDivvun that have access to valencies and semantics. Like a human, a machine-readable grammar analyze...
Main Author: | Wiechetek, Linda |
---|---|
Format: | Doctoral or Postdoctoral Thesis |
Language: | English |
Published: | UiT Norges arktiske universitet 2018 |
Subjects: | |
Online Access: | https://hdl.handle.net/10037/12726 |
_version_ | 1829312912153378816 |
---|---|
author | Wiechetek, Linda |
author_facet | Wiechetek, Linda |
author_sort | Wiechetek, Linda |
collection | University of Tromsø: Munin Open Research Archive |
description | In this dissertation, I investigate valencies and syntactically relevant semantic categories in North Sámi. In addition, I develop three machine-readable grammars for the North Sámi grammar checker GoDivvun that have access to valencies and semantics. Like a human, a machine-readable grammar analyzes a sentence by putting together information from different linguistic levels and, based on this, selects or discards certain interpretations. Grammatical errors and the extensive homonymy of well-formed input complicate a reliable sentence analysis based on morphology and syntax alone. I therefore add valency tags to 500 North Sámi verbs and annotate semantic prototype categories to 71% of the noun lexicon. This adds a semantic layer to the sentence analysis that is used to identify governor-argument structures in the process of error detection. I evaluate the detection of a test set of local and global errors resulting in a precision above 98% for local errors and a precision above 77% for global errors. While semantic prototype tagging is the backbone of local error detection, valency annotation is the backbone of global error detection. My approach shows that a deep syntactic and semantic sentence analysis is beneficial for local error detection and necessary for reliable global error detection. In this doctoral dissertation, I investigate the syntactic effects of valencies and semantic categories. Together with the Divvun group, I have built a North Sámi grammar checker, which I call GoDivvun. For it, I have written three machine-readable grammars that have access to valencies and semantics. A machine-readable grammar works like a human: it combines information from different linguistic levels and, on that basis, analyzes the sentence and prefers some interpretations over others. Homonymy and grammatical errors make a reliable sentence analysis difficult if it relies on morphology and syntax alone. 
As a solution, I have added valency tags to 500 North Sámi verbs and semantic prototypes to nouns. ... |
format | Doctoral or Postdoctoral Thesis |
genre | North Sámi sami Sámi samisk |
genre_facet | North Sámi sami Sámi samisk |
geographic | Dego |
geographic_facet | Dego |
id | ftunivtroemsoe:oai:munin.uit.no:10037/12726 |
institution | Open Polar |
language | English |
long_lat | ENVELOPE(28.350,28.350,70.827,70.827) |
op_collection_id | ftunivtroemsoe |
op_relation | https://hdl.handle.net/10037/12726 |
op_rights | Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) openAccess Copyright 2018 The Author(s) https://creativecommons.org/licenses/by-nc-sa/3.0 |
publishDate | 2018 |
publisher | UiT Norges arktiske universitet |
record_format | openpolar |
spelling | ftunivtroemsoe:oai:munin.uit.no:10037/12726 2025-04-13T14:24:16+00:00 When grammar can't be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection Wiechetek, Linda 2018-05-23 https://hdl.handle.net/10037/12726 eng eng UiT Norges arktiske universitet UiT The Arctic University of Norway https://hdl.handle.net/10037/12726 Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) openAccess Copyright 2018 The Author(s) https://creativecommons.org/licenses/by-nc-sa/3.0 VDP::Humanities: 000::Linguistics: 010 VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010::Sami language: 031 VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Samisk språk: 031 VDP::Humanities: 000::Linguistics: 010::Applied linguistics: 012 VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Anvendt språkvitenskap: 012 VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551 Doctoral thesis Doktorgradsavhandling 2018 ftunivtroemsoe 2025-03-14T05:17:55Z In this dissertation, I investigate valencies and syntactically relevant semantic categories in North Sámi. In addition, I develop three machine-readable grammars for the North Sámi grammar checker GoDivvun that have access to valencies and semantics. Like a human, a machine-readable grammar analyzes a sentence by putting together information from different linguistic levels and, based on this, selects or discards certain interpretations. Grammatical errors and the extensive homonymy of well-formed input complicate a reliable sentence analysis based on morphology and syntax alone. I therefore add valency tags to 500 North Sámi verbs and annotate semantic prototype categories to 71% of the noun lexicon. 
This adds a semantic layer to the sentence analysis that is used to identify governor-argument structures in the process of error detection. I evaluate the detection of a test set of local and global errors resulting in a precision above 98% for local errors and a precision above 77% for global errors. While semantic prototype tagging is the backbone of local error detection, valency annotation is the backbone of global error detection. My approach shows that a deep syntactic and semantic sentence analysis is beneficial for local error detection and necessary for reliable global error detection. In this doctoral dissertation, I investigate the syntactic effects of valencies and semantic categories. Together with the Divvun group, I have built a North Sámi grammar checker, which I call GoDivvun. For it, I have written three machine-readable grammars that have access to valencies and semantics. A machine-readable grammar works like a human: it combines information from different linguistic levels and, on that basis, analyzes the sentence and prefers some interpretations over others. Homonymy and grammatical errors make a reliable sentence analysis difficult if it relies on morphology and syntax alone. As a solution, I have added valency tags to 500 North Sámi verbs and semantic prototypes to nouns. ... Doctoral or Postdoctoral Thesis North Sámi sami Sámi samisk University of Tromsø: Munin Open Research Archive Dego ENVELOPE(28.350,28.350,70.827,70.827) |
spellingShingle | VDP::Humanities: 000::Linguistics: 010 VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010::Sami language: 031 VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Samisk språk: 031 VDP::Humanities: 000::Linguistics: 010::Applied linguistics: 012 VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Anvendt språkvitenskap: 012 VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551 Wiechetek, Linda When grammar can't be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection |
title | When grammar can't be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection |
title_full | When grammar can't be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection |
title_fullStr | When grammar can't be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection |
title_full_unstemmed | When grammar can't be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection |
title_short | When grammar can't be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection |
title_sort | when grammar can't be trusted - valency and semantic categories in north sámi syntactic analysis and error detection |
topic | VDP::Humanities: 000::Linguistics: 010 VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010::Sami language: 031 VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Samisk språk: 031 VDP::Humanities: 000::Linguistics: 010::Applied linguistics: 012 VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Anvendt språkvitenskap: 012 VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551 |
topic_facet | VDP::Humanities: 000::Linguistics: 010 VDP::Humaniora: 000::Språkvitenskapelige fag: 010 VDP::Humanities: 000::Linguistics: 010::Sami language: 031 VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Samisk språk: 031 VDP::Humanities: 000::Linguistics: 010::Applied linguistics: 012 VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Anvendt språkvitenskap: 012 VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551 |
url | https://hdl.handle.net/10037/12726 |