Automated analysis of Norwegian text


Bibliographic Details
Main Author:	Johansen, Bjarte
Format:	Doctoral or Postdoctoral Thesis
Language:	English
Published:	The University of Bergen 2019
Subjects:	Lofoten Ner Senja Vesterålen
Online Access:	http://hdl.handle.net/1956/20906

id	ftunivbergen:oai:bora.uib.no:1956/20906
record_format	openpolar
spelling	ftunivbergen:oai:bora.uib.no:1956/20906 2023-05-15T17:08:18+02:00 Automated analysis of Norwegian text Johansen, Bjarte 2019-06-28 application/pdf http://hdl.handle.net/1956/20906 eng eng The University of Bergen container/a5/57/5a/6c/a5575a6c-2df9-4aec-8a16-d82833d7c5c8 urn:isbn:9788230848753 urn:isbn:9788230866757 http://hdl.handle.net/1956/20906 cristin:1708253 Attribution-NonCommercial (CC BY-NC) http://creativecommons.org/licenses/by-nc/4.0/ Copyright the author. Doctoral thesis 2019 ftunivbergen 2023-03-14T17:38:49Z In this thesis we look at how to develop automated analysis tools for Norwegian text. We address three tasks: Part-of-Speech (PoS) tagging, Named-Entity Chunking (NEC), and Named-Entity Recognition (NER). For PoS tagging, we extend the work done on the OBT+Stat tagger by training a new model that also lets it disambiguate Nynorsk. We also train Google's SyntaxNet for PoS tagging of Bokmål and Nynorsk, achieving state-of-the-art results at the time of the research. For NEC, the task of extracting names from text, we train a Support Vector Machine on Bokmål. Next, we develop a NER model using deep learning and provide a NER sequence tagger for Bokmål and Nynorsk. To our knowledge, the Nynorsk tagger is the first NER model for Nynorsk. The best-performing model is trained on both language forms and outperforms, on both Bokmål and Nynorsk, the models trained on each language form individually. Finally, we show how NEC and NER can be combined with Social Network Analysis tools in two case studies of the news story about the impact assessment of oil drilling in Lofoten, Vesterålen, and Senja. In the first case study we show that the thematic structure of a news story can be found by analysing the relationships between the entities in the text. In the second case study we use topic modelling to identify the topics and the most important persons in each topic. Doctoral or Postdoctoral Thesis Lofoten Vesterålen University of Bergen: Bergen Open Research Archive (BORA-UiB) Lofoten Ner ENVELOPE(6.622,6.622,62.612,62.612) Senja ENVELOPE(16.803,16.803,69.081,69.081) Vesterålen ENVELOPE(14.939,14.939,68.754,68.754)
institution	Open Polar
collection	University of Bergen: Bergen Open Research Archive (BORA-UiB)
op_collection_id	ftunivbergen
language	English
description	In this thesis we look at how to develop automated analysis tools for Norwegian text. We address three tasks: Part-of-Speech (PoS) tagging, Named-Entity Chunking (NEC), and Named-Entity Recognition (NER). For PoS tagging, we extend the work done on the OBT+Stat tagger by training a new model that also lets it disambiguate Nynorsk. We also train Google's SyntaxNet for PoS tagging of Bokmål and Nynorsk, achieving state-of-the-art results at the time of the research. For NEC, the task of extracting names from text, we train a Support Vector Machine on Bokmål. Next, we develop a NER model using deep learning and provide a NER sequence tagger for Bokmål and Nynorsk. To our knowledge, the Nynorsk tagger is the first NER model for Nynorsk. The best-performing model is trained on both language forms and outperforms, on both Bokmål and Nynorsk, the models trained on each language form individually. Finally, we show how NEC and NER can be combined with Social Network Analysis tools in two case studies of the news story about the impact assessment of oil drilling in Lofoten, Vesterålen, and Senja. In the first case study we show that the thematic structure of a news story can be found by analysing the relationships between the entities in the text. In the second case study we use topic modelling to identify the topics and the most important persons in each topic.
format	Doctoral or Postdoctoral Thesis
author	Johansen, Bjarte
title	Automated analysis of Norwegian text
publisher	The University of Bergen
publishDate	2019
url	http://hdl.handle.net/1956/20906
long_lat	ENVELOPE(6.622,6.622,62.612,62.612) ENVELOPE(16.803,16.803,69.081,69.081) ENVELOPE(14.939,14.939,68.754,68.754)
geographic	Lofoten Ner Senja Vesterålen
genre	Lofoten Vesterålen
op_relation	container/a5/57/5a/6c/a5575a6c-2df9-4aec-8a16-d82833d7c5c8 urn:isbn:9788230848753 urn:isbn:9788230866757 http://hdl.handle.net/1956/20906 cristin:1708253
op_rights	Attribution-NonCommercial (CC BY-NC) http://creativecommons.org/licenses/by-nc/4.0/ Copyright the author.

