The Nordic Tweet Stream : A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data

This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: We not only rely on traditional stru...

Bibliographic Details
Main Authors: Laitinen, Mikko, Lundberg, Jonas, Levin, Magnus, Martins, Rafael Messias
Format: Conference Object
Language: English
Published: Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) 2018
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78277
id ftlinnaeusuniv:oai:DiVA.org:lnu-78277
record_format openpolar
spelling ftlinnaeusuniv:oai:DiVA.org:lnu-78277 2023-05-15T16:50:43+02:00 The Nordic Tweet Stream : A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data Laitinen, Mikko Lundberg, Jonas Levin, Magnus Martins, Rafael Messias 2018 application/pdf http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78277 eng eng Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) Linnéuniversitetet, Institutionen för språk (SPR) University of Eastern Finland, Finland CEUR-WS.org CEUR Workshop Proceedings, 1613-0073 2084 DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference : Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018, p. 349-362 http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78277 Scopus 2-s2.0-85045342911 info:eu-repo/semantics/openAccess Real-time language data Nordic Tweet Stream Twitter General Language Studies and Linguistics Jämförande språkvetenskap och allmän lingvistik Specific Languages Studier av enskilda språk Conference paper info:eu-repo/semantics/conferenceObject text 2018 ftlinnaeusuniv 2022-11-03T15:54:20Z This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: We not only rely on traditional structured corpus data but also use unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects in creating a dynamic real-time monitor corpus, and the following case study illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings.
The results show that English is the most frequently used language, accounting for almost a third. These results can be used to assess how widespread English use is in the Nordic region and offer a big data perspective that complements previous small-scale studies. The future objectives include annotating the material, making it available for the scholarly community, and expanding the geographic scope of the data stream outside the Nordic region. DISA Conference Object Iceland Linnaeus University Kalmar Växjö: Publications (DiVA) Norway
institution Open Polar
collection Linnaeus University Kalmar Växjö: Publications (DiVA)
op_collection_id ftlinnaeusuniv
language English
topic Real-time language data
Nordic Tweet Stream
Twitter
General Language Studies and Linguistics
Jämförande språkvetenskap och allmän lingvistik
Specific Languages
Studier av enskilda språk
description This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: We not only rely on traditional structured corpus data but also use unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects in creating a dynamic real-time monitor corpus, and the following case study illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third. These results can be used to assess how widespread English use is in the Nordic region and offer a big data perspective that complements previous small-scale studies. The future objectives include annotating the material, making it available for the scholarly community, and expanding the geographic scope of the data stream outside the Nordic region. DISA
format Conference Object
author Laitinen, Mikko
Lundberg, Jonas
Levin, Magnus
Martins, Rafael Messias
author_sort Laitinen, Mikko
title The Nordic Tweet Stream : A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data
title_sort nordic tweet stream : a dynamic real-time monitor corpus of big and rich language data
publisher Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM)
publishDate 2018
url http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78277
geographic Norway
geographic_facet Norway
genre Iceland
genre_facet Iceland
op_relation CEUR Workshop Proceedings, 1613-0073
2084
DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference : Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018, p. 349-362
http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78277
Scopus 2-s2.0-85045342911
op_rights info:eu-repo/semantics/openAccess
_version_ 1766040839849508864