The Nordic Tweet Stream : A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data

Bibliographic Details
Main Authors:	Laitinen, Mikko; Lundberg, Jonas; Levin, Magnus; Martins, Rafael Messias
Format:	Conference Object
Language:	English
Published:	Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) 2018
Subjects:	Real-time language data; Nordic Tweet Stream; Twitter; General Language Studies and Linguistics; Jämförande språkvetenskap och allmän lingvistik; Specific Languages; Studier av enskilda språk; Norway; Iceland
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78277

Description
Summary:	This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: we not only rely on traditional structured corpus data but also use unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects of creating a dynamic real-time monitor corpus, and the following case study illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third. These results can be used to assess how widespread English use is in the Nordic region and offer a big data perspective that complements previous small-scale studies. The future objectives include annotating the material, making it available to the scholarly community, and expanding the geographic scope of the data stream beyond the Nordic region. DISA

Similar Items