Little Big Data: Karelian Twitter Corpus

Bibliographic Details
Main Authors: Moshnikov, Ilia; Rykova, Eugenia
Other Authors: Cotgrove, Louis; Herzberg, Laura; Lüngen, Harald; Pisetta, Ines; Karelian Institute (Karjalan tutkimuslaitos); School of Humanities (Humanistinen osasto)
Format: Article in Journal/Newspaper
Language: English
Published: Leibniz-Institut für Deutsche Sprache, 2023
Online Access: https://erepo.uef.fi/handle/123456789/30594
Description
Summary: This paper investigates the visibility of the Karelian language on Twitter and describes the first corresponding data collection, which used language-related keywords and hashtags. In total, 2626 entries written fully or partially in Livvi, South and Viena Karelian were scraped with the Postman API. The visibility of Karelian on Twitter has increased considerably over the past few years, with Livvi Karelian being the most prominent dialect. The data were analysed linguistically (manually and with language detection software) and thematically. Although language-related topics are the most popular, a substantial number of entries fall into eight further topics. The applicability of the collected data to linguistic and sociological research, as well as considerations for further data collection, are discussed.
Published version, peer reviewed.