Tweets from Canadian provincial & territorial health officials

Bibliographic Details
Main Authors: Paterson, Susan, Brigham, Doug
Language: English, French
Published: Borealis
Online Access: https://doi.org/10.5683/SP2/TOQJFJ
Description
Summary: This dataset includes tweets from provincial and territorial government officials. Where a health official does not use Twitter (e.g. Dr. Bonnie Henry), other official accounts for that province or territory have been substituted. The dataset does not include tweets from federal government officials.

The tweet IDs were collected using Documenting the Now's Twarc library (https://github.com/DocNow/twarc). The date of the earliest available tweet differs for each handle, and the latest tweet in each file is no later than that file's upload date. See the file-level information below.

The tweet IDs were extracted from the raw JSON files retrieved from Twitter using Twarc. Twitter's terms of use do not permit sharing the raw JSON files for this dataset. However, the raw JSON can be retrieved from Twitter, provided the content is still available, using Twarc's 'hydrate' command. The researchers have retained the source JSON files, and other researchers who wish to access them may contact them. The files of tweet IDs will be updated over time, and this metadata, the files and this readme.txt file will be updated accordingly.

Raw JSON files were harvested using Twarc's 'timeline' command, which retrieves the most recent tweets from the specified handle, up to a maximum of approximately 3,300 tweets. The data for each handle was collected approximately weekly, starting in January 2021. So that earlier tweets would not be lost, we concatenated the JSON from each new 'timeline' crawl to the earlier crawls and de-duplicated the combined JSON using Twarc's 'deduplicate' command. We then used Twarc's 'dehydrate' command to extract just the tweet IDs from the deduplicated JSON file. Finally, we sorted the tweet IDs numerically so that they appear in ascending date order.
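The concatenate, deduplicate, dehydrate and sort steps can be sketched in plain Python. This is not Twarc's own code. It is a minimal illustration that assumes each crawl is a list of JSON lines, one tweet object per line, carrying Twitter's `id_str` field. The function names `merge_crawls`, `dehydrate` and `id_to_datetime` and the sample IDs are hypothetical. Sorting on the integer value matters: tweet IDs are Snowflake IDs whose high-order bits encode the creation time, so numeric order is chronological order, whereas sorting the IDs as strings misplaces IDs of different lengths.

```python
import json
from datetime import datetime, timezone

# Twitter's Snowflake epoch in milliseconds (2010-11-04 01:42:54.657 UTC).
TWITTER_EPOCH_MS = 1288834974657


def merge_crawls(crawls):
    """Concatenate weekly crawls and drop duplicate tweets.

    Each crawl is a list of JSON lines, one tweet object per line.
    Tweets are matched on their "id_str" field; the first copy is kept.
    """
    seen = set()
    merged = []
    for crawl in crawls:
        for line in crawl:
            tweet = json.loads(line)
            if tweet["id_str"] not in seen:
                seen.add(tweet["id_str"])
                merged.append(tweet)
    return merged


def dehydrate(tweets):
    """Keep only the tweet IDs, sorted by integer value (oldest first)."""
    return sorted((t["id_str"] for t in tweets), key=int)


def id_to_datetime(tweet_id):
    """Recover a tweet's creation time from the bits above the lowest 22."""
    ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Hypothetical IDs: two weekly crawls that overlap on one tweet.
week1 = ['{"id_str": "1349000000000000000"}', '{"id_str": "999000000000000000"}']
week2 = ['{"id_str": "1349000000000000000"}', '{"id_str": "1351000000000000000"}']

ids = dehydrate(merge_crawls([week1, week2]))
print(ids)  # ['999000000000000000', '1349000000000000000', '1351000000000000000']
for tweet_id in ids:
    print(tweet_id, id_to_datetime(tweet_id).isoformat())
```

Deduplicating on `id_str` rather than on whole JSON lines means that two copies of the same tweet captured in different weeks, whose engagement counts may have changed in between, are still treated as one tweet.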
The basic workflow is: twarc timeline --> concatenate JSON files --> deduplicate the resulting JSON file --> dehydrate tweet IDs --> sort tweet IDs. The Twitter handles include: ...