Engaging the Public with Web Archives: Providing Access to 10 Years of Political History with WebArchives.ca

Bibliographic Details
Main Authors: Ruest, Nick; Milligan, Ian
Format: Conference Object
Language: English
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/10315/31320
Description
Summary: The Canadian Society of Digital Humanities/Société canadienne des humanités numériques Conference 2016

Introduction

The growth of digital sources since the advent of the World Wide Web in 1990-91 presents profound opportunities for historians. Large web archives contain billions of webpages and now make it possible for us to develop large-scale reconstructions of the recent web. Yet the sheer number of these sources presents significant challenges. The Internet Archive's "Wayback Machine" (http://archive.org/web) is a standard entryway to these collections, but it requires that the user know the URL of the resource they want to visit; it is not feasible to do large-scale research in this manner. By unlocking the Wayback Machine's underlying WebARCHive (ARC/WARC) files, we can develop methods to track, visualize, and analyze change occurring over time. In this paper, we discuss how we implemented the United Kingdom Web Archive (UKWA) "Shine" interface on a Canadian corpus, and how the provision of a user layer significantly changed levels of user engagement.

Project Rationale and Case Study

In 2005, the University of Toronto Library (UTL) began collecting a quarterly crawl of Canadian political parties and political interest groups. The collection includes fifty websites: major and minor political parties, as well as political interest groups such as the Assembly of First Nations and equal marriage advocacy groups. Collecting continues. Although 2005-2015 was a pivotal period for Canadian politics, analytics reveal that few users took advantage of the collection. The current portal requires a visit to https://archive-it.org/collections/227 for full-text queries. It offers no faceting and no significant advanced search features, so the interface is largely unusable for broad research questions.

Shine

To provide access, we implemented the Shine interface (https://github.com/ukwa/shine). Shine provides a web-based interface for interacting with Apache Solr.
Using the open-sourced code, we indexed all of the sites, provided explanatory layers, ...
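
To illustrate the kind of faceted full-text search that a Solr-backed interface makes possible, the following is a minimal sketch of how a faceted query URL could be assembled. It is not taken from Shine or from the authors' deployment. The parameters `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr query parameters; the base URL, the core name `webarchive`, and the field names `content`, `crawl_year`, and `domain` are assumptions chosen for illustration only.

```python
from urllib.parse import urlencode


def build_faceted_query(base_url, text, facet_fields, rows=10):
    """Return a Solr select URL for a full-text query with field facets.

    base_url and the facet field names are hypothetical; adjust them to
    match the schema of the index actually being queried.
    """
    params = [
        ("q", text),        # full-text query string
        ("rows", str(rows)),  # number of documents to return
        ("wt", "json"),     # ask Solr for a JSON response
        ("facet", "true"),  # turn on faceting
    ]
    # One facet.field parameter per field to be counted.
    params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical local Solr core and field names, for illustration only.
url = build_faceted_query(
    "http://localhost:8983/solr/webarchive",
    'content:"equal marriage"',
    ["crawl_year", "domain"],
)
print(url)
```

Faceting on a field such as crawl year or domain is what would let a researcher see how often a phrase appears across time or across organizations, which the URL-by-URL Wayback Machine workflow described above does not support.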