Investigating the Image of Entities in Social Media: Dataset Design and First Results

Bibliographic Details
Main Authors: Velcin, Julien; Brun, Caroline; Dormagen, Jean-Yves; Kim, Young-Min; Roux, Claude; Boyadjian, Julien; Bonnevay, Stephane; Neihouser, Marie; Sanjuan, Eric; Khouas, Leila; Peradotto, Anne; Molina, Alejandro
Other Authors: Entrepôts, Représentation et Ingénierie des Connaissances (ERIC), Université Lumière - Lyon 2 (UL2)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon, Penn Image Computing & Science Lab Philadelphia (PICSL), University of Pennsylvania, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Laboratoire d'Electrochimie et de Physico-chimie des Matériaux et des Interfaces (LEPMI), Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Institut de Chimie - CNRS Chimie (INC-CNRS)-Université Savoie Mont Blanc (USMB Université de Savoie Université de Chambéry)-Centre National de la Recherche Scientifique (CNRS), Sciences Po Lille - Institut d'études politiques de Lille (IEP Lille), Equipe de Recherche en Ingénierie des Connaissances (ERIC), Université Lumière - Lyon 2 (UL2)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Online Access: https://hal.science/hal-02052420
https://hal.science/hal-02052420/document
https://hal.science/hal-02052420/file/LREC14_FINAL_VELCIN.pdf
Description
Summary: The objective of this paper is to describe the design of a dataset that deals with the image (i.e., representation, web reputation) of various entities populating the Internet: politicians, celebrities, companies, brands, etc. Our main contribution is to build and provide an original annotated French dataset. This dataset consists of 11,527 manually annotated tweets expressing opinions on specific facets (e.g., ethics, communication, economic project) describing two French politicians over time. We believe that other researchers might benefit from this experience, since designing and implementing such a dataset has proven quite an interesting challenge. This design comprises different processes such as data selection, formal definition and instantiation of an image. We have set up a full open-source annotation platform. In addition to the dataset design, we present the first results that we obtained by applying clustering methods to the annotated dataset in order to extract the entity images.