ANCOR_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures

Bibliographic Details
Main Authors:	Muzerelle, Judith, Lefeuvre, Anaïs, Schang, Emmanuel, Antoine, Jean-Yves, Pelletier, Aurore, Maurel, Denis, Eshkol, Iris, Villaneau, Jeanne
Other Authors:	Laboratoire Ligérien de Linguistique (LLL), Université d'Orléans (UO)-Université de Tours, Bases de données et traitement des langues naturelles (BDTLN), Laboratoire d'Informatique Fondamentale et Appliquée de Tours (LIFAT), Centre National de la Recherche Scientifique (CNRS)-Université de Tours-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université de Tours-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA), SEarch, Analyze, Synthesize and Interact with Data Ecosystems (SEASIDE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA), Région Centre, ELRA, Projet ANCOR
Format:	Conference Object
Language:	English
Published:	HAL CCSD 2014
Subjects:	French spoken language free annotated corpus coreference anaphora [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] Iceland
Online Access:	https://hal.archives-ouvertes.fr/hal-01075679 https://hal.archives-ouvertes.fr/hal-01075679/document https://hal.archives-ouvertes.fr/hal-01075679/file/2014_LREC_ANCOR.pdf

Description
Summary:	This article presents ANCOR_Centre, a French coreference corpus, available under the Creative Commons Licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and represents one of the largest coreference resources currently available. The corpus focuses exclusively on spoken language and aims to represent a variety of spoken genres. ANCOR_Centre includes anaphora as well as coreference relations involving nominal and pronominal mentions. The paper describes in detail the annotation scheme and the reliability measures computed on the resource.

Similar Items