ANCOR_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures
International audience This article presents ANCOR_Centre, a French coreference corpus, available under the Creative Commons Licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and represents one of the largest coreference reso...
Main Authors: | , , , , , , , |
---|---|
Other Authors: | , , , , , , , , , , , , , , , |
Format: | Conference Object |
Language: | English |
Published: |
HAL CCSD
2014
|
Subjects: | |
Online Access: | https://hal.archives-ouvertes.fr/hal-01075679 https://hal.archives-ouvertes.fr/hal-01075679/document https://hal.archives-ouvertes.fr/hal-01075679/file/2014_LREC_ANCOR.pdf |
Summary: | International audience This article presents ANCOR_Centre, a French coreference corpus, available under the Creative Commons Licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and represents one of the largest coreference resources currently available. The corpus focuses exclusively on spoken language, it aims at representing a certain variety of spoken genders. ANCOR_Centre includes anaphora as well as coreference relations which involve nominal and pronominal mentions. The paper describes into details the annotation scheme and the reliability measures computed on the resource. |
---|