Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French

International audience The main objective of the Rhapsodie project (ANR Rhapsodie 07 Corp-030-01) was to define rich, explicit, and reproducible schemes for the annotation of prosody and syntax in different genres (± spontaneous, ± planned, face-to-face interviews vs. broadcast, etc.), in order to s...

Full description

Bibliographic Details
Main Authors: Lacheret, Anne, Kahane, Sylvain, Beliao, Julie, Dister, Anne, Gerdes, Kim, Goldman, Jean-Philippe, Obin, Nicolas, Pietrandrea, Paola, Tchobanov, Atanas
Other Authors: Modèles, Dynamiques, Corpus (MoDyCo), Université Paris Nanterre (UPN)-Centre National de la Recherche Scientifique (CNRS), Centre de recherche Valibel, Université Catholique de Louvain = Catholic University of Louvain (UCL), LPP - Laboratoire de Phonétique et Phonologie - UMR 7018 (LPP), Université Sorbonne Nouvelle - Paris 3-Centre National de la Recherche Scientifique (CNRS), Université de Genève (UNIGE), Analyse et synthèse sonores Paris, Sciences et Technologies de la Musique et du Son (STMS), Institut de Recherche et Coordination Acoustique/Musique (IRCAM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche et Coordination Acoustique/Musique (IRCAM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS), Université de Tours (UT)
Format: Other/Unknown Material
Language:English
Published: HAL CCSD 2014
Subjects:
Online Access:https://hal.sorbonne-universite.fr/hal-00968959/file/LREC2014_AL.pdf
https://hal.sorbonne-universite.fr/hal-00968959
Description
Summary:International audience The main objective of the Rhapsodie project (ANR Rhapsodie 07 Corp-030-01) was to define rich, explicit, and reproducible schemes for the annotation of prosody and syntax in different genres (± spontaneous, ± planned, face-to-face interviews vs. broadcast, etc.), in order to study the prosody/syntax/discourse interface in spoken French, and their roles in the segmentation of speech into discourse units (Lacheret, Kahane, & Pietrandrea forthcoming). We here describe the deliverable, a syntactic and prosodic treebank of spoken French, composed of 57 short samples of spoken French (5 minutes long on average, amounting to 3 hours of speech and 33000 words), orthographically and phonetically transcribed. The transcriptions and the annotations are all aligned on the speech signal: phonemes, syllables, words, speakers, overlaps. This resource is freely available at www.projet-rhapsodie.fr. The sound samples (wav/mp3), the acoustic analysis (original F0 curve manually corrected and automatic stylized F0, pitch format), the orthographic transcriptions (txt), the microsyntactic annotations (tabular format), the macrosyntactic annotations (txt, tabular format), the prosodic annotations (xml, textgrid, tabular format), and the metadata (xml and html) can be freely downloaded under the terms of the Creative Commons licence Attribution - Noncommercial - Share Alike 3.0 France. The metadata are encoded in the IMDI-CMFI format and can be parsed on line.