Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus

Bibliographic Details
Main Authors: Prevot, Laurent, Hunter, Julie, Muller, Philippe
Other Authors: Institut universitaire de France (IUF), Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.), Laboratoire Parole et Langage (LPL), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), Linagora Labs Toulouse, Linagora Puteaux, MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Université Toulouse III - Paul Sabatier (UT3), Tanel Alumäe, Mark Fishel, ANR-20-CE23-0017,SUMM-RE,Supervision Distante pour les Compte-Rendus Enrichis de Relations Rhétoriques(2020), ANR-16-CONV-0002,ILCB,ILCB: Institute of Language Communication and the Brain(2016)
Format: Conference Object
Language:English
Published: HAL CCSD 2023
Online Access:https://hal.science/hal-04222122
https://hal.science/hal-04222122/document
https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf
Description
Audience: International
Summary: While discourse segmentation and parsing have made considerable progress in recent years, discourse analysis of conversational speech remains difficult. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on the manual segmentation vs. using hand-crafted labeling rules to develop a weakly supervised segmenter (data programming). Our results show that both approaches yield similar performance in terms of f-score, while data programming requires less manual annotation work. In a second experiment, we vary the amount of training data used for fine-tuning and show that a small amount of hand-labeled data is enough to obtain good results (albeit not as good as when all available annotated data are used).
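The summary contrasts fine-tuning with data programming, in which hand-crafted labeling rules vote on each position and their noisy votes are combined into training labels for a weakly supervised segmenter. The sketch below is a minimal illustration under stated assumptions, not the authors' method or code. It assumes token-level labels (1 = a unit boundary follows the token). It also combines rules by simple majority vote, a swapped-in simplification, since data programming frameworks such as Snorkel instead fit a generative label model. The rules, the "#" pause token, the toy utterance, and every function name are hypothetical. The boundary f-score shown is one common segmentation metric; the record does not say which variant the paper reports.

```python
from collections import Counter
from typing import Callable, List, Optional

# Label values: 1 = a discourse-unit boundary follows token i, 0 = no boundary.
# A rule returns None to abstain.
BOUNDARY, NO_BOUNDARY = 1, 0
LabelingFunction = Callable[[List[str], int], Optional[int]]

# Hypothetical rules for illustration only; NOT the rules used in the paper.
DISCOURSE_MARKERS = {"mais", "donc", "alors", "puis"}
DETERMINERS = {"le", "la", "les", "un", "une"}


def lf_pause(tokens: List[str], i: int) -> Optional[int]:
    # Assumed transcription convention: "#" marks a silent pause.
    return BOUNDARY if tokens[i] == "#" else None


def lf_before_marker(tokens: List[str], i: int) -> Optional[int]:
    # Vote for a boundary when the next token is a discourse marker.
    if i + 1 < len(tokens) and tokens[i + 1].lower() in DISCOURSE_MARKERS:
        return BOUNDARY
    return None


def lf_after_determiner(tokens: List[str], i: int) -> Optional[int]:
    # Vote against a boundary right after a determiner (unfinished noun phrase).
    return NO_BOUNDARY if tokens[i].lower() in DETERMINERS else None


LFS: List[LabelingFunction] = [lf_pause, lf_before_marker, lf_after_determiner]


def majority_vote(tokens: List[str], lfs: List[LabelingFunction],
                  default: int = NO_BOUNDARY) -> List[int]:
    """Combine rule votes per token; abstentions are ignored, and ties or
    positions where every rule abstains fall back to `default`."""
    labels = []
    for i in range(len(tokens)):
        votes = [v for v in (lf(tokens, i) for lf in lfs) if v is not None]
        ranked = Counter(votes).most_common()
        if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
            labels.append(default)
        else:
            labels.append(ranked[0][0])
    return labels


def boundary_f1(gold: List[int], pred: List[int]) -> float:
    """F-score over boundary positions (one common segmentation metric)."""
    gold_b = {i for i, y in enumerate(gold) if y == BOUNDARY}
    pred_b = {i for i, y in enumerate(pred) if y == BOUNDARY}
    tp = len(gold_b & pred_b)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_b), tp / len(gold_b)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy utterance and gold labels, for illustration only.
    tokens = ["euh", "je", "suis", "allé", "#", "mais", "il", "pleuvait",
              "donc", "on", "est", "rentré"]
    gold = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
    pred = majority_vote(tokens, LFS)
    print(pred)  # [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    print(f"boundary F1 = {boundary_f1(gold, pred):.2f}")  # boundary F1 = 0.80
```

In this sketch the rules miss the turn-final boundary, so precision is perfect but recall is not. Replacing majority vote with a learned label model would change only `majority_vote`; the rules and the metric would stay the same.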