Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus

Bibliographic Details
Main Authors:	Prevot, Laurent; Hunter, Julie; Muller, Philippe
Other Authors:	Institut Universitaire de France (IUF), Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.), Laboratoire Parole et Langage (LPL), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), Linagora Labs Toulouse, Linagora Puteaux, MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Université Toulouse III - Paul Sabatier (UT3), Tanel Alumäe, Mark Fishel, ANR-20-CE23-0017,SUMM-RE,Supervision Distante pour les Compte-Rendus Enrichis de Relations Rhétoriques(2020)
Format:	Conference Object
Language:	English
Published:	HAL CCSD 2023
Subjects:	discourse segmentation; low-resource learning; conversational speech; [SCCO.LING]Cognitive science/Linguistics; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL]; Tórshavn, Faroe Islands
Online Access:	https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf

Description
Summary:	International audience. While discourse segmentation and parsing have made considerable progress in recent years, the discourse analysis of conversational speech remains difficult. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on the manual segmentation vs. using hand-crafted labeling rules to develop a weakly supervised segmenter (data programming). Our results show that both approaches yield similar F-scores, while data programming requires less manual annotation work. In a second experiment, we vary the amount of training data used for fine-tuning and show that a small amount of hand-labeled data is enough to obtain good results (albeit not as good as when all available annotated data are used).
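
The summary contrasts fine-tuning with data programming, i.e. building a weakly supervised segmenter from hand-crafted labeling rules. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: the cue-word lists and the three labeling functions are hypothetical examples, not the rules used in the paper, and votes are combined by simple majority vote, a deliberate simplification of data programming frameworks such as Snorkel, which instead learn a label model that weights each rule by its estimated accuracy. Predicted boundaries are scored with a boundary-level F1, one common way to compute a segmentation F-score; the paper's exact evaluation protocol may differ.

```python
from collections import Counter
from typing import Callable, List, Set

BOUNDARY, NO_BOUNDARY, ABSTAIN = 1, 0, -1

# A labeling function inspects token i in context and votes on whether a new
# discourse unit starts at i, or abstains.
LabelingFunction = Callable[[List[str], int], int]

# Hypothetical cue lists, for illustration only (not the paper's rules).
DISCOURSE_MARKERS = {"alors", "donc", "mais", "parce", "bon", "enfin"}
FILLERS = {"euh", "ben", "hein"}


def lf_discourse_marker(tokens: List[str], i: int) -> int:
    """Hypothetical rule: a unit starts at a discourse marker."""
    return BOUNDARY if tokens[i].lower() in DISCOURSE_MARKERS else ABSTAIN


def lf_after_filler(tokens: List[str], i: int) -> int:
    """Hypothetical rule: a unit starts right after a filler such as 'euh'."""
    if i > 0 and tokens[i - 1].lower() in FILLERS:
        return BOUNDARY
    return ABSTAIN


def lf_inside_parce_que(tokens: List[str], i: int) -> int:
    """Hypothetical rule: never split 'parce que'."""
    if i > 0 and tokens[i - 1].lower() == "parce" and tokens[i].lower() == "que":
        return NO_BOUNDARY
    return ABSTAIN


def majority_vote(votes: List[int]) -> int:
    """Combine non-abstaining votes; ties and all-abstain default to NO_BOUNDARY.

    Simplification: data programming frameworks learn per-rule accuracies
    instead of counting every vote equally.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    return BOUNDARY if counts[BOUNDARY] > counts[NO_BOUNDARY] else NO_BOUNDARY


def weak_segment(tokens: List[str], lfs: List[LabelingFunction]) -> Set[int]:
    """Indices i >= 1 where the combined rules predict that a new unit starts."""
    return {
        i
        for i in range(1, len(tokens))
        if majority_vote([lf(tokens, i) for lf in lfs]) == BOUNDARY
    }


def boundary_f1(predicted: Set[int], gold: Set[int]) -> float:
    """F1 over predicted vs. gold boundary positions."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = "ouais euh je sais pas mais alors on y va parce que bon c'est loin".split()
    rules = [lf_discourse_marker, lf_after_filler, lf_inside_parce_que]
    predicted = weak_segment(tokens, rules)
    gold = {2, 5, 10}  # made-up gold boundaries for this toy utterance
    print(sorted(predicted))  # [2, 5, 6, 10, 12]
    print(round(boundary_f1(predicted, gold), 2))  # 0.75
```

On this toy utterance the rules over-segment (for example before "alors", directly after "mais"), so precision falls below recall. In a fuller data-programming setup, such weak labels would typically be used to train a statistical segmenter rather than serve as the final output.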
