Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus

While discourse segmentation and parsing have made considerable progress in recent years, discursive analysis of conversational speech remains a difficult issue. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two app...

Bibliographic Details
Main Authors:	Prevot, Laurent; Hunter, Julie; Muller, Philippe
Other Authors:	Institut universitaire de France (IUF), Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.), Laboratoire Parole et Langage (LPL), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), Linagora Labs Toulouse, Linagora Puteaux, MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Université Toulouse III - Paul Sabatier (UT3), Tanel Alumäe, Mark Fishel, ANR-20-CE23-0017,SUMM-RE,Supervision Distante pour les Compte-Rendus Enrichis de Relations Rhétoriques(2020), ANR-16-CONV-0002,ILCB,ILCB: Institute of Language Communication and the Brain(2016)
Format:	Conference Object
Language:	English
Published:	HAL CCSD 2023
Subjects:	discourse segmentation; low-resource learning; conversational speech; [SCCO.LING]Cognitive science/Linguistics; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL]; Faroe Islands; Tórshavn
Online Access:	https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf

id	ftutoulouse3hal:oai:HAL:hal-04222122v1
record_format	openpolar
spelling	ftutoulouse3hal:oai:HAL:hal-04222122v1 2024-05-12T08:03:26+00:00 Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus Prevot, Laurent Hunter, Julie Muller, Philippe Institut universitaire de France (IUF) Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.) Laboratoire Parole et Langage (LPL) Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS) Linagora Labs Toulouse Linagora Puteaux MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI) Institut de recherche en informatique de Toulouse (IRIT) Université Toulouse Capitole (UT Capitole) Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J) Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP) Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI) Université Toulouse - Jean Jaurès (UT2J) Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole) Université de Toulouse (UT) Université Toulouse III - Paul Sabatier (UT3) Tanel Alumäe Mark Fishel ANR-20-CE23-0017,SUMM-RE,Supervision Distante pour les Compte-Rendus Enrichis de Relations Rhétoriques(2020) ANR-16-CONV-0002,ILCB,ILCB: Institute of Language Communication and the Brain(2016) Tórshavn, Faroe Islands 2023-05-22 https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf en eng HAL CCSD ACL: Association for Computational Linguistics University of Tartu Library hal-04222122 https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf 
info:eu-repo/semantics/OpenAccess Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023) https://hal.science/hal-04222122 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023), May 2023, Tórshavn, Faroe Islands https://aclanthology.org/2023.nodalida-1.44/ discourse segmentation low-resource learning conversational speech [SCCO.LING]Cognitive science/Linguistics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL] info:eu-repo/semantics/conferenceObject Conference papers 2023 ftutoulouse3hal 2024-04-18T00:40:56Z While discourse segmentation and parsing have made considerable progress in recent years, discursive analysis of conversational speech remains a difficult issue. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on manual segmentation vs. using hand-crafted labeling rules to develop a weakly supervised segmenter. Our results show that both approaches yield similar performance in terms of F-score, while data programming requires less manual annotation work. In a second experiment, we vary the amount of training data used for fine-tuning systems and show that a small amount of hand-labeled data is enough to obtain good results (albeit not as good as when all available annotated data are used). Conference Object Faroe Islands Université Toulouse III - Paul Sabatier: HAL-UPS Faroe Islands Tórshavn ENVELOPE(-6.772,-6.772,62.010,62.010)
institution	Open Polar
collection	Université Toulouse III - Paul Sabatier: HAL-UPS
op_collection_id	ftutoulouse3hal
language	English
topic	discourse segmentation low-resource learning conversational speech [SCCO.LING]Cognitive science/Linguistics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL]
spellingShingle	discourse segmentation low-resource learning conversational speech [SCCO.LING]Cognitive science/Linguistics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL] Prevot, Laurent Hunter, Julie Muller, Philippe Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
topic_facet	discourse segmentation low-resource learning conversational speech [SCCO.LING]Cognitive science/Linguistics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL]
description	While discourse segmentation and parsing have made considerable progress in recent years, discursive analysis of conversational speech remains a difficult issue. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on manual segmentation vs. using hand-crafted labeling rules to develop a weakly supervised segmenter. Our results show that both approaches yield similar performance in terms of F-score, while data programming requires less manual annotation work. In a second experiment, we vary the amount of training data used for fine-tuning systems and show that a small amount of hand-labeled data is enough to obtain good results (albeit not as good as when all available annotated data are used).
author2	Institut universitaire de France (IUF) Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.) Laboratoire Parole et Langage (LPL) Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS) Linagora Labs Toulouse Linagora Puteaux MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI) Institut de recherche en informatique de Toulouse (IRIT) Université Toulouse Capitole (UT Capitole) Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J) Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP) Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI) Université Toulouse - Jean Jaurès (UT2J) Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole) Université de Toulouse (UT) Université Toulouse III - Paul Sabatier (UT3) Tanel Alumäe Mark Fishel ANR-20-CE23-0017,SUMM-RE,Supervision Distante pour les Compte-Rendus Enrichis de Relations Rhétoriques(2020) ANR-16-CONV-0002,ILCB,ILCB: Institute of Language Communication and the Brain(2016)
format	Conference Object
author	Prevot, Laurent Hunter, Julie Muller, Philippe
author_facet	Prevot, Laurent Hunter, Julie Muller, Philippe
author_sort	Prevot, Laurent
title	Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
title_short	Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
title_full	Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
title_fullStr	Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
title_full_unstemmed	Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
title_sort	comparing methods for segmenting elementary discourse units in a french conversational corpus
publisher	HAL CCSD
publishDate	2023
url	https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf
op_coverage	Tórshavn, Faroe Islands
long_lat	ENVELOPE(-6.772,-6.772,62.010,62.010)
geographic	Faroe Islands Tórshavn
geographic_facet	Faroe Islands Tórshavn
genre	Faroe Islands
genre_facet	Faroe Islands
op_source	Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023) https://hal.science/hal-04222122 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023), May 2023, Tórshavn, Faroe Islands https://aclanthology.org/2023.nodalida-1.44/
op_relation	hal-04222122 https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf
op_rights	info:eu-repo/semantics/OpenAccess
_version_	1798845548797624320

Similar Items