Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus

Bibliographic Details
Main Authors: Prevot, Laurent; Hunter, Julie; Muller, Philippe
Affiliations: Institut universitaire de France (IUF); Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.); Laboratoire Parole et Langage (LPL); Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS); Linagora Labs Toulouse; Linagora Puteaux; MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI); Institut de recherche en informatique de Toulouse (IRIT); Université Toulouse Capitole (UT Capitole); Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J); Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3); Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP); Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI); Université Toulouse - Jean Jaurès (UT2J); Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3); Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole); Université de Toulouse (UT); Université Toulouse III - Paul Sabatier (UT3)
Proceedings Editors: Tanel Alumäe; Mark Fishel
Funding: ANR-20-CE23-0017, SUMM-RE, Supervision Distante pour les Compte-Rendus Enrichis de Relations Rhétoriques (2020); ANR-16-CONV-0002, ILCB, ILCB: Institute of Language Communication and the Brain (2016)
Format: Conference Object
Language: English
Published: HAL CCSD 2023
Subjects: discourse segmentation; low-resource learning; conversational speech
Online Access: https://hal.science/hal-04222122
https://hal.science/hal-04222122/document
https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf
id ftanrparis:oai:HAL:hal-04222122v1
record_format openpolar
conference_date 2023-05-22
proceedings_publisher ACL: Association for Computational Linguistics; University of Tartu Library
institution Open Polar
collection Portail HAL-ANR (Agence Nationale de la Recherche)
op_collection_id ftanrparis
language English
topic discourse segmentation
low-resource learning
conversational speech
[SCCO.LING]Cognitive science/Linguistics
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
[INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL]
audience International
description While discourse segmentation and parsing have made considerable progress in recent years, discourse analysis of conversational speech remains difficult. In this paper, we use a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on the manual segmentation vs. using hand-crafted labeling rules to develop a weakly supervised segmenter (data programming). Our results show that both approaches yield similar performance in terms of F-score, while data programming requires less manual annotation work. In a second experiment, we vary the amount of training data used to fine-tune the systems and show that a small amount of hand-labeled data is enough to obtain good results (albeit not as good as when all available annotated data are used).
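The abstract contrasts fine-tuning with data programming, where hand-written labeling rules vote on unlabeled text and their combined votes serve as weak training labels for a segmenter. The Python sketch below illustrates only that rule-voting step, for elementary discourse unit (EDU) start positions in a toy French turn, plus a boundary F-score. Everything in it is a hypothetical illustration, not the paper's method: the rules, the marker and filler lists, the function names, and the toy gold boundaries are invented. A plain majority vote also stands in for the probabilistic label model that data-programming frameworks such as Snorkel fit, and training a segmenter on the weak labels is not shown.

```python
from collections import Counter
from typing import Callable, List, Sequence, Set

ABSTAIN, NO_BOUNDARY, BOUNDARY = -1, 0, 1

# Hypothetical rule inputs, chosen for illustration only (not the paper's rule set).
MARKERS = {"mais", "donc", "alors", "parce", "puis"}
FILLERS = {"euh", "hum"}

# A labeling function sees the whole turn and a token index, and votes on
# whether a new unit starts at that token (or abstains).
LabelingFunction = Callable[[Sequence[str], int], int]


def lf_turn_start(tokens: Sequence[str], i: int) -> int:
    # Hypothetical rule: the first token of a speaker turn opens a new unit.
    return BOUNDARY if i == 0 else ABSTAIN


def lf_discourse_marker(tokens: Sequence[str], i: int) -> int:
    # Hypothetical rule: a discourse marker opens a new unit.
    return BOUNDARY if tokens[i] in MARKERS else ABSTAIN


def lf_filler(tokens: Sequence[str], i: int) -> int:
    # Hypothetical rule: a filled pause does not open a unit.
    return NO_BOUNDARY if tokens[i] in FILLERS else ABSTAIN


def lf_after_marker(tokens: Sequence[str], i: int) -> int:
    # Hypothetical rule: the token right after a marker stays in the marker's unit.
    return NO_BOUNDARY if i > 0 and tokens[i - 1] in MARKERS else ABSTAIN


LFS: List[LabelingFunction] = [lf_turn_start, lf_discourse_marker, lf_filler, lf_after_marker]


def weak_labels(tokens: Sequence[str], lfs: Sequence[LabelingFunction] = LFS) -> List[int]:
    """Majority vote per token; ties and all-abstain default to NO_BOUNDARY."""
    labels = []
    for i in range(len(tokens)):
        votes = Counter(lf(tokens, i) for lf in lfs)
        labels.append(BOUNDARY if votes[BOUNDARY] > votes[NO_BOUNDARY] else NO_BOUNDARY)
    return labels


def segment(tokens: Sequence[str], labels: Sequence[int]) -> List[List[str]]:
    """Split a token sequence into units, opening a new unit at each BOUNDARY label."""
    units: List[List[str]] = []
    for tok, lab in zip(tokens, labels):
        if lab == BOUNDARY or not units:
            units.append([])
        units[-1].append(tok)
    return units


def boundary_f1(predicted: Set[int], gold: Set[int]) -> float:
    """F-score over unit-start positions (0.0 when nothing matches)."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy_turn = "oui mais euh je sais pas donc on y va".split()  # invented toy turn
    toy_labels = weak_labels(toy_turn)
    print(segment(toy_turn, toy_labels))
    # [['oui'], ['mais', 'euh', 'je', 'sais', 'pas'], ['donc', 'on', 'y', 'va']]
    predicted = {i for i, lab in enumerate(toy_labels) if lab == BOUNDARY}
    print(round(boundary_f1(predicted, gold={0, 3, 6}), 3))  # 0.667 (invented gold)
```

Rules that abstain by default keep each heuristic narrow, so it can fire only where it is fairly reliable. This reflects the general data-programming idea of trading manual annotation for a small set of rules.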
format Conference Object
author Prevot, Laurent
Hunter, Julie
Muller, Philippe
title Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
publisher HAL CCSD
publishDate 2023
op_coverage Tórshavn, Faroe Islands
long_lat ENVELOPE(-6.772,-6.772,62.010,62.010)
geographic Faroe Islands
Tórshavn
genre Faroe Islands
op_source Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023)
https://hal.science/hal-04222122
24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023), May 2023, Tórshavn, Faroe Islands
https://aclanthology.org/2023.nodalida-1.44/
op_rights info:eu-repo/semantics/OpenAccess