Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus
International audience. While discourse segmentation and parsing have made considerable progress in recent years, discursive analysis of conversational speech remains a difficult issue. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two app...
Main Authors: | Prevot, Laurent; Hunter, Julie; Muller, Philippe |
---|---|
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD 2023 |
Subjects: | |
Online Access: | https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf |
id |
ftutoulouse3hal:oai:HAL:hal-04222122v1 |
---|---|
record_format |
openpolar |
spelling |
ftutoulouse3hal:oai:HAL:hal-04222122v1 2024-05-12T08:03:26+00:00 Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus Prevot, Laurent Hunter, Julie Muller, Philippe Institut universitaire de France (IUF) Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.) Laboratoire Parole et Langage (LPL) Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS) Linagora Labs Toulouse Linagora Puteaux MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI) Institut de recherche en informatique de Toulouse (IRIT) Université Toulouse Capitole (UT Capitole) Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J) Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP) Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI) Université Toulouse - Jean Jaurès (UT2J) Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole) Université de Toulouse (UT) Université Toulouse III - Paul Sabatier (UT3) Tanel Alumäe Mark Fishel ANR-20-CE23-0017,SUMM-RE,Supervision Distante pour les Compte-Rendus Enrichis de Relations Rhétoriques(2020) ANR-16-CONV-0002,ILCB,ILCB: Institute of Language Communication and the Brain(2016) Tórshavn, Faroe Islands, Finland 2023-05-22 https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf en eng HAL CCSD ACL: Association for Computational Linguistics University of Tartu Library hal-04222122 https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf 
info:eu-repo/semantics/OpenAccess Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023) https://hal.science/hal-04222122 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023), May 2023, Tórshavn, Faroe Islands, Finland https://aclanthology.org/2023.nodalida-1.44/ discourse segmentation low-resource learning conversational speech [SCCO.LING]Cognitive science/Linguistics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL] info:eu-repo/semantics/conferenceObject Conference papers 2023 ftutoulouse3hal 2024-04-18T00:40:56Z International audience While discourse segmentation and parsing has made considerable progress in recent years, discursive analysis of conversational speech remains a difficult issue. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on manual segmentation vs. using hand-crafted labeling rules to develop a weakly supervised segmenter. Our results show that both approaches yield similar performance in terms of f-score while data programming requires less manual annotation work. In a second experiment we play with the amount of training data used for fine-tuning systems and show that a small amount of hand labeled data is enough to obtain good results (albeit not as good as when all available annotated data are used). Conference Object Faroe Islands Université Toulouse III - Paul Sabatier: HAL-UPS Faroe Islands Tórshavn ENVELOPE(-6.772,-6.772,62.010,62.010) |
institution |
Open Polar |
collection |
Université Toulouse III - Paul Sabatier: HAL-UPS |
op_collection_id |
ftutoulouse3hal |
language |
English |
topic |
discourse segmentation low-resource learning conversational speech [SCCO.LING]Cognitive science/Linguistics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL] |
spellingShingle |
discourse segmentation low-resource learning conversational speech [SCCO.LING]Cognitive science/Linguistics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL] Prevot, Laurent Hunter, Julie Muller, Philippe Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus |
topic_facet |
discourse segmentation low-resource learning conversational speech [SCCO.LING]Cognitive science/Linguistics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL] |
description |
International audience. While discourse segmentation and parsing have made considerable progress in recent years, discursive analysis of conversational speech remains a difficult issue. In this paper, we exploit a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on manual segmentation vs. using hand-crafted labeling rules to develop a weakly supervised segmenter. Our results show that both approaches yield similar performance in terms of F-score, while data programming requires less manual annotation work. In a second experiment, we vary the amount of training data used for fine-tuning systems and show that a small amount of hand-labeled data is enough to obtain good results (albeit not as good as when all available annotated data are used). |
author2 |
Institut universitaire de France (IUF) Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.) Laboratoire Parole et Langage (LPL) Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS) Linagora Labs Toulouse Linagora Puteaux MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI) Institut de recherche en informatique de Toulouse (IRIT) Université Toulouse Capitole (UT Capitole) Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J) Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP) Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI) Université Toulouse - Jean Jaurès (UT2J) Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole) Université de Toulouse (UT) Université Toulouse III - Paul Sabatier (UT3) Tanel Alumäe Mark Fishel ANR-20-CE23-0017,SUMM-RE,Supervision Distante pour les Compte-Rendus Enrichis de Relations Rhétoriques(2020) ANR-16-CONV-0002,ILCB,ILCB: Institute of Language Communication and the Brain(2016) |
format |
Conference Object |
author |
Prevot, Laurent Hunter, Julie Muller, Philippe |
author_facet |
Prevot, Laurent Hunter, Julie Muller, Philippe |
author_sort |
Prevot, Laurent |
title |
Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus |
title_short |
Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus |
title_full |
Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus |
title_fullStr |
Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus |
title_full_unstemmed |
Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus |
title_sort |
comparing methods for segmenting elementary discourse units in a french conversational corpus |
publisher |
HAL CCSD |
publishDate |
2023 |
url |
https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf |
op_coverage |
Tórshavn, Faroe Islands |
long_lat |
ENVELOPE(-6.772,-6.772,62.010,62.010) |
geographic |
Faroe Islands Tórshavn |
geographic_facet |
Faroe Islands Tórshavn |
genre |
Faroe Islands |
genre_facet |
Faroe Islands |
op_source |
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023) https://hal.science/hal-04222122 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023), May 2023, Tórshavn, Faroe Islands https://aclanthology.org/2023.nodalida-1.44/ |
op_relation |
hal-04222122 https://hal.science/hal-04222122 https://hal.science/hal-04222122/document https://hal.science/hal-04222122/file/Nodalida_segmentation.pdf |
op_rights |
info:eu-repo/semantics/OpenAccess |
_version_ |
1798845548797624320 |
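
The description in this record contrasts fine-tuning on manual segmentation with hand-crafted labeling rules (data programming), both scored by F-score. The sketch below illustrates the second idea only, and it is not the authors' system: the marker list, the `#` pause symbol, the three rules, and the toy utterance are all assumptions made for illustration. It also combines rule votes by simple majority, whereas data programming frameworks typically fit a label model over the votes.

```python
# Illustrative only: hypothetical rules, not those used in the paper.
ABSTAIN, NO_BOUNDARY, BOUNDARY = -1, 0, 1

# Assumed set of French discourse markers that often open a new unit.
MARKERS = {"mais", "donc", "alors", "puis"}


def lf_marker(tokens, i):
    """Vote BOUNDARY when the token is one of the assumed markers."""
    return BOUNDARY if tokens[i].lower() in MARKERS else ABSTAIN


def lf_after_pause(tokens, i):
    """Vote BOUNDARY right after a pause, written '#' in this toy transcript."""
    return BOUNDARY if i > 0 and tokens[i - 1] == "#" else ABSTAIN


def lf_pause_itself(tokens, i):
    """Vote NO_BOUNDARY on the pause symbol itself."""
    return NO_BOUNDARY if tokens[i] == "#" else ABSTAIN


LABELING_FUNCTIONS = [lf_marker, lf_after_pause, lf_pause_itself]


def weak_labels(tokens):
    """Label each token as starting a unit or not, by majority vote of the rules."""
    labels = []
    for i in range(len(tokens)):
        if i == 0:
            labels.append(BOUNDARY)  # the first token always opens a unit
            continue
        votes = [lf(tokens, i) for lf in LABELING_FUNCTIONS]
        yes = votes.count(BOUNDARY)
        no = votes.count(NO_BOUNDARY)
        labels.append(BOUNDARY if yes > no else NO_BOUNDARY)
    return labels


def boundary_f1(predicted, gold):
    """F-score over the BOUNDARY class for two aligned label sequences."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == BOUNDARY and g == BOUNDARY for p, g in pairs)
    fp = sum(p == BOUNDARY and g != BOUNDARY for p, g in pairs)
    fn = sum(p != BOUNDARY and g == BOUNDARY for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tokens = "oui # mais je sais pas # donc voilà".split()
predicted = weak_labels(tokens)
gold = [1, 0, 1, 0, 0, 0, 0, 0, 0]  # made-up reference segmentation
score = boundary_f1(predicted, gold)
```

In a weakly supervised setup, labels like `predicted` would serve as noisy training data for a segmenter, so that manually segmented data is needed mainly for evaluation.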