Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora

International audience. Following the pioneering work by \cite{Li-Gaussier-10}, we address in this paper the analysis of a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative compa...

Bibliographic Details
Main Authors: Ke, Guiyao; Marteau, Pierre-François; Ménier, Gildas
Other Authors: Expressiveness in Human Centered Data/Media (EXPRESSION), Université de Bretagne Sud (UBS)-MEDIA ET INTERACTIONS (IRISA-D6), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
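Subjects: Comparable corpora; Comparability measures; Evaluation; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]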
Online Access: https://hal.archives-ouvertes.fr/hal-00995294
id ftccsdartic:oai:HAL:hal-00995294v1
record_format openpolar
spelling ftccsdartic:oai:HAL:hal-00995294v1 2023-05-15T16:50:13+02:00 Reykjavik, Iceland 2014-05-26 en eng info:eu-repo/semantics/conferenceObject Conference papers 2014 ftccsdartic 2021-10-24T13:25:59Z
institution Open Polar
collection Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
op_collection_id ftccsdartic
language English
topic Comparable corpora
Comparability measures
Evaluation
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
description International audience. Following the pioneering work by \cite{Li-Gaussier-10}, we address in this paper the analysis of a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative comparability measure proposed by \cite{Li-Gaussier-10}, we develop some variants of this measure based primarily on the consideration that the occurrence frequencies of lexical entries and the number of their translations are important. We compare the respective advantages and disadvantages of these variants in the context of an evaluation framework that is based on the progressive degradation of the Europarl parallel corpus. The degradation is obtained by replacing either deterministically or randomly a varying amount of lines in blocks that compose partitions of the initial Europarl corpus. The impact of the coverage of bilingual dictionaries on these measures is also discussed and perspectives are finally presented.
author2 Expressiveness in Human Centered Data/Media (EXPRESSION)
Université de Bretagne Sud (UBS)-MEDIA ET INTERACTIONS (IRISA-D6)
Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA)
CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1)
Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes)
Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1)
Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA)
Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes)
Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)
format Conference Object
author Ke, Guiyao
Marteau, Pierre-François
Ménier, Gildas
title Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora
publisher HAL CCSD
publishDate 2014
url https://hal.archives-ouvertes.fr/hal-00995294
op_coverage Reykjavik, Iceland
genre Iceland
op_source LREC 2014, the 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland
https://hal.archives-ouvertes.fr/hal-00995294
op_relation hal-00995294
https://hal.archives-ouvertes.fr/hal-00995294
_version_ 1766040387294593024