Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora
International audience. Following the pioneering work of Li and Gaussier (2010), this paper analyzes a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative compa...
Main Authors: | Ke, Guiyao; Marteau, Pierre-François; Ménier, Gildas |
---|---|
Other Authors: | , , , , , , , , |
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD 2014 |
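Subjects: | Comparable corpora; Comparability measures; Evaluation; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] |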
Online Access: | https://hal.archives-ouvertes.fr/hal-00995294 |
id |
ftccsdartic:oai:HAL:hal-00995294v1 |
---|---|
record_format |
openpolar |
spelling |
ftccsdartic:oai:HAL:hal-00995294v1 2023-05-15T16:50:13+02:00 Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora Ke, Guiyao Marteau, Pierre-François Ménier, Gildas Expressiveness in Human Centered Data/Media (EXPRESSION) Université de Bretagne Sud (UBS)-MEDIA ET INTERACTIONS (IRISA-D6) Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1) Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes) Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1) Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes) Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA) Reykjavik, Iceland 2014-05-26 https://hal.archives-ouvertes.fr/hal-00995294 en eng HAL CCSD hal-00995294 https://hal.archives-ouvertes.fr/hal-00995294 The 9th edition of the Language Resources and Evaluation Conference, LREC 2014 LREC 2014, the 9th edition of the Language Resources and Evaluation Conference 
https://hal.archives-ouvertes.fr/hal-00995294 LREC 2014, the 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland Comparable corpora Comparability measures Evaluation [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] info:eu-repo/semantics/conferenceObject Conference papers 2014 ftccsdartic 2021-10-24T13:25:59Z International audience. Following the pioneering work of Li and Gaussier (2010), this paper analyzes a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative comparability measure proposed by Li and Gaussier (2010), we develop variants of this measure, based primarily on the consideration that the occurrence frequencies of lexical entries and the number of their translations are important. We compare the respective advantages and disadvantages of these variants within an evaluation framework based on the progressive degradation of the Europarl parallel corpus. The degradation is obtained by replacing, either deterministically or randomly, a varying number of lines in the blocks that partition the initial Europarl corpus. The impact of bilingual dictionary coverage on these measures is also discussed, and perspectives for future work are presented. Conference Object Iceland Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe) |
institution |
Open Polar |
collection |
Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe) |
op_collection_id |
ftccsdartic |
language |
English |
topic |
Comparable corpora Comparability measures Evaluation [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] |
spellingShingle |
Comparable corpora Comparability measures Evaluation [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] Ke, Guiyao Marteau, Pierre-François Ménier, Gildas Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora |
topic_facet |
Comparable corpora Comparability measures Evaluation [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] |
description |
International audience. Following the pioneering work of Li and Gaussier (2010), this paper analyzes a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative comparability measure proposed by Li and Gaussier (2010), we develop variants of this measure, based primarily on the consideration that the occurrence frequencies of lexical entries and the number of their translations are important. We compare the respective advantages and disadvantages of these variants within an evaluation framework based on the progressive degradation of the Europarl parallel corpus. The degradation is obtained by replacing, either deterministically or randomly, a varying number of lines in the blocks that partition the initial Europarl corpus. The impact of bilingual dictionary coverage on these measures is also discussed, and perspectives for future work are presented. |
author2 |
Expressiveness in Human Centered Data/Media (EXPRESSION) Université de Bretagne Sud (UBS)-MEDIA ET INTERACTIONS (IRISA-D6) Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1) Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes) Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1) Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes) Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA) |
format |
Conference Object |
author |
Ke, Guiyao Marteau, Pierre-François Ménier, Gildas |
author_facet |
Ke, Guiyao Marteau, Pierre-François Ménier, Gildas |
author_sort |
Ke, Guiyao |
title |
Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora |
title_short |
Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora |
title_full |
Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora |
title_fullStr |
Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora |
title_full_unstemmed |
Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora |
title_sort |
variations on quantitative comparability measures and their evaluations on synthetic french-english comparable corpora |
publisher |
HAL CCSD |
publishDate |
2014 |
url |
https://hal.archives-ouvertes.fr/hal-00995294 |
op_coverage |
Reykjavik, Iceland |
genre |
Iceland |
genre_facet |
Iceland |
op_source |
The 9th edition of the Language Resources and Evaluation Conference, LREC 2014 LREC 2014, the 9th edition of the Language Resources and Evaluation Conference https://hal.archives-ouvertes.fr/hal-00995294 LREC 2014, the 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland |
op_relation |
hal-00995294 https://hal.archives-ouvertes.fr/hal-00995294 |
_version_ |
1766040387294593024 |
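
Illustrative note (not part of the catalog record). The evaluation idea summarized in the description field, degrading a parallel corpus block by block and observing how a dictionary-based comparability score reacts, can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `degrade_blocks` and `comparability`, the toy sentences, the toy dictionaries, and the source of replacement ("noise") lines are all hypothetical. The score shown is a simplified vocabulary-overlap measure (the share of dictionary-covered words that have a translation in the other side's vocabulary); the exact definition used in the paper, and its variants weighted by occurrence frequency and number of translations, are not reproduced in this record and are not implemented here.

```python
import random


def degrade_blocks(target_lines, noise_lines, block_size, fraction, rng=None):
    """Replace a fraction of the lines in each block with unrelated lines.

    Assumption: with rng=None the first lines of each block are replaced
    (deterministic); with a random.Random instance, positions are sampled.
    """
    degraded = list(target_lines)
    noise_index = 0
    for start in range(0, len(degraded), block_size):
        block_len = min(block_size, len(degraded) - start)
        k = round(fraction * block_len)  # number of lines to replace in this block
        if rng is None:
            positions = range(k)
        else:
            positions = rng.sample(range(block_len), k)
        for offset in positions:
            degraded[start + offset] = noise_lines[noise_index % len(noise_lines)]
            noise_index += 1
    return degraded


def comparability(source_lines, target_lines, src_to_tgt, tgt_to_src):
    """Simplified score: share of dictionary-covered vocabulary entries that
    have at least one translation in the other corpus's vocabulary."""
    src_vocab = {w for line in source_lines for w in line.lower().split()}
    tgt_vocab = {w for line in target_lines for w in line.lower().split()}
    src_known = {w for w in src_vocab if w in src_to_tgt}
    tgt_known = {w for w in tgt_vocab if w in tgt_to_src}
    total = len(src_known) + len(tgt_known)
    if total == 0:
        return 0.0
    src_hits = sum(1 for w in src_known if src_to_tgt[w] & tgt_vocab)
    tgt_hits = sum(1 for w in tgt_known if tgt_to_src[w] & src_vocab)
    return (src_hits + tgt_hits) / total


# Hypothetical toy data standing in for aligned Europarl lines.
french = ["le chat dort", "le chien mange", "la maison rouge", "le livre bleu"]
english = ["the cat sleeps", "the dog eats", "the red house", "the blue book"]
noise = ["stock markets fell", "rain is expected"]
fr_en = {"chat": {"cat"}, "chien": {"dog"}, "maison": {"house"}, "livre": {"book"}}
en_fr = {"cat": {"chat"}, "dog": {"chien"}, "house": {"maison"}, "book": {"livre"}}

for fraction in (0.0, 0.5, 1.0):
    degraded = degrade_blocks(english, noise, block_size=2, fraction=fraction)
    print(fraction, comparability(french, degraded, fr_en, en_fr))
```

On this toy input the score falls as the replaced fraction grows, which is the behaviour such a degradation framework is meant to expose. Passing `rng=random.Random(seed)` switches from deterministic to random replacement positions within each block.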