Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora
International audience. Following the pioneering work by \cite{Li-Gaussier-10}, we address in this paper the analysis of a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative compa...
Main Authors: | Ke, Guiyao; Marteau, Pierre-François; Ménier, Gildas |
---|---|
Other Authors: | , , , , , , |
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD, 2014 |
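Subjects: | Comparable corpora; Comparability measures; Evaluation; Information Retrieval [cs.IR]; Computation and Language [cs.CL] |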
Online Access: | https://hal.science/hal-00995294 |
id |
ftinsarennhal:oai:HAL:hal-00995294v1 |
---|---|
record_format |
openpolar |
spelling |
ftinsarennhal:oai:HAL:hal-00995294v1 2024-09-15T18:14:04+00:00 Reykjavik, Iceland 2014-05-26 en eng info:eu-repo/semantics/conferenceObject Conference papers 2014 ftinsarennhal 2024-08-28T00:09:11Z |
institution |
Open Polar |
collection |
INSA Rennes HAL (Institut National des Sciences Appliquées) |
op_collection_id |
ftinsarennhal |
language |
English |
topic |
Comparable corpora Comparability measures Evaluation [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] |
description |
International audience. Following the pioneering work of \cite{Li-Gaussier-10}, we analyze in this paper a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative comparability measure proposed by \cite{Li-Gaussier-10}, we develop variants of this measure, based primarily on the consideration that both the occurrence frequencies of lexical entries and the number of their translations matter. We compare the respective advantages and disadvantages of these variants within an evaluation framework based on the progressive degradation of the Europarl parallel corpus. The degradation is obtained by replacing, either deterministically or randomly, a varying number of lines in the blocks that partition the initial Europarl corpus. The impact of bilingual dictionary coverage on these measures is also discussed, and perspectives are presented. |
author2 |
Expressiveness in Human Centered Data/Media (EXPRESSION) Université de Bretagne Sud (UBS)-MEDIA ET INTERACTIONS (IRISA-D6) Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes) Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes) Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS) |
format |
Conference Object |
author |
Ke, Guiyao Marteau, Pierre-François Ménier, Gildas |
author_sort |
Ke, Guiyao |
title |
Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora |
publisher |
HAL CCSD |
publishDate |
2014 |
url |
https://hal.science/hal-00995294 |
op_coverage |
Reykjavik, Iceland |
genre |
Iceland |
op_source |
LREC 2014, the 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland; https://hal.science/hal-00995294 |
op_relation |
hal-00995294 https://hal.science/hal-00995294 |
_version_ |
1810451849475522560 |
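
The description field outlines an evaluation protocol: score a bilingual corpus with a comparability measure, then degrade a parallel corpus block by block and watch the score fall. The Python sketch below illustrates that idea on toy data only. This record does not give the actual formula or procedure, so everything here is an assumption: the measure is a plain vocabulary-overlap ratio (the share of dictionary-covered words that have at least one translation in the other corpus), in the spirit of what the abstract attributes to \cite{Li-Gaussier-10}, and the "deterministic" mode is assumed to replace the first lines of a block. The names `comparability`, `degrade_block`, `tokens`, and the toy dictionaries are hypothetical.

```python
import random


def comparability(src_tokens, tgt_tokens, src2tgt, tgt2src):
    """Assumed overlap measure: fraction of dictionary-covered vocabulary
    (both directions) with at least one translation in the other corpus."""
    src_vocab, tgt_vocab = set(src_tokens), set(tgt_tokens)
    src_cov = {w for w in src_vocab if w in src2tgt}
    tgt_cov = {w for w in tgt_vocab if w in tgt2src}
    total = len(src_cov) + len(tgt_cov)
    if total == 0:
        return 0.0
    hits = sum(1 for w in src_cov if src2tgt[w] & tgt_vocab)
    hits += sum(1 for w in tgt_cov if tgt2src[w] & src_vocab)
    return hits / total


def degrade_block(block, noise_lines, n_replace, randomly=False, seed=0):
    """Replace n_replace lines of one block with unrelated lines.
    Deterministic mode (an assumption) overwrites the first n_replace lines;
    random mode samples the positions with a seeded generator."""
    out = list(block)
    n_replace = min(n_replace, len(out))
    if randomly:
        positions = random.Random(seed).sample(range(len(out)), n_replace)
    else:
        positions = range(n_replace)
    for i, pos in enumerate(positions):
        out[pos] = noise_lines[i % len(noise_lines)]
    return out


def tokens(lines):
    """Whitespace tokenization, enough for the toy example."""
    return [tok for line in lines for tok in line.split()]


# Toy parallel block and toy bilingual dictionary (both hypothetical).
fr_block = ["le chat dort", "la maison rouge"]
en_block = ["the cat sleeps", "the red house"]
noise = ["stock markets fell"]
fr2en = {"chat": {"cat"}, "maison": {"house"}, "rouge": {"red"}}
en2fr = {"cat": {"chat"}, "house": {"maison"}, "red": {"rouge"}}

clean_score = comparability(tokens(fr_block), tokens(en_block), fr2en, en2fr)
half_degraded = degrade_block(en_block, noise, n_replace=1)
half_score = comparability(tokens(fr_block), tokens(half_degraded), fr2en, en2fr)
```

On this toy block the score drops as more English lines are replaced, which is the behaviour such an evaluation framework would look for. The frequency-weighted and translation-count variants mentioned in the abstract are not reproduced here, since the record does not define them.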