Rule-based reordering spaces in statistical machine translation

In Statistical Machine Translation (SMT), the constraints on word reorderings have a great impact on the set of potential translations that are explored. Notwithstanding computational issues, the reordering space of an SMT system needs to be designed with great care: if a larger s...

Bibliographic Details
Main Authors: Pécheux, Nicolas; Allauzen, Alexandre; Yvon, François
Other Authors: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris-Sud - Paris 11 (UP11)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Université Paris Saclay (COmUE)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Subjects:
Online Access: https://hal.archives-ouvertes.fr/hal-01908354
id ftunivnantes:oai:HAL:hal-01908354v1
record_format openpolar
institution Open Polar
collection Université de Nantes: HAL-UNIV-NANTES
op_collection_id ftunivnantes
language English
topic Statistical Machine Translation
Preordering
[SHS.INFO.AUTR]Humanities and Social Sciences/Library and information sciences/domain_shs.info.autr
description In Statistical Machine Translation (SMT), the constraints on word reorderings have a great impact on the set of potential translations that are explored. Notwithstanding computational issues, the reordering space of an SMT system needs to be designed with great care: while a larger search space is likely to yield better translations, it may also lead to more decoding errors, because of the added ambiguity and the interaction with the pruning strategy. In this paper, we study this trade-off using a state-of-the-art translation system, where all reorderings are represented in a word lattice prior to decoding. This allows us to directly explore and compare different reordering spaces. We study in detail a rule-based preordering system, varying the length or number of rules and the tagset used, and contrasting it with oracle settings and purely combinatorial subsets of permutations. We focus on two language pairs: English-French, a close language pair, and English-German, known to be a more challenging reordering pair.
author2 Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Université Paris-Sud - Paris 11 (UP11)-Sorbonne Université - UFR d'Ingénierie (UFR 919)
Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Université Paris Saclay (COmUE)
format Conference Object
author Pécheux, Nicolas
Allauzen, Alexandre
Yvon, François
title Rule-based reordering spaces in statistical machine translation
publisher HAL CCSD
publishDate 2014
url https://hal.archives-ouvertes.fr/hal-01908354
op_coverage Reykjavik, Iceland
genre Iceland
genre_facet Iceland
op_source International Conference on Language Resources and Evaluation
https://hal.archives-ouvertes.fr/hal-01908354
International Conference on Language Resources and Evaluation, Jan 2014, Reykjavik, Iceland
op_relation hal-01908354
https://hal.archives-ouvertes.fr/hal-01908354
_version_ 1766040117540028416