Rule-based reordering spaces in statistical machine translation

Bibliographic Details
Main Authors: Pécheux, Nicolas; Allauzen, Alexandre; Yvon, François
Other Authors: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris-Sud - Paris 11 (UP11)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Université Paris Saclay (COmUE)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Subjects:
Online Access: https://hal.science/hal-01908354
Description
Summary: International audience. In Statistical Machine Translation (SMT), the constraints on word reorderings have a great impact on the set of potential translations that are explored. Notwithstanding computational issues, the reordering space of an SMT system needs to be designed with great care: while a larger search space is likely to yield better translations, it may also lead to more decoding errors, because of the added ambiguity and the interaction with the pruning strategy. In this paper, we study this trade-off using a state-of-the-art translation system, where all reorderings are represented in a word lattice prior to decoding. This allows us to directly explore and compare different reordering spaces. We study in detail a rule-based preordering system, varying the length or number of rules and the tagset used, as well as contrasting with oracle settings and purely combinatorial subsets of permutations. We focus on two language pairs: English-French, a close language pair, and English-German, known to be a more challenging reordering pair.