Rule-based reordering spaces in statistical machine translation
International audience In Statistical Machine Translation (SMT), the constraints on word reorderings have a great impact on the set of potential translations that are explored. Notwithstanding computational issues, the reordering space of an SMT system needs to be designed with great care: if a larger s...
Main Authors: | Pécheux, Nicolas; Allauzen, Alexandre; Yvon, François |
---|---|
Other Authors: | , , |
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD 2014 |
Subjects: | |
Online Access: | https://hal.science/hal-01908354 |
id |
ftuniparissaclay:oai:HAL:hal-01908354v1 |
---|---|
record_format |
openpolar |
spelling |
ftuniparissaclay:oai:HAL:hal-01908354v1 2023-11-12T04:19:18+01:00 Rule-based reordering spaces in statistical machine translation Pécheux, Nicolas Allauzen, Alexandre Yvon, François Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI) Université Paris-Sud - Paris 11 (UP11)-Sorbonne Université - UFR d'Ingénierie (UFR 919) Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Université Paris Saclay (COmUE) Reykjavik, Iceland 2014-01-01 https://hal.science/hal-01908354 en eng HAL CCSD hal-01908354 https://hal.science/hal-01908354 International Conference on Language Resources and Evaluation https://hal.science/hal-01908354 International Conference on Language Resources and Evaluation, Jan 2014, Reykjavik, Iceland Statistical Machine Translation Preordering [SHS.INFO.AUTR]Humanities and Social Sciences/Library and information sciences/domain_shs.info.autr info:eu-repo/semantics/conferenceObject Conference papers 2014 ftuniparissaclay 2023-10-14T21:58:37Z International audience In Statistical Machine Translation (SMT), the constraints on word reorderings have a great impact on the set of potential translations that are explored. Notwithstanding computational issues, the reordering space of an SMT system needs to be designed with great care: if a larger search space is likely to yield better translations, it may also lead to more decoding errors, because of the added ambiguity and the interaction with the pruning strategy. In this paper, we study this trade-off using a state-of-the-art translation system, where all reorderings are represented in a word lattice prior to decoding. This allows us to directly explore and compare different reordering spaces. We study in detail a rule-based preordering system, varying the length or number of rules, the tagset used, as well as contrasting with oracle settings and purely combinatorial subsets of permutations. We focus on two language pairs: English-French, a close language pair, and English-German, known to be a more challenging reordering pair. Conference Object Iceland Archives ouvertes de Paris-Saclay |
institution |
Open Polar |
collection |
Archives ouvertes de Paris-Saclay |
op_collection_id |
ftuniparissaclay |
language |
English |
topic |
Statistical Machine Translation Preordering [SHS.INFO.AUTR]Humanities and Social Sciences/Library and information sciences/domain_shs.info.autr |
spellingShingle |
Statistical Machine Translation Preordering [SHS.INFO.AUTR]Humanities and Social Sciences/Library and information sciences/domain_shs.info.autr Pécheux, Nicolas Allauzen, Alexandre Yvon, François Rule-based reordering spaces in statistical machine translation |
topic_facet |
Statistical Machine Translation Preordering [SHS.INFO.AUTR]Humanities and Social Sciences/Library and information sciences/domain_shs.info.autr |
description |
International audience In Statistical Machine Translation (SMT), the constraints on word reorderings have a great impact on the set of potential translations that are explored. Notwithstanding computational issues, the reordering space of an SMT system needs to be designed with great care: if a larger search space is likely to yield better translations, it may also lead to more decoding errors, because of the added ambiguity and the interaction with the pruning strategy. In this paper, we study this trade-off using a state-of-the-art translation system, where all reorderings are represented in a word lattice prior to decoding. This allows us to directly explore and compare different reordering spaces. We study in detail a rule-based preordering system, varying the length or number of rules, the tagset used, as well as contrasting with oracle settings and purely combinatorial subsets of permutations. We focus on two language pairs: English-French, a close language pair, and English-German, known to be a more challenging reordering pair. |
author2 |
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI) Université Paris-Sud - Paris 11 (UP11)-Sorbonne Université - UFR d'Ingénierie (UFR 919) Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Université Paris Saclay (COmUE) |
format |
Conference Object |
author |
Pécheux, Nicolas Allauzen, Alexandre Yvon, François |
author_facet |
Pécheux, Nicolas Allauzen, Alexandre Yvon, François |
author_sort |
Pécheux, Nicolas |
title |
Rule-based reordering spaces in statistical machine translation |
title_sort |
rule-based reordering spaces in statistical machine translation |
publisher |
HAL CCSD |
publishDate |
2014 |
url |
https://hal.science/hal-01908354 |
op_coverage |
Reykjavik, Iceland |
genre |
Iceland |
genre_facet |
Iceland |
op_source |
International Conference on Language Resources and Evaluation https://hal.science/hal-01908354 International Conference on Language Resources and Evaluation, Jan 2014, Reykjavik, Iceland |
op_relation |
hal-01908354 https://hal.science/hal-01908354 |
_version_ |
1782335776633126912 |