Machine Translation for English--Inuktitut with Segmentation, Data Acquisition and Pre-Training

Bibliographic Details
Main Authors: Roest, Christian, Edman, Lukas, Minnema, Gosse, Kelly, Kevin, Spenader, Jennifer, Toral, Antonio
Format: Article in Journal/Newspaper
Language: English
Published: Association for Computational Linguistics (ACL) 2020
Online Access: https://hdl.handle.net/11370/ce246963-3b30-4064-ab65-ae9e5e506c5e
https://research.rug.nl/en/publications/ce246963-3b30-4064-ab65-ae9e5e506c5e
https://pure.rug.nl/ws/files/156505029/2020.wmt_1.29.pdf
Description
Summary: Translating to and from low-resource polysynthetic languages presents numerous challenges for NMT. We present the results of our systems for the English--Inuktitut language pair for the WMT 2020 translation tasks. We investigated the importance of correct morphological segmentation, whether adding data from a related language (Greenlandic) helps, and whether using contextual word embeddings improves translation. While each method showed some promise, the results are mixed.