Extended study on using pretrained language models and YiSi-1 for machine translation evaluation

Bibliographic Details
Main Author: Lo, Chi-Kiu
Format: Article in Journal/Newspaper
Language: English
Published: Association for Computational Linguistics 2020
Online Access:https://nrc-publications.canada.ca/eng/view/ft/?id=cd8d16f5-2b67-41aa-955f-84b4b6dc4e31
https://nrc-publications.canada.ca/eng/view/object/?id=cd8d16f5-2b67-41aa-955f-84b4b6dc4e31
https://nrc-publications.canada.ca/fra/voir/objet/?id=cd8d16f5-2b67-41aa-955f-84b4b6dc4e31
Description
Summary: We present an extended study on using pretrained language models and YiSi-1 for machine translation evaluation. Although the recently proposed contextual-embedding-based metric YiSi-1 significantly outperforms BLEU and other metrics in correlating with human judgment on translation quality, we have yet to understand the full strength of using pretrained language models for machine translation evaluation. In this paper, we study YiSi-1's correlation with human translation quality judgment while varying three major attributes of the pretrained language models: which architecture, which intermediate layer, and whether the model is monolingual or multilingual. Results of the study show further improvements over YiSi-1 on the WMT 2019 Metrics shared task. We also describe the pretrained language model we trained for evaluating Inuktitut machine translation output.

Peer reviewed: Yes
NRC publication: Yes
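
For readers unfamiliar with how a contextual-embedding metric like YiSi-1 uses the attributes varied in the study, the sketch below illustrates the general idea: embed hypothesis and reference tokens with a chosen pretrained model (architecture, monolingual vs. multilingual) at a chosen intermediate layer, greedily match tokens by cosine similarity, and combine the matches into a recall-weighted F-score. This is a minimal, hypothetical Python sketch using the HuggingFace transformers library, not the author's implementation; the model name, layer index, and alpha value are illustrative assumptions, and YiSi-1's idf weighting of lexical units is omitted for brevity.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumptions for illustration: swap the model name to compare architectures
# or a monolingual model, and vary LAYER to probe intermediate layers.
MODEL_NAME = "bert-base-multilingual-cased"
LAYER = 9  # which intermediate encoder layer to draw embeddings from

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Contextual embeddings for one sentence from the chosen layer."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states[0] is the embedding layer; index LAYER selects an
    # intermediate encoder layer. Shape: (sequence_length, hidden_size).
    return outputs.hidden_states[LAYER].squeeze(0)

def similarity_f_score(hypothesis: str, reference: str,
                       alpha: float = 0.8) -> float:
    """Greedy cosine matching of hypothesis and reference tokens,
    combined into a weighted F-measure (alpha > 0.5 favors recall)."""
    hyp = torch.nn.functional.normalize(embed(hypothesis), dim=-1)
    ref = torch.nn.functional.normalize(embed(reference), dim=-1)
    sim = hyp @ ref.T                               # pairwise cosine similarities
    precision = sim.max(dim=1).values.mean().item() # best match per hyp token
    recall = sim.max(dim=0).values.mean().item()    # best match per ref token
    return precision * recall / (alpha * precision + (1 - alpha) * recall)

print(similarity_f_score("The cat sat on the mat.",
                         "A cat was sitting on the mat."))

Under this framing, the study's three attributes correspond to three knobs in the sketch: the pretrained checkpoint, the hidden-layer index, and whether that checkpoint was trained on one language or many.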