Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks—English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish—and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling.
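
The abstract names subword sampling (segmentation regularization) as one of the helpful techniques. As a rough illustration only, the sketch below uses the open-source sentencepiece library's unigram model to produce sampled segmentations; the corpus, file names, and hyperparameters are invented for the example, and this is not necessarily the segmentation toolchain used in the article.

```python
# Minimal sketch of subword sampling with the `sentencepiece` library
# (unigram model). Purely illustrative; settings are not from the article.
import sentencepiece as spm

# Toy training corpus standing in for real target-language data.
with open("toy_corpus.txt", "w", encoding="utf-8") as corpus:
    corpus.write(
        "the low resource target language shares many subwords with its relative\n"
        "subword segmentation keeps the coverage of the vocabulary high\n"
        "sampling a different segmentation of the same sentence acts as regularization\n"
        "the model sees rare words split in several alternative ways\n"
    )

# Train a small unigram segmentation model; the soft vocab limit avoids
# training failures on such a tiny corpus.
spm.SentencePieceTrainer.train(
    "--input=toy_corpus.txt --model_prefix=toy --vocab_size=80 "
    "--model_type=unigram --hard_vocab_limit=false"
)

sp = spm.SentencePieceProcessor()
sp.load("toy.model")

sentence = "subword segmentation keeps the coverage of the vocabulary high"

# Deterministic 1-best segmentation.
print(sp.encode_as_pieces(sentence))

# Sampled segmentations: each call may split the sentence differently,
# which is the "subword sampling" regularization effect.
for _ in range(3):
    print(sp.sample_encode_as_pieces(sentence, -1, 0.1))
```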

Bibliographic Details
Published in: Machine Translation
Main Authors: Grönroos, Stig-Arne, Virpioja, Sami, Kurimo, Mikko
Other Authors: Department of Digital Humanities, Language Technology
Format: Article in Journal/Newspaper
Language: English
Published: Springer Netherlands 2021
Subjects:
Online Access:http://hdl.handle.net/10138/330171
id ftunivhelsihelda:oai:helda.helsinki.fi:10138/330171
record_format openpolar
spelling ftunivhelsihelda:oai:helda.helsinki.fi:10138/330171 2024-01-07T09:45:23+01:00 Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation Gronroos, Stig-Arne Virpioja, Sami Kurimo, Mikko Department of Digital Humanities Language Technology 2021-05-21T12:09:01Z 36 application/pdf http://hdl.handle.net/10138/330171 eng eng Springer Netherlands 10.1007/s10590-020-09253-x European Commission This study has been supported by the MeMAD project, funded by the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 780069), and the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 771113). Computer resources within the Aalto University School of Science “Science-IT” project were used. 771113 Grönroos, S-A, Virpioja, S & Kurimo, M 2021, 'Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation', Machine Translation, vol. 34, pp. 251-286. https://doi.org/10.1007/s10590-020-09253-x ORCID: /0000-0002-3568-150X/work/94250254 ORCID: /0000-0002-3750-6924/work/116878494 85100018247 2f41b658-ad43-4a8b-aeb6-fc9e198560ff http://hdl.handle.net/10138/330171 000613039800001 unspecified openAccess info:eu-repo/semantics/openAccess 113 Computer and information sciences 6121 Languages Article acceptedVersion submittedVersion 2021 ftunivhelsihelda 2023-12-14T00:14:52Z There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks—English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish—and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling. Peer ... Article in Journal/Newspaper North Sámi sami Sámi HELDA – University of Helsinki Open Repository Machine Translation 34 4 251 286
institution Open Polar
collection HELDA – University of Helsinki Open Repository
op_collection_id ftunivhelsihelda
language English
topic 113 Computer and information sciences
6121 Languages
spellingShingle 113 Computer and information sciences
6121 Languages
Gronroos, Stig-Arne
Virpioja, Sami
Kurimo, Mikko
Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
topic_facet 113 Computer and information sciences
6121 Languages
description There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks—English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish—and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling. Peer ...
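
The description also reports gains from a denoising autoencoder objective. As a generic, hypothetical sketch (not taken from the article), the corruption step of such an objective often combines token dropout with local reordering; the drop probability and shuffle window below are invented for the example.

```python
# Hypothetical word-level noise function for a denoising autoencoder
# objective (the model reconstructs the clean sentence from a corrupted copy).
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corrupt a token sequence by random deletion and local reordering."""
    rng = rng or random.Random()
    # 1) Drop each token with probability `drop_prob` (keep at least one token).
    kept = [t for t in tokens if rng.random() > drop_prob] or list(tokens)
    # 2) Local shuffle: add a random offset in [0, shuffle_window) to each
    #    position and re-sort, so tokens only move a few places.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [tok for _, tok in sorted(zip(keys, kept), key=lambda pair: pair[0])]

clean = "the autoencoder learns to reconstruct the clean target sentence".split()
print(add_noise(clean, rng=random.Random(1)))
```
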
author2 Department of Digital Humanities
Language Technology
format Article in Journal/Newspaper
author Gronroos, Stig-Arne
Virpioja, Sami
Kurimo, Mikko
author_facet Gronroos, Stig-Arne
Virpioja, Sami
Kurimo, Mikko
author_sort Gronroos, Stig-Arne
title Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
title_short Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
title_full Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
title_fullStr Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
title_full_unstemmed Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
title_sort transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
publisher Springer Netherlands
publishDate 2021
url http://hdl.handle.net/10138/330171
genre North Sámi
sami
Sámi
genre_facet North Sámi
sami
Sámi
op_relation 10.1007/s10590-020-09253-x
European Commission
This study has been supported by the MeMAD project, funded by the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 780069), and the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 771113). Computer resources within the Aalto University School of Science “Science-IT” project were used.
771113
Grönroos, S-A, Virpioja, S & Kurimo, M 2021, 'Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation', Machine Translation, vol. 34, pp. 251-286. https://doi.org/10.1007/s10590-020-09253-x
ORCID: /0000-0002-3568-150X/work/94250254
ORCID: /0000-0002-3750-6924/work/116878494
85100018247
2f41b658-ad43-4a8b-aeb6-fc9e198560ff
http://hdl.handle.net/10138/330171
000613039800001
op_rights unspecified
openAccess
info:eu-repo/semantics/openAccess
container_title Machine Translation
container_volume 34
container_issue 4
container_start_page 251
op_container_end_page 286
_version_ 1787426914521055232