Extremely low-resource machine translation for closely related languages

An effective method to improve extremely low-resource neural machine translation is multilingual training, which can be improved by leveraging monolingual data to create synthetic bilingual corpora using the back-translation method. This work focuses on closely related languages from the Uralic lang...


Bibliographic Details
Main Authors: Tars, Maali, Tättar, Andre, Fišel, Mark
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2021
Subjects: Computation and Language (cs.CL); Computer and information sciences
Online Access: https://dx.doi.org/10.48550/arxiv.2105.13065
https://arxiv.org/abs/2105.13065
id ftdatacite:10.48550/arxiv.2105.13065
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Computation and Language cs.CL
FOS Computer and information sciences
topic_facet Computation and Language cs.CL
FOS Computer and information sciences
description An effective method to improve extremely low-resource neural machine translation is multilingual training, which can be improved by leveraging monolingual data to create synthetic bilingual corpora using the back-translation method. This work focuses on closely related languages from the Uralic language family: from Estonian and Finnish geographical regions. We find that multilingual learning and synthetic corpora increase the translation quality in every language pair for which we have data. We show that transfer learning and fine-tuning are very effective for doing low-resource machine translation and achieve the best results. We collected new parallel data for Võro, North and South Saami and present first results of neural machine translation for these languages. Accepted at Nodalida'2021.
format Article in Journal/Newspaper
author Tars, Maali
Tättar, Andre
Fišel, Mark
author_facet Tars, Maali
Tättar, Andre
Fišel, Mark
author_sort Tars, Maali
title Extremely low-resource machine translation for closely related languages
publisher arXiv
publishDate 2021
url https://dx.doi.org/10.48550/arxiv.2105.13065
https://arxiv.org/abs/2105.13065
genre saami
genre_facet saami
op_rights Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
cc-by-sa-4.0
op_rightsnorm CC-BY-SA
op_doi https://doi.org/10.48550/arxiv.2105.13065
_version_ 1766180524392448000