The Relevance of the Source Language in Transfer Learning for ASR

This study presents new experiments on Zyrian Komi speech recognition. We use DeepSpeech to train ASR models from a language documentation corpus that contains both contemporary and archival recordings. Earlier studies have shown that transfer learning from English and using a domain-matching Komi...


Bibliographic Details
Published in: Proceedings of the Workshop on Computational Methods for Endangered Languages
Main Authors: Hjortnæs, Nils; Partanen, Niko; Rießler, Michael; Tyers, Francis M.
Other Authors: The National Library of Finland, Library Network Services; Department of Finnish, Finno-Ugrian and Scandinavian Studies
Format: Conference Object
Language: English
Published: 2022
Subjects:
Online Access: http://hdl.handle.net/10138/351332
id ftunivhelsihelda:oai:helda.helsinki.fi:10138/351332
record_format openpolar
institution Open Polar
collection HELDA – University of Helsinki Open Repository
op_collection_id ftunivhelsihelda
language English
topic 6121 Languages
topic_facet 6121 Languages
description This study presents new experiments on Zyrian Komi speech recognition. We use DeepSpeech to train ASR models from a language documentation corpus that contains both contemporary and archival recordings. Earlier studies have shown that transfer learning from English and the use of a domain-matching Komi language model both improve the CER and WER. In this study we experiment with transfer learning from a more relevant source language, Russian, and with including Russian text in the construction of the language model. The motivation is that Russian and Komi are contemporary contact languages, and Russian is regularly present in the corpus. We found that, despite the close contact between Russian and Komi, the size of the English speech corpus led to better performance when English was used as the source language. Additionally, we can report that an update of the DeepSpeech version alone improved the CER by 3.9% compared with the earlier studies, which is an important step in the development of Komi ASR. Peer reviewed
author2 The National Library of Finland, Library Network Services
Department of Finnish, Finno-Ugrian and Scandinavian Studies
format Conference Object
author Hjortnæs, Nils
Partanen, Niko
Rießler, Michael
Tyers, Francis M.
author_facet Hjortnæs, Nils
Partanen, Niko
Rießler, Michael
Tyers, Francis M.
author_sort Hjortnæs, Nils
title The Relevance of the Source Language in Transfer Learning for ASR
title_sort relevance of the source language in transfer learning for asr
publishDate 2022
url http://hdl.handle.net/10138/351332
genre Komi language
genre_facet Komi language
op_relation 10.33011/computel.v1i.959
Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages
Proceedings of the Workshop on Computational Methods for Endangered Languages
978-1-954085-01-5
Hjortnæs, N, Partanen, N, Rießler, M & Tyers, F M 2021, The Relevance of the Source Language in Transfer Learning for ASR. In Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages. vol. 1, Proceedings of the Workshop on Computational Methods for Endangered Languages, The Association for Computational Linguistics, pp. 63-69, Workshop on the Use of Computational Methods in the Study of Endangered Languages, 02/03/2021. https://doi.org/10.33011/computel.v1i.959
conference
ORCID: /0000-0001-8584-3880/work/118364555
ORCID: /0000-0002-2397-2860/work/118365886
f93ccf3d-0870-447c-a825-15a0b3524679
http://hdl.handle.net/10138/351332
op_rights unspecified
openAccess
info:eu-repo/semantics/openAccess
container_title Proceedings of the Workshop on Computational Methods for Endangered Languages
container_volume 1
container_issue 2
_version_ 1787426082927935488