Automatic Speech Recognition of Low-Resource Languages Based on Chukchi

This paper presents a project focused on researching and building a new Automatic Speech Recognition (ASR) system for the Chukchi language. No single complete corpus of Chukchi exists, so most of the work consisted of collecting audio and text in Chukchi from ope...


Bibliographic Details
Main Authors:	Safonova, Anastasia; Yudina, Tatiana; Nadimanov, Emil; Davenport, Cydnie
Format:	Text
Language:	unknown
Published:	2022
Subjects:	Computer Science - Computation and Language; Chukchi
Online Access:	http://arxiv.org/abs/2210.05726

id	ftarxivpreprints:oai:arXiv.org:2210.05726
record_format	openpolar
spelling	ftarxivpreprints:oai:arXiv.org:2210.05726 2023-09-05T13:18:51+02:00 Automatic Speech Recognition of Low-Resource Languages Based on Chukchi Safonova, Anastasia Yudina, Tatiana Nadimanov, Emil Davenport, Cydnie 2022-10-11 http://arxiv.org/abs/2210.05726 unknown http://arxiv.org/abs/2210.05726 Computer Science - Computation and Language text 2022 ftarxivpreprints 2023-08-16T17:19:46Z This paper presents a project focused on researching and building a new Automatic Speech Recognition (ASR) system for the Chukchi language. No single complete corpus of Chukchi exists, so most of the work consisted of collecting audio and text in Chukchi from open sources and processing them. We collected 21:34:23 (hh:mm:ss) of audio recordings and 112,719 sentences (2,068,273 words) of text in Chukchi. An XLSR model trained on this data showed good results even with a small amount of data. Chukchi is not only a low-resource language but also a polysynthetic one, which significantly complicates any automatic processing. As a result, the usual word error rate (WER) metric for evaluating ASR becomes less indicative for a polysynthetic language, whereas the character error rate (CER) showed good results. The question of metrics for polysynthetic languages remains open. Text Chukchi ArXiv.org (Cornell University Library)
institution	Open Polar
collection	ArXiv.org (Cornell University Library)
op_collection_id	ftarxivpreprints
language	unknown
topic	Computer Science - Computation and Language
spellingShingle	Computer Science - Computation and Language Safonova, Anastasia Yudina, Tatiana Nadimanov, Emil Davenport, Cydnie Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
topic_facet	Computer Science - Computation and Language
description	This paper presents a project focused on researching and building a new Automatic Speech Recognition (ASR) system for the Chukchi language. No single complete corpus of Chukchi exists, so most of the work consisted of collecting audio and text in Chukchi from open sources and processing them. We collected 21:34:23 (hh:mm:ss) of audio recordings and 112,719 sentences (2,068,273 words) of text in Chukchi. An XLSR model trained on this data showed good results even with a small amount of data. Chukchi is not only a low-resource language but also a polysynthetic one, which significantly complicates any automatic processing. As a result, the usual word error rate (WER) metric for evaluating ASR becomes less indicative for a polysynthetic language, whereas the character error rate (CER) showed good results. The question of metrics for polysynthetic languages remains open.
format	Text
author	Safonova, Anastasia; Yudina, Tatiana; Nadimanov, Emil; Davenport, Cydnie
author_facet	Safonova, Anastasia; Yudina, Tatiana; Nadimanov, Emil; Davenport, Cydnie
author_sort	Safonova, Anastasia
title	Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
title_short	Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
title_full	Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
title_fullStr	Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
title_full_unstemmed	Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
title_sort	automatic speech recognition of low-resource languages based on chukchi
publishDate	2022
url	http://arxiv.org/abs/2210.05726
genre	Chukchi
genre_facet	Chukchi
op_relation	http://arxiv.org/abs/2210.05726
_version_	1776199702425894912
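
Note on the metrics named in the description: the sketch below illustrates why WER can look much worse than CER when words are long, as in a polysynthetic language. It is a minimal illustration, not the authors' evaluation code. The function names (edit_distance, wer, cer) and the placeholder strings are assumptions introduced here, and the strings are not Chukchi data. CER is computed over all characters including spaces, which is one common convention among several.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings (characters) and on lists (words).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Assumption: spaces count as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical placeholder strings: one long "word" with a single wrong
# character, followed by a short word that is recognized correctly.
reference = "abcdefghijklmnop qrs"
hypothesis = "abcdefghijklmnox qrs"

print(f"WER = {wer(reference, hypothesis):.2f}")  # WER = 0.50
print(f"CER = {cer(reference, hypothesis):.2f}")  # CER = 0.05
```

A single wrong character makes the whole long word count as an error, so half of the words are wrong while only one character in twenty is. This is consistent with the description's remark that WER becomes less indicative for a polysynthetic language.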


Similar Items