Automatic Speech Recognition of Low-Resource Languages Based on Chukchi

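This paper presents a project on researching and building a new Automatic Speech Recognition (ASR) system for the Chukchi language. No single complete corpus of Chukchi exists, so most of the work consisted of collecting audio and text in Chukchi from open sources and processing them. We collected 21:34:23 (hh:mm:ss) of audio recordings and 112,719 sentences (2,068,273 words) of text in Chukchi. The XLSR model was trained on the obtained data and showed good results even with a small amount of data. Chukchi is not only a low-resource language but also a polysynthetic one, which significantly complicates any automatic processing. As a result, the usual WER metric for evaluating ASR becomes less indicative for a polysynthetic language, whereas the CER metric showed good results. The question of metrics for polysynthetic languages remains open.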
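The record's description notes that WER is less indicative than CER for a polysynthetic language. The sketch below is one way to see why, and it is not the authors' evaluation code: both metrics are computed as a Levenshtein distance divided by the reference length, over words for WER and over characters for CER. The function names and the toy strings are hypothetical placeholders, not Chukchi data. When one long word carries most of an utterance, a single wrong character makes the whole word count as an error.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Toy strings standing in for a long polysynthetic word plus a short one.
reference = "aaaaabbbbbccccc ddd"
hypothesis = "aaaaabbbbbcccxc ddd"  # one wrong character in the long word

print(wer(reference, hypothesis))            # 0.5
print(round(cer(reference, hypothesis), 3))  # 0.053
```

With one character wrong out of 19, half of the words are scored as errors, which illustrates how WER can overstate the error for languages with very long words.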


Bibliographic Details
Main Authors: Safonova, Anastasia; Yudina, Tatiana; Nadimanov, Emil; Davenport, Cydnie
Format: Text
Language: unknown
Published: 2022
Subjects: Computer Science - Computation and Language
Online Access: http://arxiv.org/abs/2210.05726
id ftarxivpreprints:oai:arXiv.org:2210.05726
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:2210.05726 2023-09-05T13:18:51+02:00 Automatic Speech Recognition of Low-Resource Languages Based on Chukchi Safonova, Anastasia Yudina, Tatiana Nadimanov, Emil Davenport, Cydnie 2022-10-11 http://arxiv.org/abs/2210.05726 unknown http://arxiv.org/abs/2210.05726 Computer Science - Computation and Language text 2022 ftarxivpreprints 2023-08-16T17:19:46Z Text Chukchi ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Computation and Language
topic_facet Computer Science - Computation and Language
description This paper presents a project on researching and building a new Automatic Speech Recognition (ASR) system for the Chukchi language. No single complete corpus of Chukchi exists, so most of the work consisted of collecting audio and text in Chukchi from open sources and processing them. We collected 21:34:23 (hh:mm:ss) of audio recordings and 112,719 sentences (2,068,273 words) of text in Chukchi. The XLSR model was trained on the obtained data and showed good results even with a small amount of data. Chukchi is not only a low-resource language but also a polysynthetic one, which significantly complicates any automatic processing. As a result, the usual WER metric for evaluating ASR becomes less indicative for a polysynthetic language, whereas the CER metric showed good results. The question of metrics for polysynthetic languages remains open.
format Text
author Safonova, Anastasia
Yudina, Tatiana
Nadimanov, Emil
Davenport, Cydnie
author_sort Safonova, Anastasia
title Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
title_sort automatic speech recognition of low-resource languages based on chukchi
publishDate 2022
url http://arxiv.org/abs/2210.05726
genre Chukchi
genre_facet Chukchi
op_relation http://arxiv.org/abs/2210.05726
_version_ 1776199702425894912