Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case

Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for...

Bibliographic Details
Published in: Mathematics
Main Authors: Irina Kipyatkova, Ildar Kagirov
Format: Text
Language: English
Published: Multidisciplinary Digital Publishing Institute 2023
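Subjects: low-resource languages; automatic speech recognition; audio data augmentation; time delay neural network; hidden Markov models; long short-term memory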
Online Access: https://doi.org/10.3390/math11183814
_version_ 1832474581830467584
author Irina Kipyatkova
Ildar Kagirov
author_facet Irina Kipyatkova
Ildar Kagirov
author_sort Irina Kipyatkova
collection MDPI Open Access Publishing
container_issue 18
container_start_page 3814
container_title Mathematics
container_volume 11
description Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for Livvi-Karelian. Acoustic models based on artificial neural networks with time delays and hidden Markov models were trained using a limited speech dataset of 3.5 h. To augment the data, pitch and speech rate perturbation, SpecAugment, and their combinations were employed. Language models based on 3-grams and neural networks were trained using written texts and transcripts. The achieved word error rate metric of 22.80% is comparable to other low-resource languages. To the best of our knowledge, this is the first speech recognition system for Livvi-Karelian. The results obtained can be of a certain significance for development of automatic speech recognition systems not only for Livvi-Karelian, but also for other low-resource languages, including the fields of speech recognition and machine translation systems. Future work includes experiments with Karelian data using techniques such as transfer learning and DNN language models.
format Text
genre karelian
genre_facet karelian
id ftmdpi:oai:mdpi.com:/2227-7390/11/18/3814/
institution Open Polar
language English
op_collection_id ftmdpi
op_doi https://doi.org/10.3390/math11183814
op_relation E: Applied Mathematics
https://dx.doi.org/10.3390/math11183814
op_rights https://creativecommons.org/licenses/by/4.0/
op_source Mathematics
Volume 11
Issue 18
Pages: 3814
publishDate 2023
publisher Multidisciplinary Digital Publishing Institute
record_format openpolar
spelling ftmdpi:oai:mdpi.com:/2227-7390/11/18/3814/ 2025-05-18T14:03:57+00:00 Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case Irina Kipyatkova Ildar Kagirov 2023-09-05 application/pdf https://doi.org/10.3390/math11183814 eng eng Multidisciplinary Digital Publishing Institute E: Applied Mathematics https://dx.doi.org/10.3390/math11183814 https://creativecommons.org/licenses/by/4.0/ Mathematics Volume 11 Issue 18 Pages: 3814 low-resource languages automatic speech recognition audio data augmentation time delay neural network hidden Markov models long short-term memory Text 2023 ftmdpi https://doi.org/10.3390/math11183814 2025-04-22T00:41:02Z Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for Livvi-Karelian. Acoustic models based on artificial neural networks with time delays and hidden Markov models were trained using a limited speech dataset of 3.5 h. To augment the data, pitch and speech rate perturbation, SpecAugment, and their combinations were employed. Language models based on 3-grams and neural networks were trained using written texts and transcripts. The achieved word error rate metric of 22.80% is comparable to other low-resource languages. To the best of our knowledge, this is the first speech recognition system for Livvi-Karelian. The results obtained can be of a certain significance for development of automatic speech recognition systems not only for Livvi-Karelian, but also for other low-resource languages, including the fields of speech recognition and machine translation systems. Future work includes experiments with Karelian data using techniques such as transfer learning and DNN language models. Text karelian MDPI Open Access Publishing Mathematics 11 18 3814
spellingShingle low-resource languages
automatic speech recognition
audio data augmentation
time delay neural network
hidden Markov models
long short-term memory
Irina Kipyatkova
Ildar Kagirov
Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_full Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_fullStr Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_full_unstemmed Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_short Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_sort deep models for low-resourced speech recognition: livvi-karelian case
topic low-resource languages
automatic speech recognition
audio data augmentation
time delay neural network
hidden Markov models
long short-term memory
topic_facet low-resource languages
automatic speech recognition
audio data augmentation
time delay neural network
hidden Markov models
long short-term memory
url https://doi.org/10.3390/math11183814