Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case

Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for...

Bibliographic Details
Published in:	Mathematics
Main Authors:	Irina Kipyatkova, Ildar Kagirov
Format:	Article in Journal/Newspaper
Language:	English
Published:	MDPI AG 2023
Subjects:	low-resource languages; automatic speech recognition; audio data augmentation; time delay neural network; hidden Markov models; long short-term memory; Mathematics; QA1-939; karelian
Online Access:	https://doi.org/10.3390/math11183814 https://doaj.org/article/d9e0c8bc649d4be88c9f93a1bca08ef9

id	ftdoajarticles:oai:doaj.org/article:d9e0c8bc649d4be88c9f93a1bca08ef9
record_format	openpolar
spelling	ftdoajarticles:oai:doaj.org/article:d9e0c8bc649d4be88c9f93a1bca08ef9 2023-10-29T02:37:35+01:00 Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case Irina Kipyatkova Ildar Kagirov 2023-09-01T00:00:00Z https://doi.org/10.3390/math11183814 https://doaj.org/article/d9e0c8bc649d4be88c9f93a1bca08ef9 EN eng MDPI AG https://www.mdpi.com/2227-7390/11/18/3814 https://doaj.org/toc/2227-7390 doi:10.3390/math11183814 2227-7390 https://doaj.org/article/d9e0c8bc649d4be88c9f93a1bca08ef9 Mathematics, Vol 11, Iss 18, p 3814 (2023) low-resource languages automatic speech recognition audio data augmentation time delay neural network hidden Markov models long short-term memory Mathematics QA1-939 article 2023 ftdoajarticles https://doi.org/10.3390/math11183814 2023-10-01T00:37:42Z Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for Livvi-Karelian. Acoustic models based on artificial neural networks with time delays and hidden Markov models were trained using a limited speech dataset of 3.5 h. To augment the data, pitch and speech rate perturbation, SpecAugment, and their combinations were employed. Language models based on 3-grams and neural networks were trained using written texts and transcripts. The achieved word error rate metric of 22.80% is comparable to other low-resource languages. To the best of our knowledge, this is the first speech recognition system for Livvi-Karelian. The results obtained can be of a certain significance for development of automatic speech recognition systems not only for Livvi-Karelian, but also for other low-resource languages, including the fields of speech recognition and machine translation systems. Future work includes experiments with Karelian data using techniques such as transfer learning and DNN language models. Article in Journal/Newspaper karelian Directory of Open Access Journals: DOAJ Articles Mathematics 11 18 3814
institution	Open Polar
collection	Directory of Open Access Journals: DOAJ Articles
op_collection_id	ftdoajarticles
language	English
topic	low-resource languages automatic speech recognition audio data augmentation time delay neural network hidden Markov models long short-term memory Mathematics QA1-939
spellingShingle	low-resource languages automatic speech recognition audio data augmentation time delay neural network hidden Markov models long short-term memory Mathematics QA1-939 Irina Kipyatkova Ildar Kagirov Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
topic_facet	low-resource languages automatic speech recognition audio data augmentation time delay neural network hidden Markov models long short-term memory Mathematics QA1-939
description	Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for Livvi-Karelian. Acoustic models based on artificial neural networks with time delays and hidden Markov models were trained using a limited speech dataset of 3.5 h. To augment the data, pitch and speech rate perturbation, SpecAugment, and their combinations were employed. Language models based on 3-grams and neural networks were trained using written texts and transcripts. The achieved word error rate metric of 22.80% is comparable to other low-resource languages. To the best of our knowledge, this is the first speech recognition system for Livvi-Karelian. The results obtained can be of a certain significance for development of automatic speech recognition systems not only for Livvi-Karelian, but also for other low-resource languages, including the fields of speech recognition and machine translation systems. Future work includes experiments with Karelian data using techniques such as transfer learning and DNN language models.
format	Article in Journal/Newspaper
author	Irina Kipyatkova Ildar Kagirov
author_facet	Irina Kipyatkova Ildar Kagirov
author_sort	Irina Kipyatkova
title	Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_short	Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_full	Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_fullStr	Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_full_unstemmed	Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
title_sort	deep models for low-resourced speech recognition: livvi-karelian case
publisher	MDPI AG
publishDate	2023
url	https://doi.org/10.3390/math11183814 https://doaj.org/article/d9e0c8bc649d4be88c9f93a1bca08ef9
genre	karelian
genre_facet	karelian
op_source	Mathematics, Vol 11, Iss 18, p 3814 (2023)
op_relation	https://www.mdpi.com/2227-7390/11/18/3814 https://doaj.org/toc/2227-7390 doi:10.3390/math11183814 2227-7390 https://doaj.org/article/d9e0c8bc649d4be88c9f93a1bca08ef9
op_doi	https://doi.org/10.3390/math11183814
container_title	Mathematics
container_volume	11
container_issue	18
container_start_page	3814
_version_	1781062518240706560

Similar Items