Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case

Bibliographic Details
Published in: Mathematics
Main Authors: Irina Kipyatkova, Ildar Kagirov
Format: Article in Journal/Newspaper
Language: English
Published: MDPI AG 2023
Online Access: https://doi.org/10.3390/math11183814
https://doaj.org/article/d9e0c8bc649d4be88c9f93a1bca08ef9
Description
Summary: Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for Livvi-Karelian. Acoustic models based on time-delay neural networks and hidden Markov models were trained on a limited speech dataset of 3.5 h. To augment the data, pitch and speech rate perturbation, SpecAugment, and their combinations were employed. Language models based on 3-grams and neural networks were trained on written texts and transcripts. The achieved word error rate (WER) of 22.80% is comparable to results reported for other low-resource languages. To the best of our knowledge, this is the first speech recognition system for Livvi-Karelian. The results may be significant for the development of automatic speech recognition systems not only for Livvi-Karelian but also for other low-resource languages, in fields including speech recognition and machine translation. Future work includes experiments with Karelian data using techniques such as transfer learning and DNN language models.
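The reported 22.80% is a word error rate, conventionally defined as WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the recognizer's hypothesis into the reference transcript, and N is the number of reference words. A WER of 22.80% therefore corresponds to roughly 23 word errors per 100 reference words. The sketch below computes this quantity with a word-level Levenshtein alignment. It is a generic illustration, not the scoring tool used in the paper; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds the previous row of the dynamic-programming table:
    # prev[j] is the edit distance between the reference prefix processed so far
    # and the first j hypothesis words (row 0: j insertions from an empty prefix).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr

    # Total edits S + D + I divided by the number of reference words N.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (cat -> bat) and one deletion (down): 2 errors / 4 words.
    print(word_error_rate("the cat sat down", "the bat sat"))
```

Because insertions are counted against a fixed N, WER is not capped at 100%: a hypothesis with many spurious words can score above 1.0.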