End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning

As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted considerable attention from academia and industry over the past decade. Training robust MDD models requires large amounts of human-annotated speech recordings, which are expensive and often hard to acquire. In this study, we propose using transfer learning to tackle the problem of data scarcity from two aspects. First, for the audio modality, we explore the use of the pretrained model wav2vec 2.0 for MDD tasks by learning robust, general acoustic representations. Second, for the text modality, we explore transferring prior texts into MDD by learning associations between the acoustic and textual modalities. We propose textual modulation gates that assign more importance to relevant text information while suppressing irrelevant text information. Moreover, given the transcriptions, we propose an extra contrastive loss to reduce the difference between the learning objectives of the phoneme recognition and MDD tasks. Experiments on the L2-Arctic dataset showed that our wav2vec 2.0-based models outperformed conventional methods. The proposed textual modulation gate and contrastive loss further improved the F1-score by more than 2.88%, and our best model achieved an F1-score of 61.75%.
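
This record reproduces only the abstract, not the authors' code. As a rough illustration of the two techniques the abstract names, the sketches below are minimal PyTorch examples written for this summary; all class names, tensor shapes, and the exact gating and loss formulations are assumptions rather than the paper's implementation.

A textual modulation gate, as described above, weights the text (canonical phoneme) features before they are fused with the acoustic features, so that relevant text information is emphasized and irrelevant text information is suppressed. One plausible realization, assuming acoustic features from a pretrained wav2vec 2.0 encoder and text features already aligned to the acoustic time axis:

# Hypothetical sketch only -- not the authors' released code.
# All names and dimensions below are assumptions.
import torch
import torch.nn as nn

class TextualModulationGate(nn.Module):
    """Scales text features with a learned sigmoid gate before fusing them with acoustics."""

    def __init__(self, acoustic_dim: int, text_dim: int, hidden_dim: int):
        super().__init__()
        self.acoustic_proj = nn.Linear(acoustic_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # The gate sees both modalities and outputs values in (0, 1) per dimension.
        self.gate = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.Sigmoid())

    def forward(self, acoustic: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # acoustic: (batch, time, acoustic_dim); text: (batch, time, text_dim)
        a = self.acoustic_proj(acoustic)
        t = self.text_proj(text)
        g = self.gate(torch.cat([a, t], dim=-1))
        # Relevant text information passes through (g near 1);
        # irrelevant text information is suppressed (g near 0).
        return a + g * t

# Toy usage with random tensors.
gate = TextualModulationGate(acoustic_dim=768, text_dim=128, hidden_dim=256)
fused = gate(torch.randn(2, 50, 768), torch.randn(2, 50, 128))
print(fused.shape)  # torch.Size([2, 50, 256])

The point of conditioning the gate on both modalities is that the model can learn when the prompt text should influence the acoustic representation and when it should be ignored, for example when the learner's speech deviates from the prompt.

The "extra contrastive loss" is likewise described only at a high level. A generic InfoNCE-style loss that pulls each acoustic phone representation toward its matching canonical-text embedding and away from the others conveys the general idea; the loss actually defined in the paper may differ:

# Generic InfoNCE-style contrastive loss, shown only to illustrate the idea of
# aligning acoustic phone representations with canonical text embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(acoustic_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    # acoustic_emb, text_emb: (num_phones, dim); row i of each tensor is assumed
    # to describe the same phone position in the utterance.
    a = F.normalize(acoustic_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature  # scaled cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    # The matching (acoustic, text) pair on the diagonal is the positive;
    # every other text embedding in the batch acts as a negative.
    return F.cross_entropy(logits, targets)

# Toy usage.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())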

Bibliographic Details
Published in: Applied Sciences, Volume 13, Issue 11, Article 6793 (2023)
Main Authors: Linkai Peng, Yingming Gao, Rian Bao, Ya Li, Jinsong Zhang
Format: Text
Language: English
Published: Multidisciplinary Digital Publishing Institute, 2023
Subjects: mispronunciation detection and diagnosis (MDD); computer-aided pronunciation training (CAPT); transfer learning; pretrained model; text modulation gate
Online Access: https://doi.org/10.3390/app13116793
License: https://creativecommons.org/licenses/by/4.0/