End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning

As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted a lot of attention from academia and industry over the past decade. To train robust MDD models, this technique requires massive human-annotated...


Bibliographic Details
Published in: Applied Sciences
Main Authors: Linkai Peng, Yingming Gao, Rian Bao, Ya Li, Jinsong Zhang
Format: Article in Journal/Newspaper
Language: English
Published: MDPI AG 2023
Subjects:
T
Online Access: https://doi.org/10.3390/app13116793
https://doaj.org/article/8a65852fffec4f2ba8a8235923ddd845
id ftdoajarticles:oai:doaj.org/article:8a65852fffec4f2ba8a8235923ddd845
record_format openpolar
spelling ftdoajarticles:oai:doaj.org/article:8a65852fffec4f2ba8a8235923ddd845 2023-07-02T03:31:30+02:00 End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning Linkai Peng Yingming Gao Rian Bao Ya Li Jinsong Zhang 2023-06-01T00:00:00Z https://doi.org/10.3390/app13116793 https://doaj.org/article/8a65852fffec4f2ba8a8235923ddd845 EN eng MDPI AG https://www.mdpi.com/2076-3417/13/11/6793 https://doaj.org/toc/2076-3417 doi:10.3390/app13116793 2076-3417 https://doaj.org/article/8a65852fffec4f2ba8a8235923ddd845 Applied Sciences, Vol 13, Iss 11, p 6793 (2023) mispronunciation detection and diagnosis (MDD) computer-aided pronunciation training (CAPT) transfer learning pretrained model text modulation gate Technology T Engineering (General). Civil engineering (General) TA1-2040 Biology (General) QH301-705.5 Physics QC1-999 Chemistry QD1-999 article 2023 ftdoajarticles https://doi.org/10.3390/app13116793 2023-06-11T00:33:56Z As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted a lot of attention from academia and industry over the past decade. To train robust MDD models, this technique requires massive human-annotated speech recordings which are usually expensive and even hard to acquire. In this study, we propose to use transfer learning to tackle the problem of data scarcity from two aspects. First, from audio modality, we explore the use of the pretrained model wav2vec2.0 for MDD tasks by learning robust general acoustic representation. Second, from text modality, we explore transferring prior texts into MDD by learning associations between acoustic and textual modalities. We propose textual modulation gates that assign more importance to the relevant text information while suppressing irrelevant text information.
Moreover, given the transcriptions, we propose an extra contrastive loss to reduce the difference of learning objectives between the phoneme recognition and MDD tasks. Conducting experiments on the L2-Arctic dataset showed that our wav2vec2.0 based models outperformed the conventional methods. The proposed textual modulation gate and contrastive loss further improved the F1-score by more than 2.88% and our best model achieved an F1-score of 61.75%. Article in Journal/Newspaper Arctic Directory of Open Access Journals: DOAJ Articles Arctic Applied Sciences 13 11 6793
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
topic mispronunciation detection and diagnosis (MDD)
computer-aided pronunciation training (CAPT)
transfer learning
pretrained model
text modulation gate
Technology
T
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
description As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted a lot of attention from academia and industry over the past decade. To train robust MDD models, this technique requires massive human-annotated speech recordings which are usually expensive and even hard to acquire. In this study, we propose to use transfer learning to tackle the problem of data scarcity from two aspects. First, from audio modality, we explore the use of the pretrained model wav2vec2.0 for MDD tasks by learning robust general acoustic representation. Second, from text modality, we explore transferring prior texts into MDD by learning associations between acoustic and textual modalities. We propose textual modulation gates that assign more importance to the relevant text information while suppressing irrelevant text information. Moreover, given the transcriptions, we propose an extra contrastive loss to reduce the difference of learning objectives between the phoneme recognition and MDD tasks. Conducting experiments on the L2-Arctic dataset showed that our wav2vec2.0 based models outperformed the conventional methods. The proposed textual modulation gate and contrastive loss further improved the F1-score by more than 2.88% and our best model achieved an F1-score of 61.75%.
format Article in Journal/Newspaper
author Linkai Peng
Yingming Gao
Rian Bao
Ya Li
Jinsong Zhang
author_sort Linkai Peng
title End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning
publisher MDPI AG
publishDate 2023
url https://doi.org/10.3390/app13116793
https://doaj.org/article/8a65852fffec4f2ba8a8235923ddd845
geographic Arctic
genre Arctic
op_source Applied Sciences, Vol 13, Iss 11, p 6793 (2023)
op_relation https://www.mdpi.com/2076-3417/13/11/6793
https://doaj.org/toc/2076-3417
doi:10.3390/app13116793
2076-3417
https://doaj.org/article/8a65852fffec4f2ba8a8235923ddd845
op_doi https://doi.org/10.3390/app13116793
container_title Applied Sciences
container_volume 13
container_issue 11
container_start_page 6793