End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning
As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted a lot of attention from academia and industry over the past decade. To train robust MDD models, this technique requires massive human-annotated...
Published in: | Applied Sciences |
---|---|
Main Authors: | Linkai Peng, Yingming Gao, Rian Bao, Ya Li, Jinsong Zhang |
Format: | Text |
Language: | English |
Published: | Multidisciplinary Digital Publishing Institute 2023 |
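Subjects: | mispronunciation detection and diagnosis (MDD), computer-aided pronunciation training (CAPT), transfer learning, pretrained model, text modulation gate |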
Online Access: | https://doi.org/10.3390/app13116793 |
id |
ftmdpi:oai:mdpi.com:/2076-3417/13/11/6793/ |
---|---|
record_format |
openpolar |
spelling |
ftmdpi:oai:mdpi.com:/2076-3417/13/11/6793/ 2023-08-20T04:04:39+02:00 End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning Linkai Peng Yingming Gao Rian Bao Ya Li Jinsong Zhang agris 2023-06-02 application/pdf https://doi.org/10.3390/app13116793 EN eng Multidisciplinary Digital Publishing Institute Computing and Artificial Intelligence https://dx.doi.org/10.3390/app13116793 https://creativecommons.org/licenses/by/4.0/ Applied Sciences; Volume 13; Issue 11; Pages: 6793 mispronunciation detection and diagnosis (MDD) computer-aided pronunciation training (CAPT) transfer learning pretrained model text modulation gate Text 2023 ftmdpi https://doi.org/10.3390/app13116793 2023-08-01T10:20:46Z As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted a lot of attention from academia and industry over the past decade. To train robust MDD models, this technique requires massive human-annotated speech recordings which are usually expensive and even hard to acquire. In this study, we propose to use transfer learning to tackle the problem of data scarcity from two aspects. First, from audio modality, we explore the use of the pretrained model wav2vec2.0 for MDD tasks by learning robust general acoustic representation. Second, from text modality, we explore transferring prior texts into MDD by learning associations between acoustic and textual modalities. We propose textual modulation gates that assign more importance to the relevant text information while suppressing irrelevant text information. Moreover, given the transcriptions, we propose an extra contrastive loss to reduce the difference of learning objectives between the phoneme recognition and MDD tasks. Conducting experiments on the L2-Arctic dataset showed that our wav2vec2.0 based models outperformed the conventional methods. 
The proposed textual modulation gate and contrastive loss further improved the F1-score by more than 2.88% and our best model achieved an F1-score of 61.75%. Text Arctic MDPI Open Access Publishing Arctic Applied Sciences 13 11 6793 |
institution |
Open Polar |
collection |
MDPI Open Access Publishing |
op_collection_id |
ftmdpi |
language |
English |
topic |
mispronunciation detection and diagnosis (MDD) computer-aided pronunciation training (CAPT) transfer learning pretrained model text modulation gate |
spellingShingle |
mispronunciation detection and diagnosis (MDD) computer-aided pronunciation training (CAPT) transfer learning pretrained model text modulation gate Linkai Peng Yingming Gao Rian Bao Ya Li Jinsong Zhang End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning |
topic_facet |
mispronunciation detection and diagnosis (MDD) computer-aided pronunciation training (CAPT) transfer learning pretrained model text modulation gate |
description |
As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted considerable attention from academia and industry over the past decade. Training robust MDD models requires large amounts of human-annotated speech recordings, which are usually expensive and even hard to acquire. In this study, we propose to use transfer learning to tackle the problem of data scarcity from two aspects. First, from the audio modality, we explore the use of the pretrained model wav2vec2.0 for MDD tasks by learning robust general acoustic representations. Second, from the text modality, we explore transferring prior texts into MDD by learning associations between the acoustic and textual modalities. We propose textual modulation gates that assign more importance to relevant text information while suppressing irrelevant text information. Moreover, given the transcriptions, we propose an extra contrastive loss to reduce the difference in learning objectives between the phoneme recognition and MDD tasks. Experiments on the L2-Arctic dataset showed that our wav2vec2.0-based models outperformed the conventional methods. The proposed textual modulation gate and contrastive loss further improved the F1-score by more than 2.88%, and our best model achieved an F1-score of 61.75%. |
format |
Text |
author |
Linkai Peng Yingming Gao Rian Bao Ya Li Jinsong Zhang |
author_facet |
Linkai Peng Yingming Gao Rian Bao Ya Li Jinsong Zhang |
author_sort |
Linkai Peng |
title |
End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning |
title_short |
End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning |
title_full |
End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning |
title_fullStr |
End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning |
title_full_unstemmed |
End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning |
title_sort |
end-to-end mispronunciation detection and diagnosis using transfer learning |
publisher |
Multidisciplinary Digital Publishing Institute |
publishDate |
2023 |
url |
https://doi.org/10.3390/app13116793 |
op_coverage |
agris |
geographic |
Arctic |
geographic_facet |
Arctic |
genre |
Arctic |
genre_facet |
Arctic |
op_source |
Applied Sciences; Volume 13; Issue 11; Pages: 6793 |
op_relation |
Computing and Artificial Intelligence https://dx.doi.org/10.3390/app13116793 |
op_rights |
https://creativecommons.org/licenses/by/4.0/ |
op_doi |
https://doi.org/10.3390/app13116793 |
container_title |
Applied Sciences |
container_volume |
13 |
container_issue |
11 |
container_start_page |
6793 |
_version_ |
1774715028420689920 |