An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling

Mispronunciation detection and diagnosis (MDD) is a core component of computer-assisted pronunciation training (CAPT). Most of the existing MDD approaches focus on dealing with categorical errors (viz. one canonical phone is substituted by another one, aside from those mispronunciations caused by de...

Bibliographic Details
Main Authors:	Yan, Bi-Cheng, Wu, Meng-Che, Hung, Hsiao-Tsung, Chen, Berlin
Format:	Article in Journal/Newspaper
Language:	unknown
Published:	arXiv 2020
Subjects:	Audio and Speech Processing eess.AS Computation and Language cs.CL Sound cs.SD FOS Electrical engineering, electronic engineering, information engineering FOS Computer and information sciences Arctic
Online Access:	https://dx.doi.org/10.48550/arxiv.2005.11950 https://arxiv.org/abs/2005.11950

id	ftdatacite:10.48550/arxiv.2005.11950
record_format	openpolar
spelling	ftdatacite:10.48550/arxiv.2005.11950 2023-05-15T15:08:26+02:00 An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling Yan, Bi-Cheng Wu, Meng-Che Hung, Hsiao-Tsung Chen, Berlin 2020 https://dx.doi.org/10.48550/arxiv.2005.11950 https://arxiv.org/abs/2005.11950 unknown arXiv arXiv.org perpetual, non-exclusive license http://arxiv.org/licenses/nonexclusive-distrib/1.0/ Audio and Speech Processing eess.AS Computation and Language cs.CL Sound cs.SD FOS Electrical engineering, electronic engineering, information engineering FOS Computer and information sciences Article CreativeWork article Preprint 2020 ftdatacite https://doi.org/10.48550/arxiv.2005.11950 2022-03-10T15:52:33Z Mispronunciation detection and diagnosis (MDD) is a core component of computer-assisted pronunciation training (CAPT). Most of the existing MDD approaches focus on dealing with categorical errors (viz. one canonical phone is substituted by another one, aside from those mispronunciations caused by deletions or insertions). However, accurate detection and diagnosis of non-categorical or distortion errors (viz. approximating L2 phones with L1 (first-language) phones, or erroneous pronunciations in between) still seems out of reach. In view of this, we propose to conduct MDD with a novel end-to-end automatic speech recognition (E2E-based ASR) approach. In particular, we expand the original L2 phone set with their corresponding anti-phone set, making the E2E-based MDD approach have a better capability to take in both categorical and non-categorical mispronunciations, aiming to provide better mispronunciation detection and diagnosis feedback. Furthermore, a novel transfer-learning paradigm is devised to obtain the initial model estimate of the E2E-based MDD system without recourse to any phonological rules. Extensive sets of experimental results on the L2-ARCTIC dataset show that our best system can outperform the existing E2E baseline system and pronunciation scoring based method (GOP) in terms of the F1-score, by 11.05% and 27.71%, respectively. Accepted by Interspeech 2020. Article in Journal/Newspaper Arctic DataCite Metadata Store (German National Library of Science and Technology) Arctic
institution	Open Polar
collection	DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id	ftdatacite
language	unknown
topic	Audio and Speech Processing eess.AS Computation and Language cs.CL Sound cs.SD FOS Electrical engineering, electronic engineering, information engineering FOS Computer and information sciences
spellingShingle	Audio and Speech Processing eess.AS Computation and Language cs.CL Sound cs.SD FOS Electrical engineering, electronic engineering, information engineering FOS Computer and information sciences Yan, Bi-Cheng Wu, Meng-Che Hung, Hsiao-Tsung Chen, Berlin An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
topic_facet	Audio and Speech Processing eess.AS Computation and Language cs.CL Sound cs.SD FOS Electrical engineering, electronic engineering, information engineering FOS Computer and information sciences
description	Mispronunciation detection and diagnosis (MDD) is a core component of computer-assisted pronunciation training (CAPT). Most of the existing MDD approaches focus on dealing with categorical errors (viz. one canonical phone is substituted by another one, aside from those mispronunciations caused by deletions or insertions). However, accurate detection and diagnosis of non-categorical or distortion errors (viz. approximating L2 phones with L1 (first-language) phones, or erroneous pronunciations in between) still seems out of reach. In view of this, we propose to conduct MDD with a novel end-to-end automatic speech recognition (E2E-based ASR) approach. In particular, we expand the original L2 phone set with their corresponding anti-phone set, making the E2E-based MDD approach have a better capability to take in both categorical and non-categorical mispronunciations, aiming to provide better mispronunciation detection and diagnosis feedback. Furthermore, a novel transfer-learning paradigm is devised to obtain the initial model estimate of the E2E-based MDD system without recourse to any phonological rules. Extensive sets of experimental results on the L2-ARCTIC dataset show that our best system can outperform the existing E2E baseline system and pronunciation scoring based method (GOP) in terms of the F1-score, by 11.05% and 27.71%, respectively. Accepted by Interspeech 2020.
format	Article in Journal/Newspaper
author	Yan, Bi-Cheng Wu, Meng-Che Hung, Hsiao-Tsung Chen, Berlin
author_facet	Yan, Bi-Cheng Wu, Meng-Che Hung, Hsiao-Tsung Chen, Berlin
author_sort	Yan, Bi-Cheng
title	An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
title_short	An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
title_full	An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
title_fullStr	An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
title_full_unstemmed	An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
title_sort	end-to-end mispronunciation detection system for l2 english speech leveraging novel anti-phone modeling
publisher	arXiv
publishDate	2020
url	https://dx.doi.org/10.48550/arxiv.2005.11950 https://arxiv.org/abs/2005.11950
geographic	Arctic
geographic_facet	Arctic
genre	Arctic
genre_facet	Arctic
op_rights	arXiv.org perpetual, non-exclusive license http://arxiv.org/licenses/nonexclusive-distrib/1.0/
op_doi	https://doi.org/10.48550/arxiv.2005.11950
_version_	1766339795993231360

Similar Items