Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech
End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training dat...
Main Authors: | Yan, Bi-Cheng; Jiang, Shao-Wei Fan; Chao, Fu-An; Chen, Berlin |
---|---|
Format: | Text |
Language: | unknown |
Published: | 2021 |
Subjects: | Electrical Engineering and Systems Science - Audio and Speech Processing |
Online Access: | http://arxiv.org/abs/2108.13816 |
id |
ftarxivpreprints:oai:arXiv.org:2108.13816 |
---|---|
record_format |
openpolar |
spelling |
ftarxivpreprints:oai:arXiv.org:2108.13816 2023-09-05T13:17:25+02:00 Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech Yan, Bi-Cheng Jiang, Shao-Wei Fan Chao, Fu-An Chen, Berlin 2021-08-31 http://arxiv.org/abs/2108.13816 unknown http://arxiv.org/abs/2108.13816 Electrical Engineering and Systems Science - Audio and Speech Processing text 2021 ftarxivpreprints 2023-08-16T16:39:34Z End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data. However, there is a discrepancy between the objectives of model training and the MDD evaluation, since the performance of an MDD model is commonly evaluated in terms of F1-score instead of phone or word error rate (PER/WER). In view of this, we in this paper explore the use of a discriminative objective function for training E2E MDD models, which aims to maximize the expected F1-score directly. A series of experiments conducted on the L2-ARCTIC dataset show that our proposed method can yield considerable performance improvements in relation to some state-of-the-art E2E MDD approaches and the celebrated GOP method. Comment: Accepted by IEEE International Conference on Multimedia and Expo (ICME 2022) Text Arctic ArXiv.org (Cornell University Library) Arctic |
institution |
Open Polar |
collection |
ArXiv.org (Cornell University Library) |
op_collection_id |
ftarxivpreprints |
language |
unknown |
topic |
Electrical Engineering and Systems Science - Audio and Speech Processing |
spellingShingle |
Electrical Engineering and Systems Science - Audio and Speech Processing Yan, Bi-Cheng Jiang, Shao-Wei Fan Chao, Fu-An Chen, Berlin Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech |
topic_facet |
Electrical Engineering and Systems Science - Audio and Speech Processing |
description |
End-to-end (E2E) neural models are attracting increasing attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data. However, there is a discrepancy between the objectives of model training and MDD evaluation, since the performance of an MDD model is commonly evaluated in terms of F1-score rather than phone or word error rate (PER/WER). In view of this, this paper explores the use of a discriminative objective function for training E2E MDD models, which aims to maximize the expected F1-score directly. A series of experiments conducted on the L2-ARCTIC dataset shows that the proposed method can yield considerable performance improvements over some state-of-the-art E2E MDD approaches and the celebrated GOP method. Comment: Accepted by IEEE International Conference on Multimedia and Expo (ICME 2022) |
format |
Text |
author |
Yan, Bi-Cheng; Jiang, Shao-Wei Fan; Chao, Fu-An; Chen, Berlin |
author_facet |
Yan, Bi-Cheng; Jiang, Shao-Wei Fan; Chao, Fu-An; Chen, Berlin |
author_sort |
Yan, Bi-Cheng |
title |
Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech |
title_short |
Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech |
title_full |
Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech |
title_fullStr |
Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech |
title_full_unstemmed |
Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech |
title_sort |
maximum f1-score training for end-to-end mispronunciation detection and diagnosis of l2 english speech |
publishDate |
2021 |
url |
http://arxiv.org/abs/2108.13816 |
geographic |
Arctic |
geographic_facet |
Arctic |
genre |
Arctic |
genre_facet |
Arctic |
op_relation |
http://arxiv.org/abs/2108.13816 |
_version_ |
1776198598826917888 |