Multi-Encoder based End-to-End Model for English Mispronunciation Detection and Diagnosis (多編碼器端到端模型於英語錯誤發音檢測與診斷)

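With the acceleration of globalization, most people need to learn a second language (L2), yet the number of language teachers has not kept pace with the demand for language learning. As a result, a growing body of research focuses on computer-assisted pronunciation training (CAPT), which uses computers to make learning more convenient and effective. The most important module in CAPT is mispronunciation detection and diagnosis (MD&D), which is built on automatic speech recognition (ASR) as its core technology. However, existing MD&D ...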

Bibliographic Details
Main Authors: 范姜紹瑋, Fan Jiang, Shao-Wei
Other Authors: 陳柏琳, Chen, Berlin
Format: Other/Unknown Material
Language: Chinese
Published: 2022
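Subjects: mispronunciation detection and diagnosis; computer-assisted pronunciation training; accent embeddings; multi-task learning; End-to-End model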
Online Access: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117312
https://hdl.handle.net/20.500.12235/117312
https://etds.lib.ntnu.edu.tw/thesis/detail/09fe9581520801a1c912217fadd1680d/
id ftntnormaluniv:oai:rportal.lib.ntnu.edu.tw:20.500.12235/117312
record_format openpolar
spelling ftntnormaluniv:oai:rportal.lib.ntnu.edu.tw:20.500.12235/117312 2023-06-11T04:10:02+02:00 多編碼器端到端模型於英語錯誤發音檢測與診斷 Multi-Encoder based End-to-End Model for English Mispronunciation Detection and Diagnosis 范姜紹瑋 Fan Jiang, Shao-Wei 陳柏琳 Chen, Berlin 2022-06-08T02:43:29Z application/pdf http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117312 https://hdl.handle.net/20.500.12235/117312 https://etds.lib.ntnu.edu.tw/thesis/detail/09fe9581520801a1c912217fadd1680d/ 中文 chi 60847011S-39852 mispronunciation detection and diagnosis computer-assisted pronunciation training accent embeddings multi-task learning End-to-End model 學術論文 2022 ftntnormaluniv https://doi.org/20.500.12235/117312 2023-04-23T09:14:01Z Other/Unknown Material Arctic National Taiwan Normal University: NTNU Institutional Repository Arctic
institution Open Polar
collection National Taiwan Normal University: NTNU Institutional Repository
op_collection_id ftntnormaluniv
language Chinese
topic 錯誤發音檢測和診斷
電腦輔助發音訓練
口音嵌入
多任務學習
端到端模型
mispronunciation detection and diagnosis
computer-assisted pronunciation training
accent embeddings
multi-task learning
End-to-End model
description With the acceleration of globalization, most people need to learn a second language (L2), yet the number of language teachers has not kept pace with the demand for language learning. As a result, a growing body of research focuses on computer-assisted pronunciation training (CAPT), which uses computers to make learning more convenient and effective. The most important module in CAPT is mispronunciation detection and diagnosis (MD&D), which is built on automatic speech recognition (ASR) as its core technology. However, existing MD&D models still face two problems. The first is task mismatch: a pure ASR task does not make full use of the text prompt during training. The second is accent diversity: L2 learners have distinctive pronunciation habits whose acoustic or linguistic characteristics make recognition difficult for the model. To address these two problems, this study proposes two directions for improving end-to-end MD&D (E2E MD&D) models. First, we use text prompts at different granularities (phoneme and character) for input augmentation, making E2E ASR better suited to the MD&D task. Second, we design two accent-aware modules that take opposite approaches, one supplying accent information to the model and the other removing it, in an attempt to mitigate the effect of accent diversity on E2E MD&D systems. Experimental results on the public L2 corpus L2-ARCTIC show that the proposed MD&D models are clearly advantageous and effective.
author2 陳柏琳
Chen, Berlin
format Other/Unknown Material
author 范姜紹瑋
Fan Jiang, Shao-Wei
title Multi-Encoder based End-to-End Model for English Mispronunciation Detection and Diagnosis (多編碼器端到端模型於英語錯誤發音檢測與診斷)
publishDate 2022
url http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117312
https://hdl.handle.net/20.500.12235/117312
https://etds.lib.ntnu.edu.tw/thesis/detail/09fe9581520801a1c912217fadd1680d/
geographic Arctic
geographic_facet Arctic
genre Arctic
genre_facet Arctic
op_relation 60847011S-39852
https://etds.lib.ntnu.edu.tw/thesis/detail/09fe9581520801a1c912217fadd1680d/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117312
op_doi https://doi.org/20.500.12235/117312
_version_ 1768384136814264320