Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning
To address the performance gap of English ASR models on L2 English speakers, we evaluate fine-tuning of pretrained wav2vec 2.0 models (Baevski et al., 2020; Xu et al., 2021) on L2-ARCTIC, a non-native English speech corpus (Zhao et al., 2018), under different training settings. We compare (a) models trained with a combination of diverse accents to ones trained with only specific accents and (b) results from different single-accent models. Our experiments demonstrate the promise of developing ASR models for non-native English speakers, even with small amounts of L2 training data and even without a language model. Our models also excel in the zero-shot setting where we train on multiple L2 datasets and test on a blind L2 test set.
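The abstract notes that the fine-tuned models perform well "even without a language model," which implies evaluation directly on decoder output, typically with word error rate (WER). As a minimal illustrative sketch (not taken from the paper's code), WER is the word-level Levenshtein distance between reference and hypothesis, normalized by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat on the mat", "the bat sat on mat")` counts one substitution and one deletion over six reference words. In practice, libraries such as `jiwer` provide this metric, but the dynamic-programming core is as above.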
Main Authors: | Shibano, Toshiko; Zhang, Xinyi; Li, Mia Taige; Cho, Haejin; Sullivan, Peter; Abdul-Mageed, Muhammad |
---|---|
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | arXiv, 2021 |
Subjects: | Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG) |
Rights: | Creative Commons Attribution 4.0 International (CC-BY 4.0), https://creativecommons.org/licenses/by/4.0/legalcode |
Online Access: | https://dx.doi.org/10.48550/arxiv.2110.00678 | https://arxiv.org/abs/2110.00678 |
Notes: | All authors contributed equally. Paper accepted to the International Conference on Natural Language and Speech Processing 2021 (ICNLSP 2021). |