Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning
To address the performance gap of English ASR models on L2 English speakers, we evaluate fine-tuning of pretrained wav2vec 2.0 models (Baevski et al., 2020; Xu et al., 2021) on L2-ARCTIC, a non-native English speech corpus (Zhao et al., 2018), under different training settings. We compare (a) models trained with a combination of diverse accents to ones trained with only specific accents and (b) results from different single-accent models. Our experiments demonstrate the promise of developing ASR models for non-native English speakers, even with small amounts of L2 training data and even without a language model. Our models also excel in the zero-shot setting where we train on multiple L2 datasets and test on a blind L2 test set.
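The abstract notes that the fine-tuned models perform well "even without a language model," which implies evaluation directly on decoder output, typically with word error rate (WER). As a minimal illustrative sketch (not taken from the paper's code), WER is the word-level Levenshtein distance between reference and hypothesis, normalized by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat on the mat", "the bat sat on mat")` counts one substitution and one deletion over six reference words. In practice, libraries such as `jiwer` provide this metric, but the dynamic-programming core is as above.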
Main Authors: | Shibano, Toshiko; Zhang, Xinyi; Li, Mia Taige; Cho, Haejin; Sullivan, Peter; Abdul-Mageed, Muhammad |
---|---|
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | arXiv, 2021 |
Subjects: | Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG) |
Rights: | Creative Commons Attribution 4.0 International (CC-BY 4.0), https://creativecommons.org/licenses/by/4.0/legalcode |
Online Access: | https://dx.doi.org/10.48550/arxiv.2110.00678 | https://arxiv.org/abs/2110.00678 |
Notes: | All authors contributed equally. Paper accepted to the International Conference on Natural Language and Speech Processing 2021 (ICNLSP 2021). |