Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning

To address the performance gap of English ASR models on L2 English speakers, we evaluate fine-tuning of pretrained wav2vec 2.0 models (Baevski et al., 2020; Xu et al., 2021) on L2-ARCTIC, a non-native English speech corpus (Zhao et al., 2018), under different training settings. We compare (a) models trained with a combination of diverse accents to ones trained with only specific accents and (b) results from different single-accent models. Our experiments demonstrate the promise of developing ASR models for non-native English speakers, even with small amounts of L2 training data and even without a language model. Our models also excel in the zero-shot setting, where we train on multiple L2 datasets and test on a blind L2 test set.
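The approach the abstract describes (fine-tuning a pretrained wav2vec 2.0 checkpoint on accented speech, then decoding with plain greedy CTC, i.e. without a language model) can be sketched as follows. This is a minimal illustration using the Hugging Face Transformers port of wav2vec 2.0, not the authors' actual fairseq training setup; the checkpoint name, audio path, and reference transcript are placeholders.

```python
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Public checkpoint comparable to the self-trained models of Xu et al. (2021);
# the paper's exact checkpoints may differ.
MODEL = "facebook/wav2vec2-large-960h-lv60-self"
processor = Wav2Vec2Processor.from_pretrained(MODEL)
model = Wav2Vec2ForCTC.from_pretrained(MODEL)
model.freeze_feature_encoder()  # common practice: keep the CNN feature encoder frozen

def load_audio(path: str) -> torch.Tensor:
    """Load a mono waveform and resample to the 16 kHz rate wav2vec 2.0 expects."""
    wav, sr = torchaudio.load(path)
    if sr != 16_000:
        wav = torchaudio.functional.resample(wav, sr, 16_000)
    return wav.squeeze(0)

# --- one fine-tuning step on a single (audio, transcript) pair ---
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
audio = load_audio("l2_arctic_utterance.wav")  # hypothetical L2-ARCTIC clip
inputs = processor(audio.numpy(), sampling_rate=16_000, return_tensors="pt")
# The -960h vocabularies are uppercase letters; transcript here is a placeholder.
labels = processor.tokenizer("AUTHOR OF THE DANGER TRAIL", return_tensors="pt").input_ids
loss = model(inputs.input_values, labels=labels).loss  # CTC loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# --- evaluation: greedy CTC decoding, i.e. no language model ---
model.eval()
with torch.no_grad():
    logits = model(inputs.input_values).logits
prediction = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
print(prediction)
```

The paper reports results as word error rate; in a sketch like this, WER could be computed by comparing `prediction` against the reference transcript, for example with the `jiwer` package.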


Bibliographic Details
Main Authors: Shibano, Toshiko; Zhang, Xinyi; Li, Mia Taige; Cho, Haejin; Sullivan, Peter; Abdul-Mageed, Muhammad
Format: Preprint (arXiv)
Language: English
Published: arXiv, 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
Rights: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/legalcode
Online Access: https://dx.doi.org/10.48550/arxiv.2110.00678
https://arxiv.org/abs/2110.00678
Note: All authors contributed equally. Paper accepted to the International Conference on Natural Language and Speech Processing 2021 (ICNLSP 2021).