SautiDB: Nigerian Accent Dataset Collection

The SautiDB dataset collection project is an ongoing effort to collect datasets of various Nigerian accents. The dataset was collected in an uncontrolled manner: users who visit our webapp can record their voice and contribute to the dataset. The webapp uses the audio web API to collect voice samples....

Bibliographic Details
Main Authors: Afonja, Tejumade, Mbataku, Clinton, Malomo, Ademola, Okubadejo, Olumide, Francis, Lawrence, Nwadike, Munachiso, Orife, Iroro
Format: Other/Unknown Material
Language: Old English
Published: Zenodo 2021
Subjects:
Online Access: https://doi.org/10.5281/zenodo.4561842
id ftzenodo:oai:zenodo.org:4561842
record_format openpolar
spelling ftzenodo:oai:zenodo.org:4561842 2024-09-09T19:26:49+00:00 SautiDB: Nigerian Accent Dataset Collection Afonja, Tejumade Mbataku, Clinton Malomo, Ademola Okubadejo, Olumide Francis, Lawrence Nwadike, Munachiso Orife, Iroro 2021-02-28 https://doi.org/10.5281/zenodo.4561842 ang ang Zenodo https://zenodo.org/communities/africanlp https://doi.org/10.5281/zenodo.4561841 https://doi.org/10.5281/zenodo.4561842 oai:zenodo.org:4561842 info:eu-repo/semantics/restrictedAccess accent nigerian_accent african_accent audio voice info:eu-repo/semantics/other 2021 ftzenodo https://doi.org/10.5281/zenodo.4561842 10.5281/zenodo.4561841 2024-07-26T15:12:52Z The SautiDB dataset collection project is an ongoing effort to collect datasets of various Nigerian accents. The dataset was collected in an uncontrolled manner: users who visit our webapp can record their voice and contribute to the dataset. The webapp uses the audio web API to collect voice samples. We hope this dataset will be useful to people interested in developing voice technology in Nigeria. We will continuously collect more datasets and publish updated versions as we have them. This work grew out of our project Improving Online Experience using Accent Transfer. The filename is of the form nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav, where nativeLanguage: the native (mother) language of the speaker, i.e. the language spoken by the speaker's tribe. fluentLanguage: the language that the speaker feels best describes their accent. speakerID: the ID assigned to the speaker. A speaker may have multiple IDs, since we did not authenticate users and simply cached their browser sessions. gender: the gender of the speaker. We did not explicitly collect this information from users; we hand-labeled it. sentenceID: the ID of the sentence read. We used the CMU Arctic sentences.
=========================== Before Postprocessing =========================== Number of Samples: 1615 Size Webm: 59MB Size Wav: 847MB Sampling Rate: 48000Hz Total Time: 2hrs 30min 21sec ============================ After Postprocessing ============================ Number of Samples: 919 Size Wav: 336MB Sampling Rate: 48000Hz Total Time: 0hrs 59min 08sec The associated GitHub repository used for post-processing can also be found linked. We are grateful for funding from AI4D-IndabaX with IDRC Grant Number: 109187-002. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Other/Unknown Material Arctic Zenodo Arctic
institution Open Polar
collection Zenodo
op_collection_id ftzenodo
language Old English
topic accent
nigerian_accent
african_accent
audio
voice
spellingShingle accent
nigerian_accent
african_accent
audio
voice
Afonja, Tejumade
Mbataku, Clinton
Malomo, Ademola
Okubadejo, Olumide
Francis, Lawrence
Nwadike, Munachiso
Orife, Iroro
SautiDB: Nigerian Accent Dataset Collection
topic_facet accent
nigerian_accent
african_accent
audio
voice
description The SautiDB dataset collection project is an ongoing effort to collect datasets of various Nigerian accents. The dataset was collected in an uncontrolled manner: users who visit our webapp can record their voice and contribute to the dataset. The webapp uses the audio web API to collect voice samples. We hope this dataset will be useful to people interested in developing voice technology in Nigeria. We will continuously collect more datasets and publish updated versions as we have them. This work grew out of our project Improving Online Experience using Accent Transfer.
The filename is of the form nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav, where
nativeLanguage: the native (mother) language of the speaker, i.e. the language spoken by the speaker's tribe.
fluentLanguage: the language that the speaker feels best describes their accent.
speakerID: the ID assigned to the speaker. A speaker may have multiple IDs, since we did not authenticate users and simply cached their browser sessions.
gender: the gender of the speaker. We did not explicitly collect this information from users; we hand-labeled it.
sentenceID: the ID of the sentence read. We used the CMU Arctic sentences.
=========================== Before Postprocessing ===========================
Number of Samples: 1615
Size Webm: 59MB
Size Wav: 847MB
Sampling Rate: 48000Hz
Total Time: 2hrs 30min 21sec
============================ After Postprocessing ============================
Number of Samples: 919
Size Wav: 336MB
Sampling Rate: 48000Hz
Total Time: 0hrs 59min 08sec
The associated GitHub repository used for post-processing can also be found linked. We are grateful for funding from AI4D-IndabaX with IDRC Grant Number: 109187-002. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
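The filename convention in the description can be unpacked with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling: the names `SautiFilename` and `parse_sautidb_filename` and the example filename are hypothetical. It assumes that the first four fields contain no underscores, while the sentence ID may contain them (CMU Arctic utterance IDs commonly look like `arctic_a0001`); the actual values used in the dataset files are not stated in this record.

```python
from pathlib import Path
from typing import NamedTuple


class SautiFilename(NamedTuple):
    """Fields of a SautiDB filename (hypothetical helper type)."""
    native_language: str
    fluent_language: str
    speaker_id: str
    gender: str
    sentence_id: str


def parse_sautidb_filename(name: str) -> SautiFilename:
    """Split nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav.

    Assumption: only the last field (sentenceID) may contain underscores,
    so the stem is split on the first four underscores only.
    """
    stem = Path(name).stem              # drop directory and ".wav" extension
    parts = stem.split("_", 4)          # at most 5 pieces
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"unexpected SautiDB filename: {name!r}")
    return SautiFilename(*parts)


# Hypothetical filename, for illustration only.
info = parse_sautidb_filename("yoruba_english_12_female_arctic_a0001.wav")
print(info.native_language, info.speaker_id, info.sentence_id)
```

Because a speaker may have several IDs (browser sessions were cached rather than authenticated), grouping recordings by `speaker_id` gives sessions rather than guaranteed distinct speakers.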
format Other/Unknown Material
author Afonja, Tejumade
Mbataku, Clinton
Malomo, Ademola
Okubadejo, Olumide
Francis, Lawrence
Nwadike, Munachiso
Orife, Iroro
author_facet Afonja, Tejumade
Mbataku, Clinton
Malomo, Ademola
Okubadejo, Olumide
Francis, Lawrence
Nwadike, Munachiso
Orife, Iroro
author_sort Afonja, Tejumade
title SautiDB: Nigerian Accent Dataset Collection
title_short SautiDB: Nigerian Accent Dataset Collection
title_full SautiDB: Nigerian Accent Dataset Collection
title_fullStr SautiDB: Nigerian Accent Dataset Collection
title_full_unstemmed SautiDB: Nigerian Accent Dataset Collection
title_sort sautidb: nigerian accent dataset collection
publisher Zenodo
publishDate 2021
url https://doi.org/10.5281/zenodo.4561842
geographic Arctic
geographic_facet Arctic
genre Arctic
genre_facet Arctic
op_relation https://zenodo.org/communities/africanlp
https://doi.org/10.5281/zenodo.4561841
https://doi.org/10.5281/zenodo.4561842
oai:zenodo.org:4561842
op_rights info:eu-repo/semantics/restrictedAccess
op_doi https://doi.org/10.5281/zenodo.4561842
10.5281/zenodo.4561841
_version_ 1809896376548458496