SautiDB: Nigerian Accent Dataset Collection
The SautiDB dataset collection project is an ongoing effort to collect datasets of various Nigerian accents. The dataset was collected in an uncontrolled manner: users who visit our webapp can record their voice and contribute to the dataset. The webapp uses the audio web API to collect voice samples...
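Main Authors: | Afonja, Tejumade; Mbataku, Clinton; Malomo, Ademola; Okubadejo, Olumide; Francis, Lawrence; Nwadike, Munachiso; Orife, Iroro |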
---|---|
Format: | Other/Unknown Material |
Language: | Old English |
Published: | Zenodo 2023 |
Subjects: | accent, nigerian_accent, african_accent, audio, voice |
Online Access: | https://doi.org/10.5281/zenodo.7646394 |
id |
ftzenodo:oai:zenodo.org:7646394 |
---|---|
record_format |
openpolar |
spelling |
ftzenodo:oai:zenodo.org:7646394 2024-09-09T19:27:16+00:00 SautiDB: Nigerian Accent Dataset Collection Afonja, Tejumade Mbataku, Clinton Malomo, Ademola Okubadejo, Olumide Francis, Lawrence Nwadike, Munachiso Orife, Iroro 2023-02-16 https://doi.org/10.5281/zenodo.7646394 ang ang Zenodo https://zenodo.org/communities/africanlp https://doi.org/10.5281/zenodo.4561841 https://doi.org/10.5281/zenodo.7646394 oai:zenodo.org:7646394 info:eu-repo/semantics/restrictedAccess accent nigerian_accent african_accent audio voice info:eu-repo/semantics/other 2023 ftzenodo https://doi.org/10.5281/zenodo.7646394 https://doi.org/10.5281/zenodo.4561841 2024-07-25T22:27:42Z The SautiDB dataset collection project is an ongoing effort to collect datasets of various Nigerian accents. The dataset was collected in an uncontrolled manner: users who visit our webapp can record their voice and contribute to the dataset. The webapp uses the audio web API to collect voice samples. We hope this dataset will be useful to people interested in developing voice technology in Nigeria. We will continuously collect more datasets and publish updated versions as we have them. This work grew out of our project Improving Online Experience using Accent Transfer. The filename is of the form nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav, where nativeLanguage: native (mother) language of the speaker, i.e. the language spoken by the speaker's tribe. fluentLanguage: language that the speaker feels best describes their accent. speakerID: ID assigned to the speaker. A speaker may have multiple IDs because we do not authenticate users; we simply cached their browser sessions. gender: gender of the speaker. We did not explicitly collect this information from users; we hand-labeled it. sentenceID: the sentence ID for the sentences read. We used the CMU Arctic sentences.
=========================== Before Postprocessing =========================== Number of Samples: 1615 Size Webm: 59MB Size Wav: 847MB Sampling Rate: 48000Hz Total Time: 2hrs 30min 21sec ============================ After Postprocessing ============================ Number of Samples: 919 Size Wav: 336MB Sampling Rate: 48000Hz Total Time: 0hrs 59min 08sec ============================ Version 1.1 ============================ This version has two updates: 1. In version 1.0, each language name was uppercased and its words were joined with an underscore, e.g., "Efik Ibibio" -> "EFIK_IBIBIO". We have changed "EFIK_IBIBIO" -> "EFIKIBIBIO", i.e. the file name, which was previously 'EFIK_IBIBIO_EFIK_IBIBIO_0014_M_A0138.wav', has now been changed to 'EFIKIBIBIO_EFIKIBIBIO_0014_M_A0138.wav'. This ... Other/Unknown Material Arctic Zenodo Arctic |
institution |
Open Polar |
collection |
Zenodo |
op_collection_id |
ftzenodo |
language |
Old English |
topic |
accent nigerian_accent african_accent audio voice |
spellingShingle |
accent nigerian_accent african_accent audio voice Afonja, Tejumade Mbataku, Clinton Malomo, Ademola Okubadejo, Olumide Francis, Lawrence Nwadike, Munachiso Orife, Iroro SautiDB: Nigerian Accent Dataset Collection |
topic_facet |
accent nigerian_accent african_accent audio voice |
description |
The SautiDB dataset collection project is an ongoing effort to collect datasets of various Nigerian accents. The dataset was collected in an uncontrolled manner: users who visit our webapp can record their voice and contribute to the dataset. The webapp uses the audio web API to collect voice samples. We hope this dataset will be useful to people interested in developing voice technology in Nigeria. We will continuously collect more datasets and publish updated versions as we have them. This work grew out of our project Improving Online Experience using Accent Transfer.
The filename is of the form nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav, where
nativeLanguage: native (mother) language of the speaker, i.e. the language spoken by the speaker's tribe.
fluentLanguage: language that the speaker feels best describes their accent.
speakerID: ID assigned to the speaker. A speaker may have multiple IDs because we do not authenticate users; we simply cached their browser sessions.
gender: gender of the speaker. We did not explicitly collect this information from users; we hand-labeled it.
sentenceID: the sentence ID for the sentences read. We used the CMU Arctic sentences.
=========================== Before Postprocessing ===========================
Number of Samples: 1615
Size Webm: 59MB
Size Wav: 847MB
Sampling Rate: 48000Hz
Total Time: 2hrs 30min 21sec
============================ After Postprocessing ============================
Number of Samples: 919
Size Wav: 336MB
Sampling Rate: 48000Hz
Total Time: 0hrs 59min 08sec
============================ Version 1.1 ============================
This version has two updates:
1. In version 1.0, each language name was uppercased and its words were joined with an underscore, e.g., "Efik Ibibio" -> "EFIK_IBIBIO". We have changed "EFIK_IBIBIO" -> "EFIKIBIBIO", i.e. the file name, which was previously 'EFIK_IBIBIO_EFIK_IBIBIO_0014_M_A0138.wav', has now been changed to 'EFIKIBIBIO_EFIKIBIBIO_0014_M_A0138.wav'. This ... |
format |
Other/Unknown Material |
author |
Afonja, Tejumade Mbataku, Clinton Malomo, Ademola Okubadejo, Olumide Francis, Lawrence Nwadike, Munachiso Orife, Iroro |
author_facet |
Afonja, Tejumade Mbataku, Clinton Malomo, Ademola Okubadejo, Olumide Francis, Lawrence Nwadike, Munachiso Orife, Iroro |
author_sort |
Afonja, Tejumade |
title |
SautiDB: Nigerian Accent Dataset Collection |
title_short |
SautiDB: Nigerian Accent Dataset Collection |
title_full |
SautiDB: Nigerian Accent Dataset Collection |
title_fullStr |
SautiDB: Nigerian Accent Dataset Collection |
title_full_unstemmed |
SautiDB: Nigerian Accent Dataset Collection |
title_sort |
sautidb: nigerian accent dataset collection |
publisher |
Zenodo |
publishDate |
2023 |
url |
https://doi.org/10.5281/zenodo.7646394 |
geographic |
Arctic |
geographic_facet |
Arctic |
genre |
Arctic |
genre_facet |
Arctic |
op_relation |
https://zenodo.org/communities/africanlp https://doi.org/10.5281/zenodo.4561841 https://doi.org/10.5281/zenodo.7646394 oai:zenodo.org:7646394 |
op_rights |
info:eu-repo/semantics/restrictedAccess |
op_doi |
https://doi.org/10.5281/zenodo.7646394 https://doi.org/10.5281/zenodo.4561841 |
_version_ |
1809896715982995456 |
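
The filename convention given in the description field (nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav) can be split into its five parts with a short script. The following is only a minimal sketch, not code from the dataset authors: the names `SautiName`, `normalize_v1_0`, and `parse_filename` are hypothetical, the pattern assumes version 1.1 names in which no field contains an underscore, and the only version 1.0 rename handled is the one the record documents ("EFIK_IBIBIO" -> "EFIKIBIBIO").

```python
import re
from typing import NamedTuple

# Assumed pattern for version 1.1 names: five underscore-free fields
# followed by ".wav". The record does not specify allowed characters.
FILENAME_RE = re.compile(
    r"^(?P<native>[^_]+)_(?P<fluent>[^_]+)_(?P<speaker>[^_]+)"
    r"_(?P<gender>[^_]+)_(?P<sentence>[^_]+)\.wav$"
)


class SautiName(NamedTuple):
    """Hypothetical container for the five filename fields."""
    native_language: str
    fluent_language: str
    speaker_id: str
    gender: str
    sentence_id: str


def normalize_v1_0(filename: str) -> str:
    """Apply the single rename documented for version 1.1.

    Other multi-word language names, if any exist in version 1.0,
    are not covered here because the record does not list them.
    """
    return filename.replace("EFIK_IBIBIO", "EFIKIBIBIO")


def parse_filename(filename: str) -> SautiName:
    """Split a file name into its five fields, or raise ValueError."""
    match = FILENAME_RE.match(normalize_v1_0(filename))
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return SautiName(*match.groups())


if __name__ == "__main__":
    # Both spellings of the example from the record parse to the same fields.
    print(parse_filename("EFIK_IBIBIO_EFIK_IBIBIO_0014_M_A0138.wav"))
    print(parse_filename("EFIKIBIBIO_EFIKIBIBIO_0014_M_A0138.wav"))
```

Because a speaker may have received several IDs, grouping recordings by `speaker_id` gives browser sessions rather than guaranteed distinct people.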