SautiDB: Nigerian Accent Dataset Collection

Bibliographic Details
Main Authors: Afonja, Tejumade (10211667), Orife, Iroro (10211670), Mbataku, Clinton (10211673), Malomo, Ademola (10211676), Okubadejo, Olumide (10211679), Francis, Lawrence (10211682), Nwadike, Munachiso (10211685)
Format: Dataset
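Language: unknown
Published: 2021
Subjects: Microbiology; Biotechnology; Evolutionary Biology; Ecology; Science Policy; Environmental Sciences not elsewhere classified; Biological Sciences not elsewhere classified; accent; nigerian_accent; african_accent
Online Access: https://doi.org/10.5281/zenodo.4561842
License: CC BY 4.0
Also available at: https://figshare.com/articles/dataset/SautiDB_Nigerian_Accent_Dataset_Collection/14134137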

Description

The SautiDB dataset collection project (https://sautidb.web.app/) is an ongoing effort to collect datasets of various Nigerian accents. Each file name has the form

nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav

where:
- nativeLanguage: the language spoken by the speaker's tribe, i.e. the speaker's native (mother) language.
- fluentLanguage: the language that the speaker thinks best describes their accent.
- speakerID: the ID assigned to the speaker. One speaker may map to multiple IDs: speakers are not required to log in to the system, and their subsequent uploads are recognized only through a cached browser session.
- gender: the gender of the speaker. This information was not collected from the speakers; it was hand-labelled.
- sentenceID: the ID of the sentence that was read.
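A minimal Python sketch of reading these fields back out of a file name and attaching the prompt text to a recording. The record states that the read sentences are CMU Arctic prompts (cmuarctic.data), but it does not document the exact sentenceID format, so this sketch assumes that the first four fields never contain underscores, that sentenceID values reuse the Arctic utterance IDs (e.g. arctic_a0001), and that each prompt line has the Festival-style form ( id "text" ). The names parse_sauti_filename, load_arctic_prompts, SautiFile and the sample file name are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, NamedTuple


class SautiFile(NamedTuple):
    """Metadata fields encoded in a SautiDB recording's file name."""
    native_language: str
    fluent_language: str
    speaker_id: str
    gender: str
    sentence_id: str


def parse_sauti_filename(path: str) -> SautiFile:
    """Split nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav.

    Assumption: the first four fields never contain "_". Splitting at most
    four times keeps everything after the fourth underscore as the sentence
    ID, so an ID such as "arctic_a0001" is not broken apart.
    """
    p = Path(path)
    if p.suffix.lower() != ".wav":
        raise ValueError(f"expected a .wav file, got {p.name!r}")
    parts = p.stem.split("_", 4)
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"unexpected file name layout: {p.name!r}")
    return SautiFile(*parts)


# Assumed prompt-line layout: ( arctic_a0001 "Some prompt text." )
_PROMPT_LINE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def load_arctic_prompts(text: str) -> Dict[str, str]:
    """Map utterance ID -> prompt text for every line matching _PROMPT_LINE."""
    prompts: Dict[str, str] = {}
    for line in text.splitlines():
        m = _PROMPT_LINE.match(line.strip())
        if m:
            prompts[m.group(1)] = m.group(2)
    return prompts


if __name__ == "__main__":
    # Hypothetical file name and prompt text, for illustration only.
    prompts = load_arctic_prompts('( arctic_a0001 "Example prompt text." )\n')
    rec = parse_sauti_filename("yoruba_yoruba_42_female_arctic_a0001.wav")
    print(rec.speaker_id, rec.gender, prompts.get(rec.sentence_id))
```

Because one speaker can map to several speakerID values, counting distinct IDs parsed this way may overcount the number of speakers.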
The read sentences are taken from the CMU Arctic prompts, available at http://www.festvox.org/cmu_arctic/cmuarctic.data.

Dataset size before and after preprocessing:

                     Before preprocessing   After preprocessing
Number of samples    1615                   919
Size (WebM)          59 MB                  not listed
Size (WAV)           847 MB                 336 MB
Sampling rate        48000 Hz               48000 Hz
Total time           2 h 30 min 21 s        0 h 59 min 08 s
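The figures in the table can be cross-checked with a little arithmetic. This Python sketch converts the listed durations to seconds and derives the mean clip length and the share of samples, audio time and WAV size retained. The record does not say what the preprocessing removed, so these numbers only describe its net effect; SplitStats and hms are illustrative names, not part of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitStats:
    """Summary figures as listed in the SautiDB record."""
    samples: int
    wav_mb: float
    seconds: int


def hms(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/min/s duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


before = SplitStats(samples=1615, wav_mb=847.0, seconds=hms(2, 30, 21))
after = SplitStats(samples=919, wav_mb=336.0, seconds=hms(0, 59, 8))

if __name__ == "__main__":
    for name, s in (("before", before), ("after", after)):
        # Mean clip length = total audio time / number of samples.
        print(f"{name}: mean clip length {s.seconds / s.samples:.2f} s")
    print(f"samples kept: {after.samples / before.samples:.1%}")
    print(f"audio kept: {after.seconds / before.seconds:.1%}")
    print(f"WAV size kept: {after.wav_mb / before.wav_mb:.1%}")
```

Running it reports a mean clip length of 5.59 s before and 3.86 s after preprocessing, with 56.9% of the samples, 39.3% of the audio time and 39.7% of the WAV size retained.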