Samrómur Icelandic Speech 1.0

Bibliographic Details
Other Authors:	Mollberg, David, Jónsson, Ólafur Helgi, Þorsteinsdóttir, Sunneva, Guðmundsdóttir, Jóhanna Vigdís, Steingrimsson, Steinthor, Magnusdottir, Eydis Huld, Fong, Judy, Borsky, Michal, Gudnason, Jon
Format:	Text
Language:	Icelandic
Published:	Linguistic Data Consortium 2022
Subjects:	Iceland Reykjavik University
Online Access:	https://catalog.ldc.upenn.edu/LDC2022S05

Description
Summary:	Introduction
Samrómur Icelandic Speech 1.0 was developed by the Language and Voice Lab, Reykjavik University, in cooperation with Almannarómur, Center for Language Technology. The corpus contains 145 hours of Icelandic prompted speech from 8,392 speakers, comprising 100,000 utterances. This version 1.0 is equivalent to "Samrómur Icelandic Speech 21.05" as used by the Language Technology Programme for Icelandic 2019-2023.

Data
Speech data was collected between October 2019 and May 2021 using the Samrómur website, which displayed prompts to participants. The prompts were mainly drawn from The Icelandic Gigaword Corpus, which includes text from novels, news, and plays, and from a list of location names in Iceland. Additional prompts were taken from the Icelandic Web of Science, and others were created by combining a name with a following question or demand. Prompts and speaker metadata are included in the corpus. The audio data is divided into train, dev, and test sets and is presented as FLAC-compressed, single-channel, 16 kHz, 16-bit linear PCM.

Updates
None at this time.
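The description states that every file is FLAC-compressed, single-channel, 16 kHz, 16-bit PCM. A FLAC file records exactly these properties in its mandatory STREAMINFO metadata block, so they can be checked without decoding any audio. The following is a minimal sketch using only the Python standard library; it assumes each file begins directly with the "fLaC" marker (no tags prepended by other tools), and the names `parse_streaminfo`, `read_streaminfo`, and `matches_corpus_spec` are hypothetical helpers, not part of the LDC release.

```python
from dataclasses import dataclass
from pathlib import Path

# Audio properties as stated in the corpus description.
EXPECTED_SAMPLE_RATE = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_BITS_PER_SAMPLE = 16


@dataclass
class StreamInfo:
    sample_rate: int
    channels: int
    bits_per_sample: int
    total_samples: int  # samples per channel; 0 means "unknown" in FLAC

    @property
    def duration_seconds(self) -> float:
        return self.total_samples / self.sample_rate if self.sample_rate else 0.0


def parse_streaminfo(data: bytes) -> StreamInfo:
    """Parse the STREAMINFO block that must directly follow the 'fLaC' marker."""
    if data[:4] != b"fLaC":
        raise ValueError("missing 'fLaC' stream marker")
    block_type = data[4] & 0x7F  # low 7 bits; the high bit flags the last block
    block_len = int.from_bytes(data[5:8], "big")
    body = data[8:42]
    if block_type != 0 or block_len != 34 or len(body) != 34:
        raise ValueError("first metadata block is not a complete STREAMINFO block")
    # Bytes 10-17 of STREAMINFO pack: sample rate (20 bits), channels - 1 (3 bits),
    # bits per sample - 1 (5 bits), total samples (36 bits).
    packed = int.from_bytes(body[10:18], "big")
    return StreamInfo(
        sample_rate=packed >> 44,
        channels=((packed >> 41) & 0x7) + 1,
        bits_per_sample=((packed >> 36) & 0x1F) + 1,
        total_samples=packed & 0xF_FFFF_FFFF,
    )


def read_streaminfo(path: Path) -> StreamInfo:
    """Read only the first 42 bytes of a FLAC file (marker, block header, STREAMINFO)."""
    with path.open("rb") as f:
        return parse_streaminfo(f.read(42))


def matches_corpus_spec(info: StreamInfo) -> bool:
    """True if the stream is 16 kHz, single-channel, 16-bit as described."""
    return (info.sample_rate == EXPECTED_SAMPLE_RATE
            and info.channels == EXPECTED_CHANNELS
            and info.bits_per_sample == EXPECTED_BITS_PER_SAMPLE)
```

As a usage example, one could iterate over `Path(root).rglob("*.flac")`, call `read_streaminfo` on each file, flag any file for which `matches_corpus_spec` is false, and sum `duration_seconds` to compare against the stated 145 hours. The directory layout of the release is not described in this record, so that glob pattern is an assumption.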