Eesti murdekorpus : Estonian Dialect Corpus

Bibliographic Details
Main Author: Lindström, Liina
Format: Article in Journal/Newspaper
Language: unknown
Published: Center of Estonian Language Resources 2013
Subjects:
Online Access: https://dx.doi.org/10.15155/1-00-0000-0000-0000-00076l
https://metashare.ut.ee/repository/browse/602bfe185a5111e2a6e4005056b40024160ffef10451449ea07798e7c30b3081/
id ftdatacite:10.15155/1-00-0000-0000-0000-00076l
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
description korpus : The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings made mainly in the 1960s and 1970s; the earliest date from 1938. These are traditional dialect recordings in which the interview is conducted at the informant's home. 2) Phonetically transcribed texts. The traditional Finno-Ugric phonetic transcription is used. The texts are available as Word and PDF files (as of 1 May 2011, the corpus contained about 1,284,000 text words). 3) Dialect texts in simplified transcription. All of the phonetically transcribed texts have been converted one-to-one into the simplified transcription (.txt), so that they can be opened in any program and used for initial analyses. 4) Morphologically tagged texts, which have been loaded into a MySQL database. All word classes and morphological forms are tagged. 5) A database containing information about informants and recordings. 6) Syntactically parsed texts (about 40,000 text words). In the corpus, every phonetically transcribed text is accompanied by a recording, a file in simplified transcription and a description; more than half of the texts are also accompanied by a morphologically tagged file. Some data from other Finnic languages spoken around Estonia have also been added. The aim is to incorporate at least Votic, Ingrian and Livonian data into the corpus.
format Article in Journal/Newspaper
author Lindström, Liina
title Eesti murdekorpus : Estonian Dialect Corpus
publisher Center of Estonian Language Resources
publishDate 2013
url https://dx.doi.org/10.15155/1-00-0000-0000-0000-00076l
https://metashare.ut.ee/repository/browse/602bfe185a5111e2a6e4005056b40024160ffef10451449ea07798e7c30b3081/
genre votic
genre_facet votic
op_doi https://doi.org/10.15155/1-00-0000-0000-0000-00076l
_version_ 1766232719601172480