Eesti murdekorpus : Estonian Dialect Corpus

korpus : The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings made mainly in the 1960s and 1970s; the earliest date from 1938. These are traditional dialect recordings in which the interview is conducte...

Bibliographic Details
Main Author:	Lindström, Liina
Format:	Article in Journal/Newspaper
Language:	unknown
Published:	Center of Estonian Language Resources 2013
Subjects:	votic
Online Access:	https://dx.doi.org/10.15155/1-00-0000-0000-0000-00076l https://metashare.ut.ee/repository/browse/602bfe185a5111e2a6e4005056b40024160ffef10451449ea07798e7c30b3081/

id	ftdatacite:10.15155/1-00-0000-0000-0000-00076l
record_format	openpolar
spelling	ftdatacite:10.15155/1-00-0000-0000-0000-00076l 2023-05-15T18:42:57+02:00 Eesti murdekorpus : Estonian Dialect Corpus Lindström, Liina 2013 https://dx.doi.org/10.15155/1-00-0000-0000-0000-00076l https://metashare.ut.ee/repository/browse/602bfe185a5111e2a6e4005056b40024160ffef10451449ea07798e7c30b3081/ unknown Center of Estonian Language Resources article CreativeWork 2013 ftdatacite https://doi.org/10.15155/1-00-0000-0000-0000-00076l 2022-04-01T18:25:31Z korpus : The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings made mainly in the 1960s and 1970s; the earliest date from 1938. These are traditional dialect recordings in which the interview is conducted at the informant's home. 2) Phonetically transcribed texts. The traditional Finno-Ugric phonetic transcription is used. The texts are available as Word and PDF files (as of 1 May 2011, the corpus contains about 1,284,000 text words). 3) Dialect texts in simplified transcription. All of the phonetically transcribed texts have been converted one-to-one into the simplified transcription (.txt), which makes it possible to use the texts with any program and to conduct primary analyses. 4) Morphologically tagged texts, which have been loaded into a MySQL database. All word classes and morphological forms are tagged. 5) A database containing information about informants and recordings. 6) Syntactically parsed texts (about 40,000 text words). In the corpus, every phonetically transcribed text is accompanied by a recording, a file in simplified transcription and a description; more than half of the texts are also accompanied by a morphologically tagged file. Some data from other Finnic languages spoken around Estonia have also been added. The aim is to incorporate at least Votic, Ingrian and Livonian data into the corpus.
Article in Journal/Newspaper votic DataCite Metadata Store (German National Library of Science and Technology)
institution	Open Polar
collection	DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id	ftdatacite
language	unknown
description	korpus : The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings made mainly in the 1960s and 1970s; the earliest date from 1938. These are traditional dialect recordings in which the interview is conducted at the informant's home. 2) Phonetically transcribed texts. The traditional Finno-Ugric phonetic transcription is used. The texts are available as Word and PDF files (as of 1 May 2011, the corpus contains about 1,284,000 text words). 3) Dialect texts in simplified transcription. All of the phonetically transcribed texts have been converted one-to-one into the simplified transcription (.txt), which makes it possible to use the texts with any program and to conduct primary analyses. 4) Morphologically tagged texts, which have been loaded into a MySQL database. All word classes and morphological forms are tagged. 5) A database containing information about informants and recordings. 6) Syntactically parsed texts (about 40,000 text words). In the corpus, every phonetically transcribed text is accompanied by a recording, a file in simplified transcription and a description; more than half of the texts are also accompanied by a morphologically tagged file. Some data from other Finnic languages spoken around Estonia have also been added. The aim is to incorporate at least Votic, Ingrian and Livonian data into the corpus.
format	Article in Journal/Newspaper
author	Lindström, Liina
spellingShingle	Lindström, Liina Eesti murdekorpus : Estonian Dialect Corpus
author_facet	Lindström, Liina
author_sort	Lindström, Liina
title	Eesti murdekorpus : Estonian Dialect Corpus
title_short	Eesti murdekorpus : Estonian Dialect Corpus
title_full	Eesti murdekorpus : Estonian Dialect Corpus
title_fullStr	Eesti murdekorpus : Estonian Dialect Corpus
title_full_unstemmed	Eesti murdekorpus : Estonian Dialect Corpus
title_sort	eesti murdekorpus : estonian dialect corpus
publisher	Center of Estonian Language Resources
publishDate	2013
url	https://dx.doi.org/10.15155/1-00-0000-0000-0000-00076l https://metashare.ut.ee/repository/browse/602bfe185a5111e2a6e4005056b40024160ffef10451449ea07798e7c30b3081/
genre	votic
genre_facet	votic
op_doi	https://doi.org/10.15155/1-00-0000-0000-0000-00076l
_version_	1766232719601172480
