Eesti murdekorpus : Estonian Dialect Corpus
korpus : The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings made mainly in the 1960s and 1970s; the earliest date from 1938. These are traditional dialect recordings in which the interview is conducte...
Main Author: | Lindström, Liina |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | Center of Estonian Language Resources 2013 |
Subjects: | |
Online Access: | https://dx.doi.org/10.15155/1-00-0000-0000-0000-00076l https://metashare.ut.ee/repository/browse/602bfe185a5111e2a6e4005056b40024160ffef10451449ea07798e7c30b3081/ |
id |
ftdatacite:10.15155/1-00-0000-0000-0000-00076l |
---|---|
record_format |
openpolar |
spelling |
ftdatacite:10.15155/1-00-0000-0000-0000-00076l 2023-05-15T18:42:57+02:00 Eesti murdekorpus : Estonian Dialect Corpus Lindström, Liina 2013 https://dx.doi.org/10.15155/1-00-0000-0000-0000-00076l https://metashare.ut.ee/repository/browse/602bfe185a5111e2a6e4005056b40024160ffef10451449ea07798e7c30b3081/ unknown Center of Estonian Language Resources article CreativeWork 2013 ftdatacite https://doi.org/10.15155/1-00-0000-0000-0000-00076l 2022-04-01T18:25:31Z korpus : The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings made mainly in the 1960s and 1970s; the earliest date from 1938. These are traditional dialect recordings in which the interview is conducted at the informant's home. 2) Phonetically transcribed texts. The traditional Finno-Ugric phonetic transcription is used. The texts are available as Word and PDF files (as of 1 May 2011, the corpus contained about 1,284,000 text words). 3) Dialect texts in simplified transcription. All of the phonetically transcribed texts have been converted one-to-one into the simplified transcription (.txt), so that they can be opened in any program and used for initial analyses. 4) Morphologically tagged texts, which have been loaded into a MySQL database. All word classes and morphological forms are tagged. 5) A database containing information about informants and recordings. 6) Syntactically parsed texts (about 40,000 text words). In the corpus, every phonetically transcribed text is accompanied by a recording, a file in simplified transcription and a description; more than half of the texts are also accompanied by a morphologically tagged file. Some data from other Finnic languages spoken around Estonia have also been added. The aim is to incorporate at least Votic, Ingrian and Livonian data into the corpus. 
Article in Journal/Newspaper votic DataCite Metadata Store (German National Library of Science and Technology) |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
unknown |
description |
korpus : The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings made mainly in the 1960s and 1970s; the earliest date from 1938. These are traditional dialect recordings in which the interview is conducted at the informant's home. 2) Phonetically transcribed texts. The traditional Finno-Ugric phonetic transcription is used. The texts are available as Word and PDF files (as of 1 May 2011, the corpus contained about 1,284,000 text words). 3) Dialect texts in simplified transcription. All of the phonetically transcribed texts have been converted one-to-one into the simplified transcription (.txt), so that they can be opened in any program and used for initial analyses. 4) Morphologically tagged texts, which have been loaded into a MySQL database. All word classes and morphological forms are tagged. 5) A database containing information about informants and recordings. 6) Syntactically parsed texts (about 40,000 text words). In the corpus, every phonetically transcribed text is accompanied by a recording, a file in simplified transcription and a description; more than half of the texts are also accompanied by a morphologically tagged file. Some data from other Finnic languages spoken around Estonia have also been added. The aim is to incorporate at least Votic, Ingrian and Livonian data into the corpus. |
format |
Article in Journal/Newspaper |
author |
Lindström, Liina |
spellingShingle |
Lindström, Liina Eesti murdekorpus : Estonian Dialect Corpus |
author_facet |
Lindström, Liina |
author_sort |
Lindström, Liina |
title |
Eesti murdekorpus : Estonian Dialect Corpus |
title_short |
Eesti murdekorpus : Estonian Dialect Corpus |
title_full |
Eesti murdekorpus : Estonian Dialect Corpus |
title_fullStr |
Eesti murdekorpus : Estonian Dialect Corpus |
title_full_unstemmed |
Eesti murdekorpus : Estonian Dialect Corpus |
title_sort |
eesti murdekorpus : estonian dialect corpus |
publisher |
Center of Estonian Language Resources |
publishDate |
2013 |
url |
https://dx.doi.org/10.15155/1-00-0000-0000-0000-00076l https://metashare.ut.ee/repository/browse/602bfe185a5111e2a6e4005056b40024160ffef10451449ea07798e7c30b3081/ |
genre |
votic |
genre_facet |
votic |
op_doi |
https://doi.org/10.15155/1-00-0000-0000-0000-00076l |
_version_ |
1766232719601172480 |