D1 - The Research and Teaching Corpus of Spoken German (FOLK) and the Database for Spoken German (DGD)

Bibliographic Details
Main Authors: LIV Congresso SLI 2021, Kaiser, Julia
Format: Article in Journal/Newspaper
Language: unknown
Published: Underline Science Inc. 2021
Subjects:
Online Access: https://dx.doi.org/10.48448/64fe-a166
https://underline.io/lecture/33025-d1---the-research-and-teaching-corpus-of-spoken-german-(folk)-and-the-database-for-spoken-german-(dgd)
Description
Summary: The main objective of the Research and Teaching Corpus of Spoken German (FOLK) is to provide a large, broadly diversified collection of audio and video recordings of spoken German in authentic, spontaneous interactions (Schmidt 2014a, b, 2017). The corpus contains private, institutional and public conversations from various domains of everyday social life, with speakers of diverse sex, age, regional provenance, education etc. It aims to represent, as far as possible, the “communicative household” (Luckmann 1988) of today’s German-speaking society and therefore grows by about 30 hours per year (interactions currently date from 2003 to 2020). The leading factor in the corpus design is the interaction type, which is represented as a feature bundle of interaction parameters such as interactional domain, area of life and main activities. Further features of the interactions that also serve as primary parameters for the corpus stratification are the medium (i.e., face-to-face, telephone, video call), the number of participants, their degree of familiarity etc. Demographic and socio-biographical features of the participants, such as age, regional provenance, education level and profession, are systematically documented and represented as secondary stratification parameters (Kaiser 2018). As a reference corpus, FOLK can be used for many different research and teaching purposes.
Online access, via registration at the Database for Spoken German (DGD), is thus available to researchers, students and teachers and allows for: browsing individual transcripts, audio and video recordings (fully aligned with the transcripts) and metadata; searching systematically for particular verbal phenomena in the speaker contributions via automatic queries on the different annotation levels; filtering, sampling, combining, quantifying, sharing and exporting query results; creating virtual sub-corpora with particular interaction and/or speaker features; and downloading excerpts of transcripts and recordings locally or saving them online in the personal workspace for further analyses. As of the spring 2021 release, FOLK contains 374 interactions with a total duration of about 300 hours and almost 3 million transcribed tokens. The transcription follows the conventions of the “Gesprächsanalytisches Transkriptionssystem” (GAT), the commonly used guideline for transcribing interactions in spoken German, with an adaptation to computer-assisted transcription with an editor (cGAT, Schmidt et al. 2015). The annotation levels consist of orthographic normalization, lemmatization and part-of-speech tagging with a tag set that has been specifically adapted to spoken language (Westpfahl/Schmidt 2016). A project for the syntactic segmentation of speaker contributions is currently being implemented step by step.

References

Kaiser, Julia (2018). ‘Zur Stratifikation des FOLK-Korpus: Konzeption und Strategien’, Gesprächsforschung - Online-Zeitschrift zur verbalen Interaktion 19: 515-552. http://www.gespraechsforschung-online.de/fileadmin/dateien/heft2018/px-kaiser.pdf.

Luckmann, Thomas (1988). ‘Kommunikative Gattungen im kommunikativen “Haushalt” einer Gesellschaft’. In Smolka-Koerdt, Gisela / Spangenberg, Peter M. / Tillmann-Bartylla, Dagmar (eds.). Der Ursprung von Literatur. München: 279-288.

Schmidt, Thomas (2014a). ‘Gesprächskorpora und Gesprächsdatenbanken am Beispiel von FOLK und DGD’, Gesprächsforschung - Online-Zeitschrift zur verbalen Interaktion 15: 196-233. http://www.gespraechsforschung-ozs.de/fileadmin/dateien/heft2014/px-schmidt.pdf.

Schmidt, Thomas (2014b). ‘The Research and Teaching Corpus of Spoken German – FOLK’. In Proceedings of the Ninth conference on International Language Resources and Evaluation (LREC’14). Reykjavik, Iceland: 383-387.

Schmidt, Thomas (2017). ‘Construction and Dissemination of a Corpus of Spoken Interaction – Tools and Workflows in the FOLK project’. In Kupietz, Marc / Geyken, Alexander (eds.). Corpus Linguistic Software Tools, Journal for Language Technology and Computational Linguistics (JLCL 31/1): 127-154.

Schmidt, Thomas / Schütte, Wilfried / Winterscheid, Jenny (2015). cGAT. Konventionen für das computergestützte Transkribieren in Anlehnung an das Gesprächsanalytische Transkriptionssystem 2 (GAT2). Working paper. https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/4616/file/SchmidtSchuetteWinterscheidcGAT2015.pdf.

Westpfahl, Swantje / Schmidt, Thomas (2016). ‘FOLK-Gold – A GOLD standard for Part-of-Speech-Tagging of Spoken German’. In Proceedings of the Tenth Conference on International Language Resources and Evaluation (LREC’16). Portorož, Slovenia: 1493-1499.