Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process

International audience We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge there is no suitable corpus, in terms of size and coverage, currently available for the target lang...

Bibliographic Details
Main Authors: Fauth, Camille, Bonneau, Anne, Zimmerer, Frank, Trouvain, Jürgen, Andreeva, Bistra, Colotte, Vincent, Fohr, Dominique, Jouvet, Denis, Jügler, Jeanin, Laprie, Yves, Mella, Odile, Möbius, Bernd
Other Authors: Analysis, perception and recognition of speech (PAROLE), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Allgemeine Linguistik Computational Linguistics and phonetics (Allgemeine Linguistik), Saarland University Saarbrücken, The European Language Resources Association, ANR-12-FRAL-0007,IFCASL,apprentissage des langues assisté par ordinateur(2012)
Format: Other/Unknown Material
Language: English
Published: HAL CCSD 2014
Subjects: speech corpus, phonetics, language learning
Online Access: https://hal.inria.fr/hal-00979026/file/LREC_IFCASL_long.pdf
https://hal.inria.fr/hal-00979026
id fttriple:oai:gotriple.eu:10670/1.40wfin
record_format openpolar
spelling fttriple:oai:gotriple.eu:10670/1.40wfin 2023-05-15T16:50:43+02:00 Reykjavik, Iceland 2014-05-26 en eng Conference Output https://vocabularies.coar-repositories.org/resource_types/c_c94f/ 2014 fttriple 2023-01-22T18:02:52Z
institution Open Polar
collection Unknown
op_collection_id fttriple
language English
topic speech corpus
phonetics
language learning
lang
litt
description International audience We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge, no suitable corpus, in terms of size and coverage, is currently available for the target language pair. To select the target L1-L2 interference phenomena, we prepare a small preliminary corpus (corpus1), which is analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, target phenomena on the phonetic and phonological level are selected according to the expected degree of deviation from native performance and the frequency of occurrence. Fourteen speakers produced both L2 (either French or German) and L1 material (either German or French). This allowed us to test the recording duration, the recording material, and the performance of our automatic aligner software. We then built corpus2, taking into account what we learned from corpus1. The aims are the same, but we adapted the speech material to avoid overly long recording sessions. One hundred speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database, available to the scientific community after completion of the project.
author2 Analysis, perception and recognition of speech (PAROLE)
Inria Nancy - Grand Est
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Allgemeine Linguistik Computational Linguistics and phonetics (Allgemeine Linguistik)
Saarland University Saarbrücken
The European Language Resources Association
ANR-12-FRAL-0007,IFCASL,apprentissage des langues assisté par ordinateur(2012)
format Other/Unknown Material
author Fauth, Camille
Bonneau, Anne
Zimmerer, Frank
Trouvain, Jürgen
Andreeva, Bistra
Colotte, Vincent
Fohr, Dominique
Jouvet, Denis
Jügler, Jeanin
Laprie, Yves
Mella, Odile
Möbius, Bernd
author_sort Fauth, Camille
title Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process
title_sort designing a bilingual speech corpus for french and german language learners: a two-step process
publisher HAL CCSD
publishDate 2014
url https://hal.inria.fr/hal-00979026/file/LREC_IFCASL_long.pdf
https://hal.inria.fr/hal-00979026
op_coverage Reykjavik, Iceland
genre Iceland
op_source Hyper Article en Ligne - Sciences de l'Homme et de la Société
LREC - 9th Language Resources and Evaluation Conference
LREC - 9th Language Resources and Evaluation Conference, The European Language Resources Association, May 2014, Reykjavik, Iceland
op_relation hal-00979026
10670/1.40wfin
https://hal.inria.fr/hal-00979026/file/LREC_IFCASL_long.pdf
https://hal.inria.fr/hal-00979026
op_rights other
_version_ 1766040834276327424