Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process
Format: | Conference Object |
---|---|
Language: | English |
Published: | HAL CCSD, 2014 |
Online Access: | https://inria.hal.science/hal-00979026 https://inria.hal.science/hal-00979026v1/document https://inria.hal.science/hal-00979026v1/file/LREC_IFCASL_long.pdf |
id |
ftanrparis:oai:HAL:hal-00979026v1 |
---|---|
institution |
Open Polar |
collection |
Portail HAL-ANR (Agence Nationale de la Recherche) |
topic |
speech corpus; phonetics; language learning; [SCCO.LING] Cognitive science/Linguistics |
description |
We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge, no corpus of suitable size and coverage is currently available for this language pair. To select the target L1-L2 interference phenomena, we prepared a small preliminary corpus (corpus1), which was analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, target phenomena at the phonetic and phonological levels were selected according to the expected degree of deviation from native performance and their frequency of occurrence. Fourteen speakers recorded both L2 material (either French or German) and L1 material (either German or French). This allowed us to test the recording duration, the recording material, and the performance of our automatic aligner. We then built corpus2, taking into account what we learned from corpus1. The aims are the same, but we adapted the speech material to avoid overly long recording sessions. One hundred speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database and made available to the scientific community after the project is completed. |
affiliations and funding |
Analysis, perception and recognition of speech (PAROLE), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Inria, Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS); Computational Linguistics and Phonetics (Allgemeine Linguistik), Saarland University, Saarbrücken; The European Language Resources Association; ANR-12-FRAL-0007, IFCASL, apprentissage des langues assisté par ordinateur (computer-assisted language learning) (2012) |
author |
Fauth, Camille; Bonneau, Anne; Zimmerer, Frank; Trouvain, Jürgen; Andreeva, Bistra; Colotte, Vincent; Fohr, Dominique; Jouvet, Denis; Jügler, Jeanin; Laprie, Yves; Mella, Odile; Möbius, Bernd |
source |
LREC - 9th Language Resources and Evaluation Conference, The European Language Resources Association, May 2014, Reykjavik, Iceland |
rights |
Open Access (info:eu-repo/semantics/OpenAccess) |