Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian
Main Authors: | Lim, KyungTae; Partanen, Niko; Poibeau, Thierry |
---|---|
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD, 2018 |
Subjects: | dependency parsing; word embeddings; Uralic languages |
Online Access: | https://hal.archives-ouvertes.fr/hal-01856178 https://hal.archives-ouvertes.fr/hal-01856178/document https://hal.archives-ouvertes.fr/hal-01856178/file/600.pdf |
id |
ftccsdartic:oai:HAL:hal-01856178v1 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe) |
op_collection_id |
ftccsdartic |
language |
English |
topic |
dependency parsing; word embeddings; Uralic languages; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [SCCO.COMP]Cognitive science/Computer science; [SCCO.LING]Cognitive science/Linguistics; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; [SHS.INFO]Humanities and Social Sciences/Library and information sciences |
description |
International audience. The paper presents a method for parsing low-resource languages that have very small training corpora, using multilingual word embeddings together with annotated corpora of larger languages. The study demonstrates that specific language combinations improve dependency parsing compared with previous work, allowing wider reuse of pre-existing resources when parsing low-resource languages. The study also explores whether contemporary contact languages or genetically related languages make the more fruitful starting point for multilingual parsing scenarios. |
author2 |
Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice); Département Littératures et langage (LILA); École normale supérieure - Paris (ENS Paris); Université Paris sciences et lettres (PSL); Centre National de la Recherche Scientifique (CNRS); Université Sorbonne Paris Cité (USPC); Université Sorbonne Nouvelle - Paris 3; ELRA; ANR; ERA-NET Atlantis |
format |
Conference Object |
author |
Lim, KyungTae; Partanen, Niko; Poibeau, Thierry |
title |
Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian |
publisher |
HAL CCSD |
publishDate |
2018 |
url |
https://hal.archives-ouvertes.fr/hal-01856178 https://hal.archives-ouvertes.fr/hal-01856178/document https://hal.archives-ouvertes.fr/hal-01856178/file/600.pdf |
op_coverage |
Miyazaki, Japan |
genre |
saami |
op_source |
LREC 2018 Proceedings; Language Resource and Evaluation Conference; https://hal.archives-ouvertes.fr/hal-01856178; Language Resource and Evaluation Conference, ELRA, May 2018, Miyazaki, Japan; http://www.lrec-conf.org/proceedings/lrec2018/pdf/600.pdf |
op_relation |
hal-01856178; https://hal.archives-ouvertes.fr/hal-01856178; https://hal.archives-ouvertes.fr/hal-01856178/document; https://hal.archives-ouvertes.fr/hal-01856178/file/600.pdf |
op_rights |
info:eu-repo/semantics/OpenAccess |