Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian

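The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when compared to previous work, allowing for wider reuse of pre-existing resources when parsing low-resource languages. The study also explores the question of whether contemporary contact languages or genetically related languages would be the most fruitful starting point for multilingual parsing scenarios.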

Bibliographic Details
Main Authors: Lim, KyungTae, Partanen, Niko, Poibeau, Thierry
Other Authors: Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Département Littératures et langage (LILA), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Cité (USPC)-Université Sorbonne Nouvelle - Paris 3, ELRA, ANR ERA-NET Atlantis
Format: Conference Object
Language: English
Published: HAL CCSD 2018
Subjects: dependency parsing, word embeddings, Uralic languages
Online Access: https://hal.archives-ouvertes.fr/hal-01856178
https://hal.archives-ouvertes.fr/hal-01856178/document
https://hal.archives-ouvertes.fr/hal-01856178/file/600.pdf
id ftccsdartic:oai:HAL:hal-01856178v1
record_format openpolar
institution Open Polar
collection Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
op_collection_id ftccsdartic
topic dependency parsing
word embeddings
Uralic languages
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
[SCCO.COMP]Cognitive science/Computer science
[SCCO.LING]Cognitive science/Linguistics
[SHS.LANGUE]Humanities and Social Sciences/Linguistics
[SHS.INFO]Humanities and Social Sciences/Library and information sciences
description International audience The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when compared to previous work, allowing for wider reuse of pre-existing resources when parsing low-resource languages. The study also explores the question of whether contemporary contact languages or genetically related languages would be the most fruitful starting point for multilingual parsing scenarios.
op_coverage Miyazaki, Japan
genre saami
op_source LREC 2018 Proceedings
Language Resource and Evaluation Conference
https://hal.archives-ouvertes.fr/hal-01856178
Language Resource and Evaluation Conference, ELRA, May 2018, Miyazaki, Japan
http://www.lrec-conf.org/proceedings/lrec2018/pdf/600.pdf
op_rights info:eu-repo/semantics/OpenAccess