Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian

International audience The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when comp...

Full description

Bibliographic Details
Main Authors: Lim, KyungTae, Partanen, Niko, Poibeau, Thierry
Other Authors: Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Département Littératures et langage - ENS Paris (LILA), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Cité (USPC)-Université Sorbonne Nouvelle - Paris 3, ELRA, ANR ERA-NET Atlantis
Format: Other/Unknown Material
Language:English
Published: HAL CCSD 2018
Subjects:
Online Access:https://hal.archives-ouvertes.fr/hal-01856178/file/600.pdf
https://hal.archives-ouvertes.fr/hal-01856178
Description
Summary:International audience The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when compared to previous work, allowing for wider reuse of pre-existing resources when parsing low-resource languages. The study also explores the question of whether contemporary contact languages or genetically related languages would be the most fruitful starting point for multilingual parsing scenarios.