Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian

International audience The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when comp...

Full description

Bibliographic Details
Main Authors: Lim, Kyungtae, Partanen, Niko, Poibeau, Thierry
Other Authors: Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Université Sorbonne Nouvelle - Paris 3-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS)-Université Paris sciences et lettres (PSL)-Département Littératures et langage - ENS Paris (LILA), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL), ELRA, ANR ERA-NET Atlantis
Format: Conference Object
Language:English
Published: HAL CCSD 2018
Subjects:
Online Access:https://hal.science/hal-01856178
https://hal.science/hal-01856178/document
https://hal.science/hal-01856178/file/600.pdf
Description
Summary:International audience The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when compared to previous work, allowing for wider reuse of pre-existing resources when parsing low-resource languages. The study also explores the question of whether contemporary contact languages or genetically related languages would be the most fruitful starting point for multilingual parsing scenarios.