Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian

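The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when compared to previous work, allowing for wider reuse of pre-existing resources when parsing low-resource languages. The study also explores the question of whether contemporary contact languages or genetically related languages would be the most fruitful starting point for multilingual parsing scenarios.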


Bibliographic Details
Main Authors: Lim, Kyungtae; Partanen, Niko; Poibeau, Thierry
Other Authors: Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Université Sorbonne Nouvelle - Paris 3-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Sciences et Lettres (PSL)-Département Littératures et langage - ENS Paris (LILA), École normale supérieure - Paris (ENS-PSL), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-École normale supérieure - Paris (ENS-PSL), Université Paris Sciences et Lettres (PSL), ELRA, ANR ERA-NET Atlantis
Format: Conference Object
Language: English
Published: HAL CCSD 2018
Subjects: dependency parsing; word embeddings; Uralic languages
Online Access: https://hal.science/hal-01856178
https://hal.science/hal-01856178/document
https://hal.science/hal-01856178/file/600.pdf
id ftunivparis3:oai:HAL:hal-01856178v1
record_format openpolar
spelling ftunivparis3:oai:HAL:hal-01856178v1 2024-05-19T07:47:52+00:00 Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian Lim, Kyungtae Partanen, Niko Poibeau, Thierry Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice) Université Sorbonne Nouvelle - Paris 3-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Sciences et Lettres (PSL)-Département Littératures et langage - ENS Paris (LILA) École normale supérieure - Paris (ENS-PSL) Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-École normale supérieure - Paris (ENS-PSL) Université Paris Sciences et Lettres (PSL) ELRA ANR ERA-NET Atlantis Miyazaki, Japan 2018-05-07 https://hal.science/hal-01856178 https://hal.science/hal-01856178/document https://hal.science/hal-01856178/file/600.pdf en eng HAL CCSD hal-01856178 https://hal.science/hal-01856178 https://hal.science/hal-01856178/document https://hal.science/hal-01856178/file/600.pdf info:eu-repo/semantics/OpenAccess LREC 2018 Proceedings Language Resources and Evaluation Conference https://hal.science/hal-01856178 Language Resources and Evaluation Conference, ELRA, May 2018, Miyazaki, Japan http://www.lrec-conf.org/proceedings/lrec2018/pdf/600.pdf dependency parsing word embeddings Uralic languages [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] [SCCO.COMP]Cognitive science/Computer science [SCCO.LING]Cognitive science/Linguistics [SHS.LANGUE]Humanities and Social Sciences/Linguistics [SHS.INFO]Humanities and Social Sciences/Library and information sciences info:eu-repo/semantics/conferenceObject Conference papers 2018 ftunivparis3 2024-04-24T23:50:10Z International audience The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when compared to previous work, allowing for wider reuse of pre-existing resources when parsing low-resource languages. The study also explores the question of whether contemporary contact languages or genetically related languages would be the most fruitful starting point for multilingual parsing scenarios. Conference Object saami Université Sorbonne Nouvelle - Paris 3: HAL
institution Open Polar
collection Université Sorbonne Nouvelle - Paris 3: HAL
op_collection_id ftunivparis3
language English
topic dependency parsing
word embeddings
Uralic languages
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
[SCCO.COMP]Cognitive science/Computer science
[SCCO.LING]Cognitive science/Linguistics
[SHS.LANGUE]Humanities and Social Sciences/Linguistics
[SHS.INFO]Humanities and Social Sciences/Library and information sciences
description The paper presents a method for parsing low-resource languages with very small training corpora using multilingual word embeddings and annotated corpora of larger languages. The study demonstrates that specific language combinations enable improved dependency parsing when compared to previous work, allowing for wider reuse of pre-existing resources when parsing low-resource languages. The study also explores the question of whether contemporary contact languages or genetically related languages would be the most fruitful starting point for multilingual parsing scenarios.
author2 Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice)
Université Sorbonne Nouvelle - Paris 3-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Sciences et Lettres (PSL)-Département Littératures et langage - ENS Paris (LILA)
École normale supérieure - Paris (ENS-PSL)
Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-École normale supérieure - Paris (ENS-PSL)
Université Paris Sciences et Lettres (PSL)
ELRA
ANR ERA-NET Atlantis
format Conference Object
author Lim, Kyungtae
Partanen, Niko
Poibeau, Thierry
title Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian
publisher HAL CCSD
publishDate 2018
url https://hal.science/hal-01856178
https://hal.science/hal-01856178/document
https://hal.science/hal-01856178/file/600.pdf
op_coverage Miyazaki, Japan
genre saami
genre_facet saami
op_source LREC 2018 Proceedings
Language Resources and Evaluation Conference
https://hal.science/hal-01856178
Language Resources and Evaluation Conference, ELRA, May 2018, Miyazaki, Japan
http://www.lrec-conf.org/proceedings/lrec2018/pdf/600.pdf
op_relation hal-01856178
https://hal.science/hal-01856178
https://hal.science/hal-01856178/document
https://hal.science/hal-01856178/file/600.pdf
op_rights info:eu-repo/semantics/OpenAccess
_version_ 1799488357352341504