A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages
In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data. We first induce a translation lexicon from...
Main Authors: | Scherrer, Yves, Sagot, Benoît |
---|---|
Other Authors: | , , , , , , , , |
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD 2014 |
Subjects: | |
Online Access: | https://hal.inria.fr/hal-01022298 https://hal.inria.fr/hal-01022298/document https://hal.inria.fr/hal-01022298/file/lrec14cll.pdf |
id |
ftunivnantes:oai:HAL:hal-01022298v1 |
---|---|
record_format |
openpolar |
spelling |
ftunivnantes:oai:HAL:hal-01022298v1 2023-05-15T16:49:26+02:00 A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages Scherrer, Yves Sagot, Benoît LATL-CUI Laboratoire d'Analyse et de Technologie du Langage (LATL) Université de Genève = University of Geneva (UNIGE)-Université de Genève = University of Geneva (UNIGE) Analyse Linguistique Profonde à Grande Echelle Large-scale deep linguistic processing (ALPAGE) Inria Paris-Rocquencourt Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Paris Diderot - Paris 7 (UPD7) European Language Resources Association ANR-11-IDEX-0005,USPC,Université Sorbonne Paris Cité(2011) Reykjavik, Iceland 2014-05-26 https://hal.inria.fr/hal-01022298 https://hal.inria.fr/hal-01022298/document https://hal.inria.fr/hal-01022298/file/lrec14cll.pdf en eng HAL CCSD hal-01022298 https://hal.inria.fr/hal-01022298 https://hal.inria.fr/hal-01022298/document https://hal.inria.fr/hal-01022298/file/lrec14cll.pdf info:eu-repo/semantics/OpenAccess Language Resources and Evaluation Conference https://hal.inria.fr/hal-01022298 Language Resources and Evaluation Conference, European Language Resources Association, May 2014, Reykjavik, Iceland ACM: J.: Computer Applications/J.5: ARTS AND HUMANITIES/J.5.4: Linguistics ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] info:eu-repo/semantics/conferenceObject Conference papers 2014 ftunivnantes 2022-11-02T02:23:18Z International audience In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data. 
We first induce a translation lexicon from monolingual corpora, based on cognate detection followed by cross-lingual contextual similarity. Second, POS information is transferred from the resourced language along translation pairs to the non-resourced language and used for tagging the corpus. We evaluate our methods on three language families, consisting of five Romance languages, three Germanic languages and five Slavic languages. We obtain tagging accuracies of up to 91.6%. Conference Object Iceland Université de Nantes: HAL-UNIV-NANTES |
institution |
Open Polar |
collection |
Université de Nantes: HAL-UNIV-NANTES |
op_collection_id |
ftunivnantes |
language |
English |
topic |
ACM: J.: Computer Applications/J.5: ARTS AND HUMANITIES/J.5.4: Linguistics ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] |
description |
In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data. We first induce a translation lexicon from monolingual corpora, based on cognate detection followed by cross-lingual contextual similarity. Second, POS information is transferred from the resourced language along translation pairs to the non-resourced language and used for tagging the corpus. We evaluate our methods on three language families, consisting of five Romance languages, three Germanic languages and five Slavic languages. We obtain tagging accuracies of up to 91.6%. |
author2 |
LATL-CUI Laboratoire d'Analyse et de Technologie du Langage (LATL) Université de Genève = University of Geneva (UNIGE)-Université de Genève = University of Geneva (UNIGE) Analyse Linguistique Profonde à Grande Echelle Large-scale deep linguistic processing (ALPAGE) Inria Paris-Rocquencourt Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Paris Diderot - Paris 7 (UPD7) European Language Resources Association ANR-11-IDEX-0005,USPC,Université Sorbonne Paris Cité(2011) |
format |
Conference Object |
author |
Scherrer, Yves Sagot, Benoît |
author_sort |
Scherrer, Yves |
title |
A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages |
title_sort |
language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages |
publisher |
HAL CCSD |
publishDate |
2014 |
url |
https://hal.inria.fr/hal-01022298 https://hal.inria.fr/hal-01022298/document https://hal.inria.fr/hal-01022298/file/lrec14cll.pdf |
op_coverage |
Reykjavik, Iceland |
genre |
Iceland |
genre_facet |
Iceland |
op_source |
Language Resources and Evaluation Conference https://hal.inria.fr/hal-01022298 Language Resources and Evaluation Conference, European Language Resources Association, May 2014, Reykjavik, Iceland |
op_relation |
hal-01022298 https://hal.inria.fr/hal-01022298 https://hal.inria.fr/hal-01022298/document https://hal.inria.fr/hal-01022298/file/lrec14cll.pdf |
op_rights |
info:eu-repo/semantics/OpenAccess |
_version_ |
1766039573200109568 |
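
The description field outlines a two-step pipeline: induce a translation lexicon between the related languages, then transfer POS tags along the translation pairs and tag the target corpus. The Python sketch below illustrates only that general idea and is not the authors' implementation. It makes several assumptions that the record does not confirm: plain string similarity (`difflib.SequenceMatcher`) stands in for cognate detection, the cross-lingual contextual-similarity step is omitted entirely, and the function names, the `threshold` value, the tag labels, and the toy word lists are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Surface similarity in [0, 1]; an assumed stand-in for cognate detection."""
    return SequenceMatcher(None, a, b).ratio()


def induce_lexicon(target_words, source_words, threshold=0.7):
    """Map each target-language word to its most similar source-language word.

    Pairs scoring below `threshold` (a hypothetical cut-off) are discarded.
    """
    lexicon = {}
    if not source_words:
        return lexicon
    for t in target_words:
        best = max(source_words, key=lambda s: similarity(t, s))
        if similarity(t, best) >= threshold:
            lexicon[t] = best
    return lexicon


def transfer_tags(lexicon, source_tags):
    """Copy the POS tag of each source word to its paired target word."""
    return {t: source_tags[s] for t, s in lexicon.items() if s in source_tags}


def tag_corpus(tokens, tag_lexicon, unknown="UNK"):
    """Tag target-language tokens by lookup in the transferred tag lexicon."""
    return [(tok, tag_lexicon.get(tok, unknown)) for tok in tokens]


# Toy data (hypothetical): a tagged source vocabulary and untagged target tokens.
source_tags = {"casa": "NOUN", "cantar": "VERB", "grande": "ADJ"}
target_tokens = ["casa", "gran", "xyz"]

lexicon = induce_lexicon(target_tokens, list(source_tags))
tag_lexicon = transfer_tags(lexicon, source_tags)
print(tag_corpus(target_tokens, tag_lexicon))
# prints [('casa', 'NOUN'), ('gran', 'ADJ'), ('xyz', 'UNK')]
```

A per-word lookup like this ignores ambiguity and context; it is meant only to make the data flow between the lexicon-induction step and the tagging step concrete.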