Attaching Translations to Proper Lexical Senses in DBnary

International audience. The DBnary project aims at providing high-quality Lexical Linked Data extracted from different Wiktionary language editions. Data is currently extracted from 10 languages, for a total of over 3.16M translation links that connect lexical entries from the 10 extracted l...

Bibliographic Details
Main Authors: Tchechmedjiev, Andon; Sérasset, Gilles; Goulian, Jérôme; Schwab, Didier
Other Authors: Université Grenoble Alpes 2016-2019 (UGA 2016-2019); Laboratoire d'Informatique de Grenoble (LIG); Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS); Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Subjects: Computer Science [cs]/Computation and Language [cs.CL]
Online Access: https://hal.science/hal-00990870
https://hal.science/hal-00990870/document
https://hal.science/hal-00990870/file/dbnary-wsd.pdf
id ftunivnantes:oai:HAL:hal-00990870v1
record_format openpolar
spelling ftunivnantes:oai:HAL:hal-00990870v1 2023-05-15T16:51:18+02:00 Reykjavik, Iceland 2014-05-27 en eng info:eu-repo/semantics/conferenceObject Conference papers 2014 ftunivnantes 2023-02-28T23:41:31Z
institution Open Polar
collection Université de Nantes: HAL-UNIV-NANTES
op_collection_id ftunivnantes
language English
topic [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
description International audience. The DBnary project aims at providing high-quality Lexical Linked Data extracted from different Wiktionary language editions. Data is currently extracted from 10 languages, for a total of over 3.16M translation links that connect lexical entries in those 10 languages to entries in more than one thousand languages. In Wiktionary, glosses are often associated with translations to help users understand which sense they refer to, whether through a textual definition or a target sense number. In this article, we aim to extract as much of this information as possible and then to disambiguate the corresponding translations for all available languages. To disambiguate the translation relations, we adapt various textual and semantic similarity techniques based on partial or fuzzy gloss overlaps (to account for the lack of normalization, e.g. lemmatization and PoS tagging). We then extract some of the sense number information present to build a gold standard, which we use to evaluate our disambiguation and to tune and optimize the parameters of the similarity measures. We obtain F-measures on the order of 80% (on par with similar work on English only) across the three languages for which we could generate a gold standard (French, Portuguese, Finnish), and show that most of the disambiguation errors are due to inconsistencies in Wiktionary itself that cannot be detected when DBnary is generated (shifted sense numbers, inconsistent glosses, etc.).
author2 Université Grenoble Alpes 2016-2019 (UGA 2016-2019)
Laboratoire d'Informatique de Grenoble (LIG)
Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)
Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP)
format Conference Object
author Tchechmedjiev, Andon
Sérasset, Gilles
Goulian, Jérôme
Schwab, Didier
title Attaching Translations to Proper Lexical Senses in DBnary
publisher HAL CCSD
publishDate 2014
url https://hal.science/hal-00990870
https://hal.science/hal-00990870/document
https://hal.science/hal-00990870/file/dbnary-wsd.pdf
op_coverage Reykjavik, Iceland
genre Iceland
op_source 3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing
https://hal.science/hal-00990870
3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing, May 2014, Reykjavik, Iceland
op_relation hal-00990870
https://hal.science/hal-00990870
https://hal.science/hal-00990870/document
https://hal.science/hal-00990870/file/dbnary-wsd.pdf
op_rights info:eu-repo/semantics/OpenAccess
_version_ 1766041420575014912