Semi-automatic Endogenous Enrichment of Collaboratively Constructed Lexical Resources: Piggybacking onto Wiktionary

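The lack of large-scale, freely available and durable lexical resources, and the consequences for NLP, is widely acknowledged but the attempts to cope with usual bottlenecks preventing their development often result in dead-ends. This article introduces a language-independent, semi-automatic and endogenous method for enriching lexical resources, based on collaborative editing and random walks through existing lexical relationships, and shows how this approach enables us to overcome recurrent impediments. It compares the impact of using different data sources and similarity measures on the task of improving synonymy networks. Finally, it defines an architecture for applying the presented method to Wiktionary and explains how it has been implemented.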

Bibliographic Details
Main Authors: Sajous, Franck, Navarro, Emmanuel, Gaume, Bruno, Prévot, Laurent, Chudy, Yannick
Other Authors: Cognition, Langues, Langage, Ergonomie (CLLE-ERSS), École Pratique des Hautes Études (EPHE), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Bordeaux Montaigne (UBM)-Centre National de la Recherche Scientifique (CNRS), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT), Laboratoire Parole et Langage (LPL), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), Hrafn Loftsson, Eiríkur Rögnvaldsson and Sigrún Helgadóttir
Format: Conference Object
Language: English
Published: HAL CCSD 2010
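Subjects: Random Walks, Collaboratively Constructed Lexical Resources, Endogenous Enrichment, Crowdsourcing, Wiktionary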
Online Access: https://hal.science/hal-00625326
https://hal.science/hal-00625326/document
https://hal.science/hal-00625326/file/sajousEtAl2010-IceTAL.pdf
id ftecolephe:oai:HAL:hal-00625326v1
record_format openpolar
spelling ftecolephe:oai:HAL:hal-00625326v1 2024-09-09T19:46:59+00:00 Reykjavik, Iceland 2010-08-16 en eng HAL CCSD Springer Berlin/Heidelberg info:eu-repo/semantics/conferenceObject Conference papers 2010 ftecolephe 2024-07-08T23:40:42Z
institution Open Polar
collection EPHE (Ecole pratique des hautes études, Paris): HAL
op_collection_id ftecolephe
language English
topic Random Walks
Collaboratively Constructed Lexical Resources
Endogenous Enrichment
Crowdsourcing
Wiktionary
[SHS.LANGUE]Humanities and Social Sciences/Linguistics
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
description The lack of large-scale, freely available and durable lexical resources, and the consequences for NLP, is widely acknowledged but the attempts to cope with usual bottlenecks preventing their development often result in dead-ends. This article introduces a language-independent, semi-automatic and endogenous method for enriching lexical resources, based on collaborative editing and random walks through existing lexical relationships, and shows how this approach enables us to overcome recurrent impediments. It compares the impact of using different data sources and similarity measures on the task of improving synonymy networks. Finally, it defines an architecture for applying the presented method to Wiktionary and explains how it has been implemented.
author2 Cognition, Langues, Langage, Ergonomie (CLLE-ERSS)
École Pratique des Hautes Études (EPHE)
Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Université Toulouse - Jean Jaurès (UT2J)
Université de Toulouse (UT)-Université de Toulouse (UT)-Université Bordeaux Montaigne (UBM)-Centre National de la Recherche Scientifique (CNRS)
Institut de recherche en informatique de Toulouse (IRIT)
Université Toulouse Capitole (UT Capitole)
Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J)
Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3)
Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP)
Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI)
Université Toulouse - Jean Jaurès (UT2J)
Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3)
Université de Toulouse (UT)
Laboratoire Parole et Langage (LPL)
Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS)
Hrafn Loftsson, Eiríkur Rögnvaldsson and Sigrún Helgadóttir
format Conference Object
author Sajous, Franck
Navarro, Emmanuel
Gaume, Bruno
Prévot, Laurent
Chudy, Yannick
author_sort Sajous, Franck
title Semi-automatic Endogenous Enrichment of Collaboratively Constructed Lexical Resources: Piggybacking onto Wiktionary
publisher HAL CCSD
publishDate 2010
url https://hal.science/hal-00625326
https://hal.science/hal-00625326/document
https://hal.science/hal-00625326/file/sajousEtAl2010-IceTAL.pdf
op_coverage Reykjavik, Iceland
genre Iceland
op_source Advances in Natural Language Processing
7th International Conference on NLP, IceTAL 2010
https://hal.science/hal-00625326
7th International Conference on NLP, IceTAL 2010, Aug 2010, Reykjavik, Iceland. pp.332-344
op_relation hal-00625326
https://hal.science/hal-00625326
https://hal.science/hal-00625326/document
https://hal.science/hal-00625326/file/sajousEtAl2010-IceTAL.pdf
op_rights info:eu-repo/semantics/OpenAccess
_version_ 1809916468249231360