From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers

International audience. In this paper we present a statistical machine learning approach to neologism detection that goes some way beyond the use of exclusion lists. We explore the impact of three groups of features: form-related, morpho-lexical and thematic features. The latter type of features has not...

Bibliographic Details
Main Authors: Falk, Ingrid; Bernhard, Delphine; Gérard, Christophe
Other Authors: Linguistique, Langues et Parole (LILPA); Université de Strasbourg (UNISTRA); Logoscope, Contrat IDEX 2012 avec l'Université de Strasbourg; Logoscope
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Subjects:
Online Access: https://hal.inria.fr/hal-00959079
https://hal.inria.fr/hal-00959079/document
https://hal.inria.fr/hal-00959079/file/logo.pdf
id ftccsdartic:oai:HAL:hal-00959079v1
record_format openpolar
spelling ftccsdartic:oai:HAL:hal-00959079v1 2023-05-15T16:48:37+02:00 From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers Falk, Ingrid Bernhard, Delphine Gérard, Christophe Linguistique, Langues et Parole (LILPA) Université de Strasbourg (UNISTRA) Logoscope, Contrat IDEX 2012 avec l'Université de Strasbourg Logoscope Reykjavik, Iceland 2014-05-27 https://hal.inria.fr/hal-00959079 https://hal.inria.fr/hal-00959079/document https://hal.inria.fr/hal-00959079/file/logo.pdf en eng HAL CCSD hal-00959079 https://hal.inria.fr/hal-00959079 https://hal.inria.fr/hal-00959079/document https://hal.inria.fr/hal-00959079/file/logo.pdf info:eu-repo/semantics/OpenAccess LREC - The 9th edition of the Language Resources and Evaluation Conference https://hal.inria.fr/hal-00959079 LREC - The 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland [SCCO.LING]Cognitive science/Linguistics [SCCO.COMP]Cognitive science/Computer science [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing info:eu-repo/semantics/conferenceObject Conference papers 2014 ftccsdartic 2023-03-18T23:32:23Z International audience In this paper we present a statistical machine learning approach to neologism detection going some way beyond the use of exclusion lists. We explore the impact of three groups of features: form related, morpho-lexical and thematic features. The latter type of features has not yet been used in this kind of application and represents a way to access the semantic context of new words. The results suggest that form related features are helpful at the overall classification task, while morpho-lexical and thematic features better single out true neologisms. Conference Object Iceland Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
institution Open Polar
collection Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
op_collection_id ftccsdartic
language English
topic [SCCO.LING]Cognitive science/Linguistics
[SCCO.COMP]Cognitive science/Computer science
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
description International audience. In this paper we present a statistical machine learning approach to neologism detection that goes some way beyond the use of exclusion lists. We explore the impact of three groups of features: form-related, morpho-lexical and thematic features. The latter type of features has not yet been used in this kind of application and represents a way to access the semantic context of new words. The results suggest that form-related features are helpful for the overall classification task, while morpho-lexical and thematic features are better at singling out true neologisms.
author2 Linguistique, Langues et Parole (LILPA)
Université de Strasbourg (UNISTRA)
Logoscope, Contrat IDEX 2012 avec l'Université de Strasbourg
Logoscope
format Conference Object
author Falk, Ingrid
Bernhard, Delphine
Gérard, Christophe
title From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers
publisher HAL CCSD
publishDate 2014
url https://hal.inria.fr/hal-00959079
https://hal.inria.fr/hal-00959079/document
https://hal.inria.fr/hal-00959079/file/logo.pdf
op_coverage Reykjavik, Iceland
genre Iceland
genre_facet Iceland
op_source LREC - The 9th edition of the Language Resources and Evaluation Conference
https://hal.inria.fr/hal-00959079
LREC - The 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland
op_relation hal-00959079
https://hal.inria.fr/hal-00959079
https://hal.inria.fr/hal-00959079/document
https://hal.inria.fr/hal-00959079/file/logo.pdf
op_rights info:eu-repo/semantics/OpenAccess
_version_ 1766038704114106368