From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers

Bibliographic Details
Main Authors: Falk, Ingrid; Bernhard, Delphine; Gérard, Christophe
Other Authors: Linguistique, Langues et Parole (LILPA); Université de Strasbourg (UNISTRA); Logoscope; IDEX 2012 contract with the Université de Strasbourg
Format: Other/Unknown Material
Language: English
Published: HAL CCSD 2014
Subjects: psy
Online Access: https://hal.inria.fr/hal-00959079/file/logo.pdf
https://hal.inria.fr/hal-00959079
Description
Audience: International
Summary: In this paper we present a statistical machine learning approach to neologism detection that goes some way beyond the use of exclusion lists. We explore the impact of three groups of features: form-related, morpho-lexical and thematic features. The last group, thematic features, has not yet been used in this kind of application and offers a way to access the semantic context of new words. The results suggest that form-related features help with the overall classification task, while morpho-lexical and thematic features are better at singling out true neologisms.
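One way to picture the setting described in the summary: an exclusion list (a lexicon of known word forms) proposes every unknown form as a candidate, and a classifier must then separate true neologisms from other unknown "non words" such as typos. The minimal Python sketch below shows only that first step plus a few form-related features. The lexicon `KNOWN_FORMS`, the functions `candidate_tokens` and `form_features`, and the specific features computed are hypothetical illustrations, not the authors' resources, implementation or feature set.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy lexicon standing in for an exclusion list of known French
# word forms; the paper's actual resources are not reproduced here.
KNOWN_FORMS: Set[str] = {
    "le", "journal", "publie", "un", "article", "sur", "la", "de", "politique",
}

# Unicode letters, optionally joined by hyphens (e.g. "anti-vax").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

VOWELS = set("aeiouyàâäéèêëîïôöûùüÿ")


def candidate_tokens(text: str, known: Set[str]) -> List[str]:
    """Exclusion-list step: keep lowercased tokens absent from the lexicon."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    return [t for t in tokens if t not in known]


def form_features(token: str) -> Dict[str, float]:
    """A few illustrative form-related features (hypothetical choice)."""
    return {
        "length": float(len(token)),
        "hyphens": float(token.count("-")),
        "vowel_ratio": sum(c in VOWELS for c in token) / max(len(token), 1),
    }


if __name__ == "__main__":
    text = "Le journal publie un artcle sur la macronisation de la politique."
    for cand in candidate_tokens(text, KNOWN_FORMS):
        print(cand, form_features(cand))
```

On the example sentence the exclusion list flags both the typo "artcle" and the coinage "macronisation", which is why a list alone is not enough: a classifier trained on features such as these (and, per the summary, on morpho-lexical and thematic features) is needed to tell the two kinds of unknown word apart.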