From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers
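In this paper we present a statistical machine learning approach to neologism detection that goes some way beyond the use of exclusion lists. We explore the impact of three groups of features: form-related, morpho-lexical and thematic features. The last of these feature types has not yet been used in this kind of application and represents a way to access the semantic context of new words. The results suggest that form-related features are helpful for the overall classification task, while morpho-lexical and thematic features better single out true neologisms.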
Main Authors: | Falk, Ingrid; Bernhard, Delphine; Gérard, Christophe |
---|---|
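Other Authors: | Linguistique, Langues et Parole (LILPA); Université de Strasbourg (UNISTRA); Logoscope, Contrat IDEX 2012 avec l'Université de Strasbourg; Logoscope |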
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD 2014 |
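Subjects: | [SCCO.LING]Cognitive science/Linguistics; [SCCO.COMP]Cognitive science/Computer science; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing |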
Online Access: | https://hal.inria.fr/hal-00959079 https://hal.inria.fr/hal-00959079/document https://hal.inria.fr/hal-00959079/file/logo.pdf |
id |
ftccsdartic:oai:HAL:hal-00959079v1 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe) |
op_collection_id |
ftccsdartic |
language |
English |
topic |
[SCCO.LING]Cognitive science/Linguistics; [SCCO.COMP]Cognitive science/Computer science; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing |
description |
In this paper we present a statistical machine learning approach to neologism detection that goes some way beyond the use of exclusion lists. We explore the impact of three groups of features: form-related, morpho-lexical and thematic features. The last of these feature types has not yet been used in this kind of application and represents a way to access the semantic context of new words. The results suggest that form-related features are helpful for the overall classification task, while morpho-lexical and thematic features better single out true neologisms. |
author2 |
Linguistique, Langues et Parole (LILPA); Université de Strasbourg (UNISTRA); Logoscope, Contrat IDEX 2012 avec l'Université de Strasbourg; Logoscope |
format |
Conference Object |
author |
Falk, Ingrid; Bernhard, Delphine; Gérard, Christophe |
title |
From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers |
publisher |
HAL CCSD |
publishDate |
2014 |
url |
https://hal.inria.fr/hal-00959079 https://hal.inria.fr/hal-00959079/document https://hal.inria.fr/hal-00959079/file/logo.pdf |
op_coverage |
Reykjavik, Iceland |
genre |
Iceland |
op_source |
LREC - The 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland https://hal.inria.fr/hal-00959079 |
op_relation |
hal-00959079 https://hal.inria.fr/hal-00959079 https://hal.inria.fr/hal-00959079/document https://hal.inria.fr/hal-00959079/file/logo.pdf |
op_rights |
info:eu-repo/semantics/OpenAccess |