How to exploit paralinguistic features to identify acronyms in texts?

International audience. This paper addresses the problem of building acronym dictionaries. The first step of the process identifies acronym/definition candidates; the second selects among these candidates using a letter alignment method. The approach has two advantages: it makes it possible (1) to annotate docu...


Bibliographic Details
Main Author: Roche, Mathieu
Other Authors: ADVanced Analytics for data SciencE (ADVANSE), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA), ANR-12-JS02-0010,SIFR,Indexation sémantique de ressources biomédicales francophones(2012)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Subjects:
Online Access: https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797/document
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797/file/identification_Acronyms.pdf
id ftlirmm:oai:HAL:lirmm-00974797v1
record_format openpolar
institution Open Polar
collection LIRMM: HAL (Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier)
op_collection_id ftlirmm
language English
topic Acronym expansion
Text mining
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
[INFO.INFO-WB]Computer Science [cs]/Web
[SPI.OTHER]Engineering Sciences [physics]/Other
description International audience. This paper addresses the problem of building acronym dictionaries. The first step of the process identifies acronym/definition candidates; the second selects among these candidates using a letter alignment method. The approach has two advantages: it makes it possible (1) to annotate documents and (2) to build specific dictionaries. More precisely, the paper discusses the use of a specific linguistic concept, the gloss, to identify candidates. The proposed method, based on paralinguistic markers, is language-independent.
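The two-step process in the description above (candidate identification, then selection by letter alignment) can be illustrated with a minimal sketch. This record does not give the paper's actual markers or alignment rules, so everything here is an assumption made for illustration: parentheses stand in for the paralinguistic marker, and the names `PAREN_ACRONYM`, `find_candidates`, `align`, `build_dictionary` and the `max_skips` parameter are hypothetical.

```python
import re

# Assumption: a parenthesised run of capitals marks an acronym candidate.
# The paper's real paralinguistic markers are not listed in this record.
PAREN_ACRONYM = re.compile(r"\(([A-Z]{2,10})\)")


def find_candidates(text):
    """Step 1: yield (acronym, words preceding the parenthesis) pairs."""
    for match in PAREN_ACRONYM.finditer(text):
        acronym = match.group(1)
        # Runs of letters only (Unicode-aware): no digits, no underscores.
        words = re.findall(r"[^\W\d_]+", text[:match.start()])
        yield acronym, words


def align(acronym, words, max_skips=2):
    """Step 2: align acronym letters with word initials, right to left.

    Returns the aligned definition, or None if the alignment fails.
    Up to max_skips words whose initial does not match (for example
    "de" or "and") may be skipped in total.
    """
    i = len(words) - 1  # index of the word currently examined
    skips = 0
    for letter in reversed(acronym):
        while i >= 0 and words[i][0].upper() != letter:
            skips += 1
            i -= 1
            if skips > max_skips:
                return None
        if i < 0:
            return None  # ran out of words before matching every letter
        i -= 1  # this word is consumed by the current letter
    return " ".join(words[i + 1:])


def build_dictionary(text):
    """Keep only the candidates whose letters align with a definition."""
    dictionary = {}
    for acronym, words in find_candidates(text):
        definition = align(acronym, words)
        if definition is not None:
            dictionary[acronym] = definition
    return dictionary


if __name__ == "__main__":
    sample = ("Centre National de la Recherche Scientifique (CNRS) and the "
              "Language Resources and Evaluation Conference (LREC).")
    print(build_dictionary(sample))
```

Because the sketch relies only on punctuation and initial letters rather than on a lexicon, it works on the French and English names in the sample alike, which is in the spirit of the language independence claimed in the description. A fixed skip budget is a crude choice: longer names with many linking words would need a more tolerant alignment.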
author2 ADVanced Analytics for data SciencE (ADVANSE)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA)
ANR-12-JS02-0010,SIFR,Indexation sémantique de ressources biomédicales francophones(2012)
format Conference Object
author Roche, Mathieu
title How to exploit paralinguistic features to identify acronyms in texts?
publisher HAL CCSD
publishDate 2014
url https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797/document
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797/file/identification_Acronyms.pdf
op_coverage Reykjavik, Iceland
genre Iceland
genre_facet Iceland
op_source 9th International Conference on Language Resources and Evaluation
LREC: Language Resources and Evaluation Conference
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797
LREC: Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. pp.69-72
http://www.lrec-conf.org/proceedings/lrec2014/index.html
op_relation lirmm-00974797
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797/document
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00974797/file/identification_Acronyms.pdf
op_rights info:eu-repo/semantics/OpenAccess
_version_ 1781700461884080128