Towards Electronic SMS Dictionary Construction: An Alignment-based Approach

In this paper, we propose a method for aligning text messages (entitled AlignSMS) in order to automatically build an SMS dictionary. An extract of 100 text messages from the 88milSMS corpus (Panckhurst et al., 2013, 2014) was used as an initial test. More than 90,000 authentic...


Bibliographic Details
Main Authors: Lopez, Cédric, Bestandji, Reda, Roche, Mathieu, Panckhurst, Rachel
Other Authors: VISEO, Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), ADVanced Analytics for data SciencE (ADVANSE), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA), Praxiling (Praxiling), Université Paul-Valéry - Montpellier 3 (UPVM)-Centre National de la Recherche Scientifique (CNRS)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Subjects:
SMS
Online Access: https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899/document
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899/file/753_Paper.pdf
id ftagroparistech:oai:HAL:lirmm-01054899v1
record_format openpolar
institution Open Polar
collection AgroParisTech: HAL (Institut des sciences et industries du vivant et de l'environnement)
op_collection_id ftagroparistech
language English
topic electronic dictionaries
alignment
SMS
[SPI.OTHER]Engineering Sciences [physics]/Other
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
description In this paper, we propose a method for aligning text messages (entitled AlignSMS) in order to automatically build an SMS dictionary. An extract of 100 text messages from the 88milSMS corpus (Panckhurst et al., 2013, 2014) was used as an initial test. More than 90,000 authentic text messages in French were collected from the general public by a group of academics in the south of France in the context of the sud4science project (http://www.sud4science.org). This project is itself part of a vast international SMS data collection project, entitled sms4science (http://www.sms4science.org; Fairon et al., 2006; Cougnon, 2014). After corpus collation, pre-processing and anonymisation (Accorsi et al., 2012; Patel et al., 2013), we discuss how "raw" anonymised text messages can be transcoded into normalised text messages, using a statistical alignment method. The future objective is to set up a hybrid (symbolic/statistical) approach based on both grammar rules and our statistical AlignSMS method.
author2 VISEO
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
ADVanced Analytics for data SciencE (ADVANSE)
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA)
Praxiling (Praxiling)
Université Paul-Valéry - Montpellier 3 (UPVM)-Centre National de la Recherche Scientifique (CNRS)
format Conference Object
author Lopez, Cédric
Bestandji, Reda
Roche, Mathieu
Panckhurst, Rachel
title Towards Electronic SMS Dictionary Construction: An Alignment-based Approach
publisher HAL CCSD
publishDate 2014
url https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899/document
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899/file/753_Paper.pdf
op_coverage Reykjavik, Iceland
genre Iceland
op_source 9th International Conference on Language Resources and Evaluation
LREC: Language Resources and Evaluation Conference
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899
LREC: Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. pp.2833-2838
http://lrec2014.lrec-conf.org/en/
op_relation lirmm-01054899
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899/document
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899/file/753_Paper.pdf
op_rights info:eu-repo/semantics/OpenAccess