Building and Modelling Multilingual Subjective Corpora

Bibliographic Details
Main Authors: Saad, Motaz, Langlois, David, Smaïli, Kamel
Other Authors: Statistical Machine Translation and Speech Modelization and Text (SMarT); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
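Subjects: subjectivity analysis; cross-lingual annotation; language modelling; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
Online Access: https://inria.hal.science/hal-00995755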
https://inria.hal.science/hal-00995755/document
https://inria.hal.science/hal-00995755/file/LREC2014Smaili.pdf
id ftunilorrainehal:oai:HAL:hal-00995755v1
Conference date: 2014-05-26; proceedings published by the European Language Resources Association (ELRA)
institution Open Polar
collection Université de Lorraine: HAL
op_collection_id ftunilorrainehal
topic subjectivity analysis
cross-lingual annotation
language modelling
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
description International audience. Building multilingual opinionated models requires multilingual corpora annotated with opinion labels. Unfortunately, such corpora are rare. In this work, opinions are classified as either subjective or objective. We introduce an annotation method that can be reliably transferred across topic domains and across languages. The method starts by building a classifier that labels sentences as subjective or objective, trained on English data from the "movie reviews" domain. The annotation can then be transferred to another language by classifying the English sentences of a parallel corpus and assigning the same labels to the corresponding sentences in the other language. We also shed light on the link between opinion mining and statistical language modelling, and on how such corpora are useful for domain-specific language modelling. We show that the distinction between subjective and objective sentences tends to be stable across domains and languages. Our experiments show that language models trained on an objective (respectively subjective) corpus achieve lower perplexity on objective (respectively subjective) test data.
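The abstract describes a pipeline: label the English side of a parallel corpus with a subjectivity classifier, copy each label to the aligned sentence in the other language, then compare language models trained on each class by perplexity. The sketch below illustrates that idea at toy scale under stated assumptions. The record does not say which classifier or language-model order the authors used, so a hand-rolled multinomial naive Bayes and add-one unigram models are swapped in. The training sentences, sentence pairs, and function names (`train_nb`, `classify`, `project_labels`, `unigram_lm`, `perplexity`) are all hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy English sentences standing in for the "movie reviews"
# subjectivity training data; examples and labels are illustrative only.
TRAIN = [
    ("a wonderful and moving film", "subj"),
    ("the acting is awful and boring", "subj"),
    ("i loved every brilliant minute", "subj"),
    ("the film was released in 2003", "obj"),
    ("the story follows a doctor in paris", "obj"),
    ("the director shot it in two months", "obj"),
]


def train_nb(data):
    """Collect multinomial naive Bayes counts (an assumed classifier)."""
    word_counts = {label: Counter() for _, label in data}
    doc_counts = Counter(label for _, label in data)
    for text, label in data:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest add-one-smoothed log posterior."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(doc_counts[label] / n_docs)
        for w in text.split():
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def project_labels(model, parallel):
    """Classify each English sentence and copy its label to the aligned sentence."""
    return [(tgt, classify(model, src)) for src, tgt in parallel]


def unigram_lm(sentences):
    """Unigram counts; a simplified stand-in for higher-order n-gram models."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())


def perplexity(lm, sentences, vocab_size):
    """Per-word perplexity under add-one smoothing over a shared vocabulary."""
    counts, total = lm
    log_prob, n_words = 0.0, 0
    for s in sentences:
        for w in s.split():
            log_prob += math.log((counts[w] + 1) / (total + vocab_size))
            n_words += 1
    return math.exp(-log_prob / n_words)


model = train_nb(TRAIN)

# Hypothetical English-French sentence pairs from a parallel corpus.
parallel = [
    ("a brilliant and moving story", "une histoire brillante et émouvante"),
    ("the film was shot in paris", "le film a été tourné à paris"),
]
projected = project_labels(model, parallel)
print([label for _, label in projected])  # ['subj', 'obj']

# Train one LM per class and score held-out sentences with a shared vocabulary.
subj_lm = unigram_lm([t for t, lab in TRAIN if lab == "subj"])
obj_lm = unigram_lm([t for t, lab in TRAIN if lab == "obj"])
test_subj = ["a wonderful and brilliant film"]
test_obj = ["the film was shot in paris"]
vocab = model[2] | {w for s in test_subj + test_obj for w in s.split()}
for name, test in (("subjective", test_subj), ("objective", test_obj)):
    print(name, perplexity(subj_lm, test, len(vocab)), perplexity(obj_lm, test, len(vocab)))
```

Each target sentence simply inherits the label of its English counterpart, so in this sketch projection quality rests on sentence-level alignment of the parallel corpus and on how well the classifier carries over from the movie-review domain. Sharing one vocabulary between both language models keeps their perplexities comparable.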
op_coverage Reykjavik, Iceland
genre Iceland
op_source Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), May 2014, Reykjavik, Iceland
op_relation hal-00995755
op_rights info:eu-repo/semantics/OpenAccess