Building and Modelling Multilingual Subjective Corpora
Main Authors: | Saad, Motaz; Langlois, David; Smaïli, Kamel |
---|---|
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD, 2014 |
Subjects: | subjectivity analysis; cross-lingual annotation; language modelling |
Online Access: | https://hal.inria.fr/hal-00995755 https://hal.inria.fr/hal-00995755/document https://hal.inria.fr/hal-00995755/file/LREC2014Smaili.pdf |
id |
ftccsdartic:oai:HAL:hal-00995755v1 |
---|---|
institution |
Open Polar |
collection |
Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe) |
language |
English |
topic |
subjectivity analysis; cross-lingual annotation; language modelling; [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing |
description |
Building multilingual opinionated models requires multilingual corpora annotated with opinion labels. Unfortunately, such corpora are rare. In this work, we treat opinions as either subjective or objective. We introduce an annotation method that can be reliably transferred across topic domains and across languages. The method starts by building a classifier that labels sentences as subjective or objective, trained on English data from the movie-review domain. The annotation is transferred to another language by classifying the English sentences of a parallel corpus and assigning the same labels to the aligned sentences in the other language. We also shed light on the link between opinion mining and statistical language modelling, and on how such corpora are useful for domain-specific language modelling. We show that the distinction between subjective and objective sentences tends to be stable across domains and languages. Our experiments show that language models trained on an objective (respectively subjective) corpus yield lower, i.e. better, perplexities on objective (respectively subjective) test data. |
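The annotation transfer and perplexity comparison summarised in the description can be illustrated with a small sketch. It is not the authors' implementation and rests on loudly stated assumptions: the keyword rule `classify_en` is a hypothetical stand-in for the trained movie-review classifier, the English-French sentence pairs are invented toy data, and the language models are add-one-smoothed unigram models sharing one vocabulary (the model order and smoothing used in the paper are not given in this record). All names (`SUBJECTIVE_CUES`, `classify_en`, `project_labels`, `UnigramLM`) are hypothetical.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for the English subjectivity classifier. The paper trains
# a classifier on movie-review data; this keyword rule only illustrates the interface.
SUBJECTIVE_CUES = {"love", "boring", "great", "awful", "think"}


def tokenize(sentence):
    """Lowercase and split on whitespace (deliberately naive)."""
    return sentence.lower().split()


def classify_en(sentence):
    """Return 'subj' if the English sentence contains a toy opinion cue, else 'obj'."""
    return "subj" if any(t in SUBJECTIVE_CUES for t in tokenize(sentence)) else "obj"


def project_labels(parallel_pairs):
    """Classify the English side of each aligned (en, other) pair and copy
    the label onto the other-language sentence of the same pair."""
    return [(other, classify_en(en)) for en, other in parallel_pairs]


class UnigramLM:
    """Unigram language model with add-one (Laplace) smoothing.

    All models share one vocabulary so their perplexities are comparable;
    every out-of-vocabulary token is scored with the unknown-word class probability.
    """

    def __init__(self, sentences, vocab):
        self.counts = Counter(t for s in sentences for t in tokenize(s))
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # +1 for the unknown-word class

    def prob(self, token):
        return (self.counts.get(token, 0) + 1) / (self.total + self.vocab_size)

    def perplexity(self, sentences):
        tokens = [t for s in sentences for t in tokenize(s)]
        log_prob = sum(math.log(self.prob(t)) for t in tokens)
        return math.exp(-log_prob / len(tokens))


# Invented English-French toy pairs (not data from the paper).
pairs = [
    ("I love this film", "J'adore ce film"),
    ("The film lasts two hours", "Le film dure deux heures"),
    ("What a boring plot", "Quelle intrigue ennuyeuse"),
    ("The director was born in Paris", "Le réalisateur est né à Paris"),
]

# Project English labels onto the French side, then split into two corpora.
labelled = project_labels(pairs)
subj_corpus = [s for s, label in labelled if label == "subj"]
obj_corpus = [s for s, label in labelled if label == "obj"]
vocab = {t for s, _ in labelled for t in tokenize(s)}

subj_lm = UnigramLM(subj_corpus, vocab)
obj_lm = UnigramLM(obj_corpus, vocab)

for test in ["Le film dure trois heures", "J'adore ce film ennuyeux"]:
    print(test, round(subj_lm.perplexity([test]), 2), round(obj_lm.perplexity([test]), 2))
```

On this toy data the objective model gives the lower perplexity on the objective test sentence and the subjective model on the subjective one, mirroring (not reproducing) the trend reported in the description. A shared vocabulary is used because comparing perplexities of models with different vocabulary sizes would otherwise favour the smaller model.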
author2 |
Statistical Machine Translation and Speech Modelization and Text (SMarT); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Centre National de la Recherche Scientifique (CNRS); Université de Lorraine (UL); Institut National de Recherche en Informatique et en Automatique (Inria) |
format |
Conference Object |
author |
Saad, Motaz; Langlois, David; Smaïli, Kamel |
title |
Building and Modelling Multilingual Subjective Corpora |
publisher |
HAL CCSD |
publishDate |
2014 |
op_coverage |
Reykjavik, Iceland |
genre |
Iceland |
op_source |
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), May 2014, Reykjavik, Iceland |
op_rights |
info:eu-repo/semantics/OpenAccess |