TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French

International audience In this paper, we present a parallel literary corpus for Serbian, English and French, the TALC-sef corpus. The corpus includes a manually-revised POS-tagged reference Serbian corpus of over 150,000 words. The initial objective was to devise a reference parallel corpus in the t...

Bibliographic Details
Main Authors: Balvet, Antonio, Stosic, Dejan, Miletic, Aleksandra
Other Authors: Savoirs, Textes, Langage (STL) - UMR 8163 (STL), Université de Lille-Centre National de la Recherche Scientifique (CNRS), Cognition, Langues, Langage, Ergonomie (CLLE-ERSS), École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université Toulouse - Jean Jaurès (UT2J)-Université Bordeaux Montaigne-Centre National de la Recherche Scientifique (CNRS)
Format: Other/Unknown Material
Language: English
Published: HAL CCSD 2014
Subjects:
Online Access: https://halshs.archives-ouvertes.fr/halshs-01077767
id fttriple:oai:gotriple.eu:10670/1.in1qf5
record_format openpolar
spelling fttriple:oai:gotriple.eu:10670/1.in1qf5 2023-05-15T16:50:11+02:00 TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French Balvet, Antonio Stosic, Dejan Miletic, Aleksandra Savoirs, Textes, Langage (STL) - UMR 8163 (STL) Université de Lille-Centre National de la Recherche Scientifique (CNRS) Cognition, Langues, Langage, Ergonomie (CLLE-ERSS) École pratique des hautes études (EPHE) Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université Toulouse - Jean Jaurès (UT2J)-Université Bordeaux Montaigne-Centre National de la Recherche Scientifique (CNRS) Reykjavik, Iceland 2014-05-26 https://halshs.archives-ouvertes.fr/halshs-01077767 en eng HAL CCSD halshs-01077767 10670/1.in1qf5 https://halshs.archives-ouvertes.fr/halshs-01077767 undefined Hyper Article en Ligne - Sciences de l'Homme et de la Société Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) LREC 2014 LREC 2014, May 2014, Reykjavik, Iceland Multilinguality Part-of-Speech Tagging Aligned Corpora lang anthro-se Conference Output https://vocabularies.coar-repositories.org/resource_types/c_c94f/ 2014 fttriple 2023-01-22T18:50:47Z International audience In this paper, we present a parallel literary corpus for Serbian, English and French, the TALC-sef corpus. The corpus includes a manually-revised POS-tagged reference Serbian corpus of over 150,000 words. The initial objective was to devise a reference parallel corpus in the three languages, both for literary and linguistic studies. The French and English sub-corpora had been POS-tagged from the outset, using TreeTagger (Schmid, 1994), but the corpus lacked, until now, a tagged version of the Serbian sub-corpus.
Here, we present the original parallel literary corpus, then we address issues related to POS-tagging a large collection of Serbian text: from the conception of an appropriate tagset for Serbian, to the choice of an automatic POS-tagger adapted to the task, and then to some quantitative and qualitative results. We then move on to a discussion of perspectives in the near future for further annotations of the whole parallel corpus. Other/Unknown Material Iceland Unknown
institution Open Polar
collection Unknown
op_collection_id fttriple
language English
topic Multilinguality
Part-of-Speech Tagging
Aligned Corpora
lang
anthro-se
spellingShingle Multilinguality
Part-of-Speech Tagging
Aligned Corpora
lang
anthro-se
Balvet, Antonio
Stosic, Dejan
Miletic, Aleksandra
TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French
topic_facet Multilinguality
Part-of-Speech Tagging
Aligned Corpora
lang
anthro-se
description International audience In this paper, we present a parallel literary corpus for Serbian, English and French, the TALC-sef corpus. The corpus includes a manually-revised POS-tagged reference Serbian corpus of over 150,000 words. The initial objective was to devise a reference parallel corpus in the three languages, both for literary and linguistic studies. The French and English sub-corpora had been POS-tagged from the outset, using TreeTagger (Schmid, 1994), but the corpus lacked, until now, a tagged version of the Serbian sub-corpus. Here, we present the original parallel literary corpus, then we address issues related to POS-tagging a large collection of Serbian text: from the conception of an appropriate tagset for Serbian, to the choice of an automatic POS-tagger adapted to the task, and then to some quantitative and qualitative results. We then move on to a discussion of perspectives in the near future for further annotations of the whole parallel corpus.
author2 Savoirs, Textes, Langage (STL) - UMR 8163 (STL)
Université de Lille-Centre National de la Recherche Scientifique (CNRS)
Cognition, Langues, Langage, Ergonomie (CLLE-ERSS)
École pratique des hautes études (EPHE)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université Toulouse - Jean Jaurès (UT2J)-Université Bordeaux Montaigne-Centre National de la Recherche Scientifique (CNRS)
format Other/Unknown Material
author Balvet, Antonio
Stosic, Dejan
Miletic, Aleksandra
author_facet Balvet, Antonio
Stosic, Dejan
Miletic, Aleksandra
author_sort Balvet, Antonio
title TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French
title_short TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French
title_full TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French
title_fullStr TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French
title_full_unstemmed TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French
title_sort talc-sef a manually-revised pos-tagged literary corpus in serbian, english and french
publisher HAL CCSD
publishDate 2014
url https://halshs.archives-ouvertes.fr/halshs-01077767
op_coverage Reykjavik, Iceland
genre Iceland
genre_facet Iceland
op_source Hyper Article en Ligne - Sciences de l'Homme et de la Société
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
LREC 2014
LREC 2014, May 2014, Reykjavik, Iceland
op_relation halshs-01077767
10670/1.in1qf5
https://halshs.archives-ouvertes.fr/halshs-01077767
op_rights undefined
_version_ 1766040352489209856