ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions

DESCRIPTION: ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions. Titles for each of these three countries were seeded from the Internet Movie Database; subtitle data for the hearing impaired was provided by Opensubtitles.org and was po...

Bibliographic Details
Main Author: Jerid Francom
Format: Other/Unknown Material
Language: unknown
Published: Zenodo 2018
Subjects: corpus, spanish, film
Online Access: https://doi.org/10.5281/zenodo.1492613
id ftzenodo:oai:zenodo.org:1492613
record_format openpolar
spelling ftzenodo:oai:zenodo.org:1492613 2024-09-15T18:14:11+00:00 ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions Jerid Francom 2018-11-20 https://doi.org/10.5281/zenodo.1492613 unknown Zenodo https://github.com/francojc/activ-es/tree/activ-es-v.02 https://zenodo.org/communities/hispanic-linguistics https://zenodo.org/communities/linguistics https://doi.org/10.5281/zenodo.1492612 https://doi.org/10.5281/zenodo.1492613 oai:zenodo.org:1492613 info:eu-repo/semantics/openAccess GNU General Public License v2.0 only https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html corpus spanish film info:eu-repo/semantics/other 2018 ftzenodo https://doi.org/10.5281/zenodo.1492613 10.5281/zenodo.1492612 2024-07-26T15:20:02Z DESCRIPTION: ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions. Titles for each of these three countries were seeded from the Internet Movie Database; subtitle data for the hearing impaired was provided by Opensubtitles.org, post-processed to correct or remove subtitle, OCR and diacritic artifacts, and annotated for part of speech. The data is available in two main formats: 1) running text for each document and 2) aggregate files of 1- to 5-grams. Each format includes a plain-text and a part-of-speech-annotated version. Document names reflect the language code, country, year, title, type, genre (first genre listed in the IMDb), and IMDb ID. For more information about the development and evaluation of these resources, and to cite this work, refer to: Francom, J., Hulden, M. and Ussishkin, A. (2014) ACTIV-ES: a comparable, cross-dialect corpus of 'everyday' Spanish from Argentina, Mexico, and Spain. In Proceedings of the Ninth Annual Language Resources and Evaluation Conference, Reykjavik, Iceland. European Language Resources Association (ELRA).
In version .02, a tagged running-text version of the corpus that uses the EAGLES tagset has been added in the /eagles directory. This tagset is considerably more detailed than the simplified tagset in the /tagged directory. For information on the tagset, see: http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html Other/Unknown Material Iceland Zenodo
institution Open Polar
collection Zenodo
op_collection_id ftzenodo
language unknown
topic corpus
spanish
film
spellingShingle corpus
spanish
film
Jerid Francom
ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
topic_facet corpus
spanish
film
description DESCRIPTION: ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions. Titles for each of these three countries were seeded from the Internet Movie Database; subtitle data for the hearing impaired was provided by Opensubtitles.org, post-processed to correct or remove subtitle, OCR and diacritic artifacts, and annotated for part of speech. The data is available in two main formats: 1) running text for each document and 2) aggregate files of 1- to 5-grams. Each format includes a plain-text and a part-of-speech-annotated version. Document names reflect the language code, country, year, title, type, genre (first genre listed in the IMDb), and IMDb ID. For more information about the development and evaluation of these resources, and to cite this work, refer to: Francom, J., Hulden, M. and Ussishkin, A. (2014) ACTIV-ES: a comparable, cross-dialect corpus of 'everyday' Spanish from Argentina, Mexico, and Spain. In Proceedings of the Ninth Annual Language Resources and Evaluation Conference, Reykjavik, Iceland. European Language Resources Association (ELRA). In version .02, a tagged running-text version of the corpus that uses the EAGLES tagset has been added in the /eagles directory. This tagset is considerably more detailed than the simplified tagset in the /tagged directory. For information on the tagset, see: http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html
format Other/Unknown Material
author Jerid Francom
author_facet Jerid Francom
author_sort Jerid Francom
title ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_short ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_full ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_fullStr ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_full_unstemmed ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_sort activ-es: a comparable spanish corpus comprised of film dialogue from argentine, mexican and spanish productions
publisher Zenodo
publishDate 2018
url https://doi.org/10.5281/zenodo.1492613
genre Iceland
genre_facet Iceland
op_relation https://github.com/francojc/activ-es/tree/activ-es-v.02
https://zenodo.org/communities/hispanic-linguistics
https://zenodo.org/communities/linguistics
https://doi.org/10.5281/zenodo.1492612
https://doi.org/10.5281/zenodo.1492613
oai:zenodo.org:1492613
op_rights info:eu-repo/semantics/openAccess
GNU General Public License v2.0 only
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
op_doi https://doi.org/10.5281/zenodo.1492613
10.5281/zenodo.1492612
_version_ 1810451964826222592
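
The description mentions aggregate files of 1- to 5-grams built from the running text. As a rough illustration only, the sketch below counts n-grams of length 1 through 5 over a token list. It is not the corpus author's pipeline: the function name `ngram_counts`, the whitespace tokenization, and the sample sentence are all assumptions made for this example, and the layout of the actual aggregate files is not specified in this record.

```python
from collections import Counter


def ngram_counts(tokens, max_n=5):
    """Count every n-gram of length 1..max_n in a list of tokens.

    Hypothetical helper for illustration; returns a dict that maps
    n to a Counter keyed by n-gram tuples.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


# Made-up line of dialogue, split on whitespace (an assumption; the
# corpus itself may tokenize differently).
tokens = "no sé qué vamos a hacer no sé".split()
counts = ngram_counts(tokens)
```

Here `counts[2]` holds the bigram frequencies, `counts[5]` the 5-gram frequencies, and so on; summing such counters across documents would give corpus-level aggregates.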