ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions

DESCRIPTION : ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions. Titles for each of these three countries were seeded from the Internet Movie Database, subtitle data for the hearing impaired was provided by Opensubtitles.org and was po...

Bibliographic Details
Main Author:	Jerid Francom
Format:	Dataset
Language:	unknown
Published:	Zenodo 2018
Subjects:	corpus spanish film Argentina Argentine Iceland
Online Access:	https://dx.doi.org/10.5281/zenodo.1492612 https://zenodo.org/record/1492612

id	ftdatacite:10.5281/zenodo.1492612
record_format	openpolar
spelling	ftdatacite:10.5281/zenodo.1492612 2023-05-15T16:51:03+02:00 ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions Jerid Francom 2018 https://dx.doi.org/10.5281/zenodo.1492612 https://zenodo.org/record/1492612 unknown Zenodo https://github.com/francojc/activ-es/tree/activ-es-v.02 https://zenodo.org/communities/hispanic-linguistics https://zenodo.org/communities/linguistics https://github.com/francojc/activ-es/tree/activ-es-v.02 https://dx.doi.org/10.5281/zenodo.1492613 https://zenodo.org/communities/hispanic-linguistics https://zenodo.org/communities/linguistics Open Access GNU General Public License 2.0 http://www.opensource.org/licenses/GPL-2.0 info:eu-repo/semantics/openAccess GPL corpus spanish film dataset Dataset 2018 ftdatacite https://doi.org/10.5281/zenodo.1492612 https://doi.org/10.5281/zenodo.1492613 2021-11-05T12:55:41Z DESCRIPTION: ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions. Titles for each of these three countries were seeded from the Internet Movie Database; subtitle data for the hearing impaired was provided by Opensubtitles.org, post-processed to correct or remove subtitle, OCR and diacritic artifacts, and annotated for part of speech. The data is available in two main formats: 1) running text for each document and 2) 1- to 5-gram aggregate files. Each format includes a plain-text and a part-of-speech-annotated version. Document names reflect the language code, country, year, title, type, genre (the first genre listed in the IMDb), and IMDb ID. For more information about the development and evaluation of these resources, and to cite this work, refer to: Francom, J., Hulden, M. and Ussishkin, A. (2014) ACTIV-ES: a comparable, cross-dialect corpus of 'everyday' Spanish from Argentina, Mexico, and Spain. In Proceedings of the Ninth Annual Language Resources and Evaluation Conference, Reykjavik, Iceland. European Language Resources Association (ELRA). In version .02, a tagged running-text version of the corpus has been added in the /eagles directory, which uses the EAGLES tagset. This tagset is considerably more detailed than the simplified tagset in the /tagged directory. For information on the tagset, see: http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html. Dataset Iceland DataCite Metadata Store (German National Library of Science and Technology) Argentina Argentine
institution	Open Polar
collection	DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id	ftdatacite
language	unknown
topic	corpus spanish film
spellingShingle	corpus spanish film Jerid Francom ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
topic_facet	corpus spanish film
description	DESCRIPTION: ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions. Titles for each of these three countries were seeded from the Internet Movie Database; subtitle data for the hearing impaired was provided by Opensubtitles.org, post-processed to correct or remove subtitle, OCR and diacritic artifacts, and annotated for part of speech. The data is available in two main formats: 1) running text for each document and 2) 1- to 5-gram aggregate files. Each format includes a plain-text and a part-of-speech-annotated version. Document names reflect the language code, country, year, title, type, genre (the first genre listed in the IMDb), and IMDb ID. For more information about the development and evaluation of these resources, and to cite this work, refer to: Francom, J., Hulden, M. and Ussishkin, A. (2014) ACTIV-ES: a comparable, cross-dialect corpus of 'everyday' Spanish from Argentina, Mexico, and Spain. In Proceedings of the Ninth Annual Language Resources and Evaluation Conference, Reykjavik, Iceland. European Language Resources Association (ELRA). In version .02, a tagged running-text version of the corpus has been added in the /eagles directory, which uses the EAGLES tagset. This tagset is considerably more detailed than the simplified tagset in the /tagged directory. For information on the tagset, see: http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html.
format	Dataset
author	Jerid Francom
author_facet	Jerid Francom
author_sort	Jerid Francom
title	ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_short	ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_full	ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_fullStr	ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_full_unstemmed	ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions
title_sort	activ-es: a comparable spanish corpus comprised of film dialogue from argentine, mexican and spanish productions
publisher	Zenodo
publishDate	2018
url	https://dx.doi.org/10.5281/zenodo.1492612 https://zenodo.org/record/1492612
geographic	Argentina Argentine
geographic_facet	Argentina Argentine
genre	Iceland
genre_facet	Iceland
op_relation	https://github.com/francojc/activ-es/tree/activ-es-v.02 https://zenodo.org/communities/hispanic-linguistics https://zenodo.org/communities/linguistics https://github.com/francojc/activ-es/tree/activ-es-v.02 https://dx.doi.org/10.5281/zenodo.1492613 https://zenodo.org/communities/hispanic-linguistics https://zenodo.org/communities/linguistics
op_rights	Open Access GNU General Public License 2.0 http://www.opensource.org/licenses/GPL-2.0 info:eu-repo/semantics/openAccess
op_rightsnorm	GPL
op_doi	https://doi.org/10.5281/zenodo.1492612 https://doi.org/10.5281/zenodo.1492613
_version_	1766041166074085376
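
Note on the n-gram aggregate format: the description field states that the corpus is distributed both as running text and as aggregate files covering n-gram lengths 1 through 5. The record does not say how those files were produced, so the following Python is only a minimal sketch of how such counts could be derived from tokenized running text. The function names (count_ngrams, aggregate), whitespace tokenization, and the two sample sentences are assumptions made for illustration and are not taken from the ACTIV-ES repository.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

NGram = Tuple[str, ...]


def count_ngrams(tokens: List[str], max_n: int = 5) -> Dict[int, Counter]:
    """Count every n-gram of length 1..max_n in one token sequence."""
    counts: Dict[int, Counter] = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of width n across the tokens.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def aggregate(documents: Iterable[List[str]], max_n: int = 5) -> Dict[int, Counter]:
    """Sum per-document n-gram counts into corpus-level totals."""
    totals: Dict[int, Counter] = {n: Counter() for n in range(1, max_n + 1)}
    for tokens in documents:
        for n, doc_counts in count_ngrams(tokens, max_n).items():
            totals[n].update(doc_counts)
    return totals


# Hypothetical sample documents, tokenized on whitespace (an assumption;
# the corpus's own tokenization is not described in this record).
docs = [
    "no sé qué decir".split(),
    "no sé nada".split(),
]
totals = aggregate(docs)

# Print the most frequent bigram across the sample documents.
print(totals[2].most_common(1))
```

N-grams never cross document boundaries in this sketch, because each document is counted separately before the totals are summed. Whether the published aggregate files follow the same convention is not stated in the record.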

Similar Items