Boosting the creation of a treebank

Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC'14), held 26-31 May 2014 in Reykjavík, Iceland. We present the results of an experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish...


Bibliographic Details
Main Authors: Arias Badia, Blanca, Bel Rafecas, Núria, Fomicheva, Marina, Larrea Mendizabal, Imanol, Lorente, Mercè, Marimon, Montserrat, Milà-Garcia, Alba, Vivaldi, J. (Jorge), 1952-, Padró, Muntsa
Format: Conference Object
Language: English
Published: ELRA (European Language Resources Association)
Subjects: Dependency treebank, Treebank bootstrapping, Less resourced languages
Online Access: http://hdl.handle.net/10230/46232
id ftupompeufabra:oai:repositori.upf.edu:10230/46232
record_format openpolar
spelling ftupompeufabra:oai:repositori.upf.edu:10230/46232 2023-05-15T18:06:59+02:00 Boosting the creation of a treebank Arias Badia, Blanca Bel Rafecas, Núria Fomicheva, Marina Larrea Mendizabal, Imanol Lorente, Mercè Marimon, Montserrat Milà-Garcia, Alba Vivaldi, J. (Jorge), 1952- Padró, Muntsa application/pdf http://hdl.handle.net/10230/46232 eng eng ELRA (European Language Resources Association) Calzolari N, Choukri K, Declerck T, Loftsson H, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S, editors. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14); 2014 May 26-31; Reykjavik, Iceland. Paris: European Language Resources Association (ELRA); 2014. p. 775-81 info:eu-repo/grantAgreement/ES/3PN/TIN2012-38584-C06-05 Arias B, Bel N, Lorente M, Marimón M, Milà A, Vivaldi J, Padró M, Fomicheva M, Larrea I. Boosting the creation of a treebank. In: Calzolari N, Choukri K, Declerck T, Loftsson H, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S, editors. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14); 2014 May 26-31; Reykjavik, Iceland. Paris: European Language Resources Association (ELRA); 2014. p. 775-81. http://hdl.handle.net/10230/46232 Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License (https://creativecommons.org/licenses/by-nc-sa/3.0/) https://creativecommons.org/licenses/by-nc-sa/3.0/ info:eu-repo/semantics/openAccess CC-BY-NC-SA Dependency treebank Treebank bootstrapping Less resourced languages info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion ftupompeufabra 2021-08-03T23:19:58Z Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC'14), held 26-31 May 2014 in Reykjavík, Iceland. We present the results of an experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish sentences.
To save time and cost, our approach exploited the typological similarities between Catalan and Spanish to create a first Catalan data set quickly by (i) automatically annotating with a delexicalized Spanish parser, (ii) manually correcting the parses, and (iii) using the corrected Catalan sentences to train a Catalan parser. The results showed that about 1000 parsed sentences are required to train a Catalan parser, a number that was reached in 4 months with 2 annotators. This work was partially supported by the SKATER project (Ministerio de Economía y Competitividad, TIN2012-38584-C06-05). Conference Object Reykjavík Reykjavík UPF Digital Repository (Universitat Pompeu Fabra, Barcelona) Reykjavík
institution Open Polar
collection UPF Digital Repository (Universitat Pompeu Fabra, Barcelona)
op_collection_id ftupompeufabra
language English
topic Dependency treebank
Treebank bootstrapping
Less resourced languages
spellingShingle Dependency treebank
Treebank bootstrapping
Less resourced languages
Arias Badia, Blanca
Bel Rafecas, Núria
Fomicheva, Marina
Larrea Mendizabal, Imanol
Lorente, Mercè
Marimon, Montserrat
Milà-Garcia, Alba
Vivaldi, J. (Jorge), 1952-
Padró, Muntsa
Boosting the creation of a treebank
topic_facet Dependency treebank
Treebank bootstrapping
Less resourced languages
description Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC'14), held 26-31 May 2014 in Reykjavík, Iceland. We present the results of an experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish sentences. To save time and cost, our approach exploited the typological similarities between Catalan and Spanish to create a first Catalan data set quickly by (i) automatically annotating with a delexicalized Spanish parser, (ii) manually correcting the parses, and (iii) using the corrected Catalan sentences to train a Catalan parser. The results showed that about 1000 parsed sentences are required to train a Catalan parser, a number that was reached in 4 months with 2 annotators. This work was partially supported by the SKATER project (Ministerio de Economía y Competitividad, TIN2012-38584-C06-05).
format Conference Object
author Arias Badia, Blanca
Bel Rafecas, Núria
Fomicheva, Marina
Larrea Mendizabal, Imanol
Lorente, Mercè
Marimon, Montserrat
Milà-Garcia, Alba
Vivaldi, J. (Jorge), 1952-
Padró, Muntsa
author_facet Arias Badia, Blanca
Bel Rafecas, Núria
Fomicheva, Marina
Larrea Mendizabal, Imanol
Lorente, Mercè
Marimon, Montserrat
Milà-Garcia, Alba
Vivaldi, J. (Jorge), 1952-
Padró, Muntsa
author_sort Arias Badia, Blanca
title Boosting the creation of a treebank
title_short Boosting the creation of a treebank
title_full Boosting the creation of a treebank
title_fullStr Boosting the creation of a treebank
title_full_unstemmed Boosting the creation of a treebank
title_sort boosting the creation of a treebank
publisher ELRA (European Language Resources Association)
url http://hdl.handle.net/10230/46232
geographic Reykjavík
geographic_facet Reykjavík
genre Reykjavík
Reykjavík
genre_facet Reykjavík
Reykjavík
op_relation Calzolari N, Choukri K, Declerck T, Loftsson H, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S, editors. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14); 2014 May 26-31; Reykjavik, Iceland. Paris: European Language Resources Association (ELRA); 2014. p. 775-81
info:eu-repo/grantAgreement/ES/3PN/TIN2012-38584-C06-05
Arias B, Bel N, Lorente M, Marimón M, Milà A, Vivaldi J, Padró M, Fomicheva M, Larrea I. Boosting the creation of a treebank. In: Calzolari N, Choukri K, Declerck T, Loftsson H, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S, editors. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14); 2014 May 26-31; Reykjavik, Iceland. Paris: European Language Resources Association (ELRA); 2014. p. 775-81.
http://hdl.handle.net/10230/46232
op_rights Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License (https://creativecommons.org/licenses/by-nc-sa/3.0/)
https://creativecommons.org/licenses/by-nc-sa/3.0/
info:eu-repo/semantics/openAccess
op_rightsnorm CC-BY-NC-SA
_version_ 1766178756771184640