Boosting the creation of a treebank
Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC'14), held 26-31 May 2014 in Reykjavík, Iceland. We present the results of an experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish...
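Main Authors: | Arias Badia, Blanca; Bel Rafecas, Núria; Fomicheva, Marina; Larrea Mendizabal, Imanol; Lorente, Mercè; Marimon, Montserrat; Milà-Garcia, Alba; Vivaldi, J. (Jorge), 1952-; Padró, Muntsa |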
---|---|
Format: | Conference Object |
Language: | English |
Published: | ELRA (European Language Resources Association) |
Subjects: | Dependency treebank; Treebank bootstrapping; Less resourced languages |
Online Access: | http://hdl.handle.net/10230/46232 |
id |
ftupompeufabra:oai:repositori.upf.edu:10230/46232 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
UPF Digital Repository (Universitat Pompeu Fabra, Barcelona) |
op_collection_id |
ftupompeufabra |
language |
English |
topic |
Dependency treebank Treebank bootstrapping Less resourced languages |
description |
Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC'14), held 26-31 May 2014 in Reykjavík, Iceland. We present the results of an experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish sentences. To save time and cost, our approach exploited the typological similarities between Catalan and Spanish to create a first Catalan data set quickly by (i) automatically annotating Catalan sentences with a delexicalized Spanish parser, (ii) manually correcting the parses, and (iii) using the corrected Catalan sentences to train a Catalan parser. The results showed that about 1000 parsed sentences are required to train a Catalan parser, a number that was reached in 4 months with 2 annotators. This work was partially supported by the SKATER project (Ministerio de Economía y Competitividad, TIN2012-38584-C06-05). |
format |
Conference Object |
author |
Arias Badia, Blanca; Bel Rafecas, Núria; Fomicheva, Marina; Larrea Mendizabal, Imanol; Lorente, Mercè; Marimon, Montserrat; Milà-Garcia, Alba; Vivaldi, J. (Jorge), 1952-; Padró, Muntsa |
author_sort |
Arias Badia, Blanca |
title |
Boosting the creation of a treebank |
publisher |
ELRA (European Language Resources Association) |
url |
http://hdl.handle.net/10230/46232 |
geographic |
Reykjavík |
genre |
Reykjavík Reykjavík |
op_relation |
Arias B, Bel N, Lorente M, Marimón M, Milà A, Vivaldi J, Padró M, Fomicheva M, Larrea I. Boosting the creation of a treebank. In: Calzolari N, Choukri K, Declerck T, Loftsson H, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S, editors. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14); 2014 May 26-31; Reykjavik, Iceland. Paris: European Language Resources Association (ELRA); 2014. p. 775-81. info:eu-repo/grantAgreement/ES/3PN/TIN2012-38584-C06-05 http://hdl.handle.net/10230/46232 |
op_rights |
Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License (https://creativecommons.org/licenses/by-nc-sa/3.0/) info:eu-repo/semantics/openAccess |
op_rightsnorm |
CC-BY-NC-SA |
_version_ |
1766178756771184640 |