Boosting the creation of a treebank



Bibliographic Details
Main Authors: Arias Badia, Blanca, Bel Rafecas, Núria, Fomicheva, Marina, Larrea Mendizabal, Imanol, Lorente, Mercè, Marimon, Montserrat, Milà-Garcia, Alba, Vivaldi, J. (Jorge), 1952-, Padró, Muntsa
Format: Conference Object
Language: English
Published: ELRA (European Language Resources Association)
Online Access: http://hdl.handle.net/10230/46232
Description
Summary: Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC'14), held 26 to 31 May 2014 in Reykjavík, Iceland. We present the results of an experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish sentences. To save time and cost, our approach exploited the typological similarities between Catalan and Spanish to create a first Catalan data set quickly by (i) automatically annotating Catalan sentences with a delexicalized Spanish parser, (ii) manually correcting the parses, and (iii) using the corrected Catalan sentences to train a Catalan parser. The results showed that about 1,000 parsed sentences are required to train a Catalan parser; this amount was reached in 4 months with 2 annotators. This work was partially supported by the SKATER project (Ministerio de Economía y Competitividad, TIN2012-38584-C06-05).
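The delexicalization in step (i) can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes the data are stored in the 10-column CoNLL-U layout and simply blanks the FORM and LEMMA columns, so a parser trained on the result can rely only on part-of-speech, morphological and dependency columns, which are largely transferable between Spanish and Catalan. The same transformation would be applied to the POS-tagged Catalan input before parsing it. The function names `delexicalize_line` and `delexicalize` are hypothetical.

```python
"""Hypothetical sketch of delexicalizing CoNLL-U style data.

Assumption: token rows use the 10-column CoNLL-U layout
(ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
"""

PLACEHOLDER = "_"
FORM, LEMMA = 1, 2  # column indices of the lexical columns


def delexicalize_line(line: str) -> str:
    """Blank FORM and LEMMA in a token row; keep comments and blank lines as-is."""
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    cols[FORM] = PLACEHOLDER
    cols[LEMMA] = PLACEHOLDER
    return "\t".join(cols)


def delexicalize(conllu_text: str) -> str:
    """Apply delexicalize_line to every line of a CoNLL-U document."""
    return "\n".join(delexicalize_line(ln) for ln in conllu_text.split("\n"))


if __name__ == "__main__":
    # Toy Spanish training sentence (illustrative data, not from the paper).
    sample = "\n".join([
        "1\tEl\tel\tDET\t_\t_\t2\tdet\t_\t_",
        "2\tgato\tgato\tNOUN\t_\t_\t3\tnsubj\t_\t_",
        "3\tduerme\tdormir\tVERB\t_\t_\t0\troot\t_\t_",
    ])
    print(delexicalize(sample))
```

Keeping HEAD and DEPREL intact means the delexicalized Spanish file can still serve as parser training data; only the language-specific word forms are hidden.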