Usage of XSL Stylesheets for the annotation of the Sámi language corpora

Bibliographic Details
Main Author: Saara Huhmarniemi
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.4828
http://acl.ldc.upenn.edu/w/w07/w07-1507.pdf
Description
Summary: This paper describes an annotation system for Sámi language corpora consisting of structured, running text. The annotation is fully automatic, starting from the original documents in their various formats. The text is first extracted from each original document with its structural markup preserved. That markup is then enhanced by a document-specific XSLT script containing formatting instructions for that document, while system-wide XSLT scripts handle the overall maintenance of the corpus.
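The summary describes two layers: a per-document script carrying instructions specific to one document, and system-wide scripts shared by the whole corpus. The following is a minimal sketch of that layering only. It uses Python's standard xml.etree.ElementTree instead of XSLT, and the element names (document, header, p), the type attribute, the default values and the functions enhance, apply_system_rules and annotate are all hypothetical, not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical system-wide defaults shared by every document in the corpus.
SYSTEM_DEFAULTS = {"genre": "unknown", "lang": "sme"}


def enhance(doc: ET.Element, para_types: dict[int, str]) -> None:
    """Document-specific layer (hypothetical): tag chosen paragraphs, by position, with a type."""
    for index, p in enumerate(doc.iter("p")):
        if index in para_types:
            p.set("type", para_types[index])


def apply_system_rules(doc: ET.Element, metadata: dict[str, str]) -> None:
    """System-wide layer (hypothetical): add a metadata header and normalise every paragraph."""
    header = ET.Element("header")
    for key in sorted(metadata):
        ET.SubElement(header, key).text = metadata[key]
    doc.insert(0, header)
    for p in doc.iter("p"):
        # Paragraphs the document-specific layer left untyped default to "text".
        p.set("type", p.get("type", "text"))
        if p.text:
            # Collapse runs of whitespace left over from text extraction.
            p.text = " ".join(p.text.split())


def annotate(xml_text: str, metadata: dict[str, str], para_types: dict[int, str]) -> ET.Element:
    """Run the document-specific step, then the shared step; document metadata overrides defaults."""
    doc = ET.fromstring(xml_text)
    enhance(doc, para_types)
    apply_system_rules(doc, {**SYSTEM_DEFAULTS, **metadata})
    return doc
```

Keeping the per-document input down to small data (metadata and paragraph types) means a change to the shared rules reaches every document without editing each one, which is the maintenance benefit the summary attributes to the system-wide scripts.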