Transducers for annotating weather information in meteorological texts in Serbian

Bibliographic Details
Main Authors: Pajić, Vesna, Vujičić-Stanković, Staša, Pajić, Miloš
Format: Article in Journal/Newspaper
Language: unknown
Published: Zajednica biblioteka univerziteta u Srbiji, Beograd 2012
Subjects:
Online Access: http://aspace.agrif.bg.ac.rs/handle/123456789/3059
https://hdl.handle.net/21.15107/rcub_agrospace_3059
Description
Summary: We present a process for extracting information on meteorological phenomena from texts in Serbian. We used finite state automata and transducers for both text processing and information extraction, working in software specialized for linguistic text processing. Information was extracted by annotating text segments. The extraction rules were described with transducers (finite state transducers and recursive transition networks). Some of the transducers used are presented in detail, to demonstrate the application of different electronic resources for Serbian, especially the electronic morphological dictionary. Transducers are very efficient tools for language processing. For Serbian, it is very important to create different resources and corpora that enable linguistic research. Therefore, we plan to assemble a collection of transducers and make it publicly available for different kinds of research in computational linguistics.

Summary (Serbian, in translation): This paper presents a process for extracting information on meteorological phenomena from texts in Serbian. Both the text processing and the information extraction itself were carried out with finite automata and transducers, created and applied using programs specialized for linguistic text processing. The extraction itself was performed by annotating text segments. All rules used for annotation are represented as transducers (finite transducers and recursive transition networks). The paper presents some of the transducers used in detail, with the aim of demonstrating the use of different electronic resources for Serbian, first and foremost the electronic morphological dictionaries. Transducers are a very efficient means of language processing. For Serbian, creating different resources and corpora that would enable linguistic research is very important. Therefore, a collection of transducers is planned for the future, to be publicly available for different kinds of research in the field of ...
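The summary describes extraction as annotating text segments, with rules expressed as transducers that rely on a morphological dictionary. The record does not include the transducers or name the software, so the following is only a minimal Python sketch under stated assumptions: a tiny hypothetical dictionary (`DICTIONARY`), hypothetical categories `PHEN` (phenomenon) and `INT` (intensity), and a hypothetical `<meteo>` output tag. It is a flat, deterministic finite state transducer with longest-match annotation, not a recursive transition network and not the authors' implementation.

```python
import re

# Hypothetical mini morphological dictionary: inflected form -> (lemma, category).
# Real Serbian e-dictionaries are far larger; these entries are illustrative only.
DICTIONARY = {
    "kiša": ("kiša", "PHEN"), "kiše": ("kiša", "PHEN"), "kišom": ("kiša", "PHEN"),
    "sneg": ("sneg", "PHEN"), "snega": ("sneg", "PHEN"),
    "vetar": ("vetar", "PHEN"), "vetra": ("vetar", "PHEN"),
    "jak": ("jak", "INT"), "jaka": ("jak", "INT"), "jakog": ("jak", "INT"),
    "slab": ("slab", "INT"), "slaba": ("slab", "INT"), "slabog": ("slab", "INT"),
}

# Automaton over dictionary categories: an optional intensity word followed by
# a phenomenon. (state, category) -> next state; state 2 is accepting.
TRANSITIONS = {(0, "INT"): 1, (0, "PHEN"): 2, (1, "PHEN"): 2}
ACCEPTING = {2}


def annotate(text: str) -> str:
    """Wrap recognized weather segments in <meteo type="lemma"> tags."""
    # Words and single punctuation marks; punctuation is never in the dictionary,
    # so it stops a match instead of being skipped over.
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+|[^\w\s]", text)]
    spans = []  # (start offset, end offset, lemma of the phenomenon)
    i = 0
    while i < len(tokens):
        state, j, last_accept = 0, i, None
        # Run the automaton from token i as far as it goes, remembering the
        # last accepting position (longest match).
        while j < len(tokens):
            entry = DICTIONARY.get(tokens[j][2].lower())
            nxt = TRANSITIONS.get((state, entry[1])) if entry else None
            if nxt is None:
                break
            state, j = nxt, j + 1
            if state in ACCEPTING:
                last_accept = j
        if last_accept is None:
            i += 1  # no match starting here; copy the token unchanged
        else:
            lemma = DICTIONARY[tokens[last_accept - 1][2].lower()][0]
            spans.append((tokens[i][0], tokens[last_accept - 1][1], lemma))
            i = last_accept
    # Emit the original text with tags inserted around matched spans.
    pieces, cursor = [], 0
    for start, end, lemma in spans:
        pieces.append(text[cursor:start])
        pieces.append(f'<meteo type="{lemma}">{text[start:end]}</meteo>')
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    print(annotate("Sutra jaka kiša, a uveče slab vetar."))
```

On the sample sentence this tags "jaka kiša" and "slab vetar" and leaves the rest of the text unchanged. Mapping inflected forms to lemmas through the dictionary lets one small automaton cover many case forms. That mirrors, in a much simplified way, the summary's point about using the electronic morphological dictionary.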