Linguistics vs. digital editions: The Tromsø Old Russian and OCS Treebank

Bibliographic Details
Main Authors: Eckhoff, Hanne Martine; Berdicevskis, Aleksandrs
Format: Article in Journal/Newspaper
Language: English
Published: Institute for Literature, Bulgarian Academy of Sciences 2015
Online Access: https://hdl.handle.net/10037/22366
Description
Summary: Source at http://e-scripta.ilit.bas.bg/archives/year-2015/issue-14-15 . Journal home page at http://e-scripta.ilit.bas.bg/ . The Tromsø Old Russian and OCS Treebank (TOROT, nestor.uit.no) is, along with its parent treebank, the PROIEL corpus (foni.uio.no), the only existing treebank of Old Church Slavonic (OCS), Old East Slavic and Middle Russian texts. There are other tagged resources, such as the Old Russian subcorpus of the Russian National Corpus and the Manuskript corpus, but none of them, to our knowledge, currently provide syntactic annotation. The TOROT presently contains approximately 160,000 word tokens of fully annotated OCS (Codex Marianus and Codex Suprasliensis), 85,000 word tokens of fully annotated Kiev-era Old East Slavic, and 60,000 word tokens of fully annotated 15th–17th-century Middle Russian. In addition, it contains the Codex Zographensis with automatic and partially hand-corrected morphological annotation and lemmatisation (sections of the Gospels missing in the Codex Marianus also have full syntactic annotation), and the PROIEL version of the Greek Gospels, with which the Codex Marianus and the Codex Zographensis are both aligned at token level (automatically, then hand-corrected).