Recommender system of physical and mathematical documents classification



Bibliographic Details
Format: Conference Object
Language: unknown
Published: 2018
Subjects: UDC, DML
Online Access: https://openrepository.ru/article?id=189237
Description
Summary: © 2018 CEUR-WS. All rights reserved. The paper discusses how to increase the value of scientific classifiers for systematizing scientific information in the digital age. Document classification (the assignment of classifier indices), for example, is a traditional means of organizing knowledge and supporting information retrieval. In this paper we propose a recommender system for the automated selection of Universal Decimal Classification (UDC) indices for physical and mathematical documents. The system implements one of the services of the digital mathematical library Lobachevskii-DML. The proposed algorithm uses terms extracted from the title, the keyword list, and the abstract of each analyzed document. Terms are extracted from the documents of the collection with software tools we developed, which take into account the stylistic features of the documents and the positions of the required terms in the text. The extracted data were stored in a dictionary with an inverted index structure. The resulting dictionary contains both classification features and sets of key terms, which are used to systematize and classify the material. Most of these terms were obtained by automated processing of the archives of physical and mathematical publications of the All-Russian mathematical portal Math-Net.Ru. A variant of semantic markup for the table of UDC indices is proposed, and a model for the classification of scientific documents is described.
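The summary does not specify how the inverted-index dictionary is turned into UDC recommendations. The Python sketch below shows one plausible scheme, offered only as an illustration: each term maps to the UDC indices of the training documents that contain it, and a new document's candidate indices are ranked by field-weighted term votes. The field weights, the naive tokenizer, the `UdcIndex` class, and the toy UDC codes are all hypothetical assumptions, not the authors' method or data.

```python
import re
from collections import Counter, defaultdict

# Hypothetical field weights: title terms count more than keyword or abstract terms.
# (Assumption: the paper says term position matters but gives no weights.)
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}


def extract_terms(text):
    """Naive term extraction: lowercase letter-only words of 3+ characters.

    [^\\W\\d_] matches letters in any script, so Russian text works too.
    This stands in for the authors' own extractor, which also handles
    document style and multi-word terms.
    """
    return re.findall(r"[^\W\d_]{3,}", text.lower())


class UdcIndex:
    """Hypothetical inverted index: term -> Counter(UDC code -> document count)."""

    def __init__(self):
        self.postings = defaultdict(Counter)

    def add_document(self, terms, udc_codes):
        # Each distinct term of a classified document votes once for each of its codes.
        for term in set(terms):
            for code in udc_codes:
                self.postings[term][code] += 1

    def recommend(self, doc, top_k=3):
        """Rank UDC codes for a dict with optional 'title', 'keywords', 'abstract' fields."""
        scores = Counter()
        for field, weight in FIELD_WEIGHTS.items():
            for term in extract_terms(doc.get(field, "")):
                # .get() avoids inserting unknown terms into the defaultdict.
                for code, count in self.postings.get(term, {}).items():
                    scores[code] += weight * count
        return [code for code, _ in scores.most_common(top_k)]


if __name__ == "__main__":
    index = UdcIndex()
    # Toy training data; the UDC codes are illustrative labels only.
    index.add_document(extract_terms("boundary value problem for a parabolic equation"), ["517.95"])
    index.add_document(extract_terms("random walk probability estimate"), ["519.21"])
    query = {"title": "A parabolic boundary problem", "keywords": "equation", "abstract": "probability"}
    print(index.recommend(query))  # ['517.95', '519.21']
```

Summing weighted votes keeps the sketch short. A real system would more likely normalize by term frequency across the collection (for example with TF-IDF) so that common words do not dominate the ranking.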