Automated Classification and Categorization of Mathematical Knowledge

Abstract. There is a common Mathematics Subject Classification (MSC) System used for categorizing mathematical papers and knowledge. We present results of machine learning of the MSC on full texts of papers in the mathematical digital libraries DML-CZ and NUMDAM. The F1-measure achieved on classific...

Full description

Bibliographic Details
Main Authors: Radim Řehůřek, Petr Sojka
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language:English
Subjects:
DML
Online Access:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.1116
http://www.fi.muni.cz/usr/sojka/papers/mkm2008-rehurek-sojka.pdf
Description
Summary:Abstract. There is a common Mathematics Subject Classification (MSC) System used for categorizing mathematical papers and knowledge. We present results of machine learning of the MSC on full texts of papers in the mathematical digital libraries DML-CZ and NUMDAM. The F1-measure achieved on classification task of top-level MSC categories exceeds 89%. We describe and evaluate our methods for measuring the similarity of papers in the digital library based on paper full texts. 1