Automated Classification and Categorization of Mathematical Knowledge

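Abstract. There is a common Mathematics Subject Classification (MSC) System used for categorizing mathematical papers and knowledge. We present results of machine learning of the MSC on full texts of papers in the mathematical digital libraries DML-CZ and NUMDAM. The F1-measure achieved on the classification task of top-level MSC categories exceeds 89%. We describe and evaluate our methods for measuring the similarity of papers in the digital library based on paper full texts.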

Bibliographic Details
Main Authors: Radim Řehůřek, Petr Sojka
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects:
DML
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.1116
http://www.fi.muni.cz/usr/sojka/papers/mkm2008-rehurek-sojka.pdf
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.221.1116
record_format openpolar
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
topic capacity to select
edit
single out
structure
highlight
group
pair
merge
harmonize
synthesize
focus
organize
condense
reduce
boil down
choose
categorize
catalog
classify
list
abstract
scan
look into
idealize
isolate
discriminate
distinguish
screen
pigeonhole
pick over
sort
integrate
blend
inspect
description Abstract. There is a common Mathematics Subject Classification (MSC) System used for categorizing mathematical papers and knowledge. We present results of machine learning of the MSC on full texts of papers in the mathematical digital libraries DML-CZ and NUMDAM. The F1-measure achieved on the classification task of top-level MSC categories exceeds 89%. We describe and evaluate our methods for measuring the similarity of papers in the digital library based on paper full texts.
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author Radim Řehůřek
Petr Sojka
title Automated Classification and Categorization of Mathematical Knowledge
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.1116
http://www.fi.muni.cz/usr/sojka/papers/mkm2008-rehurek-sojka.pdf
genre DML
op_source http://www.fi.muni.cz/usr/sojka/papers/mkm2008-rehurek-sojka.pdf
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.1116
http://www.fi.muni.cz/usr/sojka/papers/mkm2008-rehurek-sojka.pdf
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_ 1766397498821181440