Under consideration for publication in Knowledge and Information Systems Class Separation through Variance: A new Application of Outlier Detection

This paper introduces a new outlier detection approach and discusses and extends a new concept, class separation through variance. We show that even for balanced and concentric classes differing only in variance, accumulating information about the outlierness of points in multiple subspaces leads to...


Bibliographic Details
Main Authors: Andrew Foss, Osmar R. Zaïane
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.9168
http://www.cs.ualberta.ca/%7Ezaiane/postscript/KAIS-Foss-Zaiane2010/
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.228.9168
record_format openpolar
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
description This paper introduces a new outlier detection approach and discusses and extends a new concept, class separation through variance. We show that even for balanced and concentric classes differing only in variance, accumulating information about the outlierness of points in multiple subspaces leads to a ranking in which the classes naturally tend to separate. Exploiting this leads to a highly effective and efficient unsupervised class separation approach. Unlike typical outlier detection algorithms, this method can be applied beyond the ‘rare classes’ case with great success. The new algorithm, FASTOUT, introduces a number of novel features. It employs sampling of subspaces points and is highly efficient. It handles arbitrarily sized subspaces and converges to an optimal subspace size through the use of an objective function. In addition, two approaches are presented for automatically deriving the class of the data points from the ranking. Experiments show that, on high-dimensional data, FASTOUT typically outperforms other state-of-the-art outlier detection methods such as Feature Bagging, SOE1, LOF, ORCA and Robust Mahalanobis Distance, and competes even with the leading supervised classification methods for separating classes.
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author Andrew Foss
Osmar R. Zaïane
title Class Separation through Variance: A new Application of Outlier Detection
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.9168
http://www.cs.ualberta.ca/%7Ezaiane/postscript/KAIS-Foss-Zaiane2010/
genre Orca
genre_facet Orca
op_source http://www.cs.ualberta.ca/%7Ezaiane/postscript/KAIS-Foss-Zaiane2010/
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.9168
http://www.cs.ualberta.ca/%7Ezaiane/postscript/KAIS-Foss-Zaiane2010/
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.