Mining Outliers with Faster Cutoff Update and Space Utilization

Abstract. It is desirable to find unusual data objects using Ramaswamy et al.'s distance-based outlier definition because it requires only a metric distance function between two objects. It does not need the neighborhood distance threshold required by many existing algorithms based on the definition of...


Bibliographic Details
Main Authors: Chi-cheong Szeto, Edward Hung
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.5360
http://www.comp.polyu.edu.hk/~csehung/paper/rcs.pdf
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.150.5360
record_format openpolar
spelling ftciteseerx:oai:CiteSeerX.psu:10.1.1.150.5360 2023-05-15T17:53:30+02:00 application/pdf en eng text ftciteseerx 2016-01-07T15:19:49Z
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
description Abstract. It is desirable to find unusual data objects using Ramaswamy et al.'s distance-based outlier definition because it requires only a metric distance function between two objects. Unlike many existing algorithms based on the definition of Knorr and Ng, it needs no neighborhood distance threshold. Bay and Schwabacher proposed ORCA, an efficient algorithm for this task that can give near-linear time performance. To further reduce the running time, we propose in this paper two algorithms, RC and RS, which use the following two techniques respectively: (i) faster cutoff update, and (ii) space utilization after pruning. We tested RC, RS, and RCS (a hybrid approach combining both RC and RS) on several large, high-dimensional real data sets with millions of objects. The experiments show that RCS runs 1.4 to 2.3 times as fast as ORCA, and that this improvement is relatively insensitive to increases in data size.
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author Chi-cheong Szeto
Edward Hung
title Mining Outliers with Faster Cutoff Update and Space Utilization
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.5360
http://www.comp.polyu.edu.hk/~csehung/paper/rcs.pdf
genre Orca
op_source http://www.comp.polyu.edu.hk/~csehung/paper/rcs.pdf
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.5360
http://www.comp.polyu.edu.hk/~csehung/paper/rcs.pdf
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_ 1766161201828462592
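
The abstract refers to Ramaswamy et al.'s distance-based outlier definition and to cutoff-based pruning as used by ORCA. The following is a minimal illustrative sketch of that general idea only, not the RC, RS, or RCS algorithms of the paper, whose details are not given in this record. It assumes the common formulation in which a point's score is its distance to its k-th nearest neighbor and the top n scores are reported. The function names `euclidean` and `top_n_outliers` and the parameters `k`, `n`, and `seed` are hypothetical, chosen for illustration.

```python
import heapq
import math
import random


def euclidean(a, b):
    """Metric distance between two points (any metric could be passed instead)."""
    return math.dist(a, b)


def top_n_outliers(points, k, n, dist=euclidean, seed=0):
    """Return [(score, index)] for the n points with the largest k-th NN distance.

    Assumption: score = distance to the k-th nearest neighbor. A candidate is
    abandoned as soon as its running k-th NN distance drops below the cutoff,
    i.e. the score of the weakest outlier currently in the top n.
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random scan order lets pruning start early

    top = []      # min-heap of (score, index); top[0] is the weakest kept outlier
    cutoff = 0.0  # no pruning is possible until n outliers have been found

    for i in order:
        nearest = []  # negated distances: a max-heap of the k smallest seen so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Once k neighbors are known, -nearest[0] is an upper bound on the
            # true score, so falling below the cutoff rules the point out.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]  # the cutoff only ever increases

    return sorted(top, reverse=True)


if __name__ == "__main__":
    # A 5x5 grid of inliers plus two isolated points (made-up example data).
    data = [(float(x), float(y)) for x in range(5) for y in range(5)]
    data += [(100.0, 100.0), (-50.0, 40.0)]
    for score, index in top_n_outliers(data, k=3, n=2):
        print(index, round(score, 2))
```

Only a distance function is needed, and no neighborhood radius is supplied, which matches the property the abstract highlights. The faster the cutoff rises, the earlier inner loops can stop, which is presumably why the paper targets the cutoff update.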