Isolation-Based Anomaly Detection

Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This paper proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation, without employing any distance or density measure, making it fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve low linear time complexity and a small memory requirement, and (ii) to deal effectively with the effects of swamping and masking. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF, and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample.
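
To make the isolation idea concrete, below is a minimal, self-contained Python sketch of isolation-based scoring. It is an illustration, not the authors' implementation: the helper names (build_tree, path_length, anomaly_score) and the parameter choices (subsamples of 256 points, 100 trees) are assumptions for this example, while the normalisation c(n) and the score 2^(-E[h(x)]/c(n)) follow the formulation commonly attributed to the iForest paper.

import math
import random

def c(n):
    # Average path length of an unsuccessful BST search over n points,
    # used as the normalisation term: c(n) = 2*H(n-1) - 2*(n-1)/n,
    # approximating the harmonic number H(i) by ln(i) + Euler's constant.
    if n <= 1:
        return 0.0
    h = math.log(n - 1) + 0.5772156649
    return 2.0 * h - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth):
    # Recursively isolate points with random axis-parallel splits.
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = random.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = random.uniform(lo, hi)
    return {"dim": dim, "split": split,
            "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth),
            "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth)}

def path_length(tree, x, depth=0):
    # Anomalies tend to be isolated close to the root, so their paths are short.
    if "size" in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)

def anomaly_score(x, trees, sample_size):
    # Score = 2 ** (-E[h(x)] / c(sample_size)); higher values are more anomalous.
    e_h = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-e_h / c(sample_size))

random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)] + [(8.0, 8.0)]
sample_size, n_trees = 256, 100
max_depth = math.ceil(math.log2(sample_size))
trees = [build_tree(random.sample(data, sample_size), 0, max_depth) for _ in range(n_trees)]
print(anomaly_score((8.0, 8.0), trees, sample_size))  # isolated point: higher score
print(anomaly_score((0.0, 0.0), trees, sample_size))  # dense region: lower score

Subsampling (here 256 points per tree) is what keeps training time linear and memory small, and it also mitigates swamping and masking, since each tree only sees a sparse view of the data.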


Bibliographic Details
Main Authors: Fei Tony Liu, Kai Ming Ting, Zhi-hua Zhou
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects: Anomaly detection; outlier detection; ensemble methods; binary tree;
H.2.8 [Database Management]: Database Applications - Data Mining;
I.2.6 [Artificial Intelligence]: Learning
General Terms: Algorithm, Design, Experimentation
Online Access:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.673.5779
http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/tkdd11.pdf
Record ID: ftciteseerx:oai:CiteSeerX.psu:10.1.1.673.5779
Institution: Open Polar
Rights: Metadata may be used without restrictions as long as the OAI identifier remains attached to it.