Isolation Forest

Most existing model-based approaches to anomaly de-tection construct a profile of normal instances, then iden-tify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies in-stead of profile...

Full description

Bibliographic Details
Main Authors: Fei Tony Liu, Kai Ming Ting, Zhi-hua Zhou
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language:English
Subjects:
Online Access:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.678.3903
http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf
Description
Summary:Most existing model-based approaches to anomaly de-tection construct a profile of normal instances, then iden-tify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies in-stead of profiles normal points. To our best knowledge, the concept of isolation has not been explored in current liter-ature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not fea-sible in existing methods, creating an algorithm which has a linear time complexity with a low constant and a low mem-ory requirement. Our empirical evaluation shows that iFor-est performs favourably to ORCA, a near-linear time com-plexity distance-based method, LOF and Random Forests in terms of AUC and processing time, and especially in large data sets. iForest also works well in high dimensional prob-lems which have a large number of irrelevant attributes, and in situations where training set does not contain any anomalies. 1