Isolation-Based Anomaly Detection
Published in: | ACM Transactions on Knowledge Discovery from Data |
---|---|
Main Authors: | , , |
Other Authors: | , , |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Association for Computing Machinery (ACM), 2012 |
Subjects: | |
Online Access: | http://dx.doi.org/10.1145/2133360.2133363 https://dl.acm.org/doi/pdf/10.1145/2133360.2133363 |
Summary: | Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation, without employing any distance or density measure, making it fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time complexity and a small memory requirement and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF, and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample. |
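
The isolation idea in the summary can be illustrated with a small sketch. This is not the authors' implementation. It is a minimal pure-Python version under stated assumptions: the function names are hypothetical, the tree count (100) and subsample size (256) are commonly cited defaults rather than values given in this record, and the scoring formula follows the usual description of iForest, where shorter average path lengths across random trees yield scores closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # used to approximate harmonic numbers


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen attribute cannot split
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node):
    """Depth at which x lands in a leaf, plus an adjustment for unsplit points."""
    depth = 0
    while "split" in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build each tree on a small random subsample (assumes len(data) >= 2)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # height limit near the average tree height
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1]; values near 1 mean x is isolated after few splits."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(psi))


# Illustrative data: a Gaussian cluster plus one distant point.
demo_rng = random.Random(42)
data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(500)]
data.append((10.0, 10.0))
trees, psi = build_forest(data)
print("outlier score:", anomaly_score((10.0, 10.0), trees, psi))
print("inlier score: ", anomaly_score((0.0, 0.0), trees, psi))
```

Because each tree sees only a small subsample and is capped at a shallow depth, both training and scoring cost stay small, which is the property the summary attributes to subsampling. No distance or density computation appears anywhere in the sketch.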