Contents
Introduction to Anomaly Detection
Isolation Forest: The Basics
The Challenges of Online Learning
Streaming Isolation Forest
Data Set and Results
Related Resources
Visiting University of Waikato
\(N_1\) and \(N_2\) are normal regions. \(o_1\) and \(o_2\) are points that lie sufficiently far from these regions. The points in \(O_3\) are clustered anomalies.
Formal definition: “Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior.” (Chandola, Banerjee, and Kumar 2009)
Anomaly detection is therefore the task of correctly labelling points as anomalous or normal.
Isolation Forest (iForest) is a collection of Isolation Trees.
An Isolation Tree (iTree) is a binary tree built from a sub-sample of the data.
Definition:
Anomalies are few and different
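The idea that anomalies are "few and different" can be made concrete with a small sketch. The code below is a minimal, illustrative version of the standard batch Isolation Forest, not the implementation used in this talk: each iTree splits a sub-sample on a random attribute at a random value, and points that are isolated after few splits receive a high score. All names (`build_itree`, `path_length`, `anomaly_score`, `psi` for the sub-sample size) and the toy data are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Normalising constant: average path length of an unsuccessful
    binary-search-tree lookup over n points (c(2) = 1 by convention)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_itree(points, depth, max_depth, rng):
    """Recursively split on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}          # leaf
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                              # cannot split on this attribute
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_itree(left, depth + 1, max_depth, rng),
        "right": build_itree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus c(size) for the unbuilt subtree."""
    if "split" not in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def build_forest(data, n_trees, psi, rng):
    """Build n_trees iTrees, each from a sub-sample of psi points."""
    max_depth = math.ceil(math.log2(psi))
    return [
        build_itree(rng.sample(data, min(psi, len(data))), 0, max_depth, rng)
        for _ in range(n_trees)
    ]


def anomaly_score(forest, x, psi):
    """Score in (0, 1]; values closer to 1 indicate likely anomalies."""
    mean_path = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_path / c(psi))


# Toy data: one dense Gaussian cluster plus a single far-away point.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(10.0, 10.0)]
forest = build_forest(data, n_trees=100, psi=64, rng=rng)
inlier_score = anomaly_score(forest, (0.0, 0.0), psi=64)
outlier_score = anomaly_score(forest, (10.0, 10.0), psi=64)
```

In this sketch the far-away point is expected to receive the higher score, because far fewer random splits are needed to separate it from the rest of the sub-sample.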
Expensive computations are not feasible
Must process data at least as fast as the stream delivers it
Leverage parallel computing for efficiency
| Algorithm | Sampling Method | Update Granularity | Scoring Function |
|---|---|---|---|
| SiForest | Reservoir Sampling | Branch | Same as IF |
| iForestASD | Sliding Window | Forest, ** | Same as IF |
| olFOR | Sliding Window | Leaf | Same as IF |
| RRCF | Sliding Window | Node | Displacement |
| HST | Sliding Window | Forest, ++ | Same as IF |
** updated when the ratio of anomalies becomes high
++ updated every window
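The table lists reservoir sampling as the sampling method for SiForest, in contrast to the sliding windows used by the other algorithms. As a rough illustration, the sketch below implements generic reservoir sampling (Algorithm R), which keeps a fixed-size uniform sample of everything seen so far rather than only the most recent items. The class name `Reservoir` and its fields are hypothetical; how SiForest connects a replacement in the reservoir to its branch-level update is not described here and is not modelled.

```python
import random


class Reservoir:
    """Fixed-size uniform sample over an unbounded stream (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0                      # number of items offered so far
        self._rng = random.Random(seed)

    def add(self, item):
        """Offer one item; return True if it was stored in the reservoir."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)        # still filling the reservoir
            return True
        # Keep the new item with probability capacity / seen,
        # overwriting a uniformly chosen slot.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item
            return True
        return False


reservoir = Reservoir(capacity=5, seed=0)
for value in range(1000):
    reservoir.add(value)
```

Memory stays constant no matter how long the stream runs, and older observations can remain in the sample, whereas a sliding window forgets them after a fixed horizon.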
SiForest is implemented in CapyMOA (CapyMOA.org).
All experiments were run on an Intel® Core™ i5-10500 3.10 GHz CPU with 32 GB RAM, running Ubuntu 22.04.4 LTS.
Apply streaming anomaly detection to different domains
Semi-supervised anomaly detection (when some labelled data is available)
Active learning (the learner has a budget for labelling selected data points)