Statistical Anomaly Detection

Tim Gushue

Developing Data Products

December 2015

Anomaly Detection

"Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. Anomaly detection finds extensive use in a wide variety of applications such as ..."[1]

Normal Distribution

"The statistical approaches to anomaly detection work well on data with a Gaussian distribution. They are primarily focused on discovering how far data points fall from the mean of the data, and uses an outlier score of how many standard deviations away from the mean that a point is in a data set."[2]
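
As a rough illustration of why distance from the mean is a useful outlier score, the sketch below uses pnorm to compute the share of a normal distribution that lies within 1, 2, and 3 standard deviations of the mean. The variable names k and coverage are chosen here for illustration only.

```r
# Share of a normal distribution lying within k standard deviations of the mean.
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
round(coverage, 4)
## [1] 0.6827 0.9545 0.9973
```

Under the normality assumption, only about 0.27% of points fall more than 3 standard deviations from the mean, which is why such points are treated as suspicious.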


Z-Scores

"A dataset can be standardized by taking the z-score of each point. A z-score is a measure of how many standard deviations a data point is away from the mean of the data. Any data-point that has a z-score higher than 3 is an outlier, and likely to be an anomaly. As the z-score increases above 3, points become more obviously anomalous. A z-score is calculated using the following equation."[2]

\[ z = \frac{x-\mu}{\sigma} \]
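
A small worked example may make the equation concrete. The helper z_score and the values used (a mean of 100 and a standard deviation of 5) are assumptions made for illustration, not part of the original material.

```r
# z_score is a hypothetical helper implementing z = (x - mu) / sigma.
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# Assumed population parameters: mean 100, standard deviation 5.
z <- z_score(c(92, 100, 118), mu = 100, sigma = 5)
z
## [1] -1.6  0.0  3.6
```

Here the value 118 lies 3.6 standard deviations above the mean, so it would exceed a threshold of 3.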

To generate random draws from a normal distribution we use the rnorm function. We can generate z-scores for these using the scale function.

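# Draw 20 random values from a normal distribution with mean 100 and standard deviation 5.
events <- rnorm(20, mean = 100, sd = 5)
# scale() centers on the sample mean and divides by the sample standard deviation; as.vector() flattens the one-column matrix it returns.
as.vector(scale(events, center = TRUE, scale = TRUE))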
##  [1] -1.929333870  1.791156471  0.630921828 -0.008134061 -0.448887244
##  [6] -0.836380892  0.999290536  0.059690836  1.020529159 -0.295804606
## [11]  0.012754135 -0.376080501 -0.271797097 -0.472452799 -1.234739150
## [16] -0.506848065  0.756693038  2.131491012  0.113798637 -1.135867368
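
None of the 20 random draws above has a z-score beyond 3. To show what flagging looks like, here is a minimal sketch on a small fixed data set with one injected extreme value. The data, the names threshold and outliers, and the use of the absolute z-score (so that unusually low values are flagged as well) are assumptions made for this example.

```r
# Hypothetical data: 19 values close to 100 plus one injected extreme value (140).
events <- c(98, 101, 103, 97, 100, 99, 102, 104, 96, 100,
            101, 99, 98, 102, 100, 103, 97, 101, 99, 140)

# z-scores from scale(), which uses the sample mean and sample standard deviation.
z <- as.vector(scale(events))

# The same values computed directly from the z-score equation.
z_manual <- (events - mean(events)) / sd(events)

# Flag points whose absolute z-score exceeds the threshold.
threshold <- 3
outliers <- which(abs(z) > threshold)
outliers
## [1] 20
events[outliers]
## [1] 140
```

Because the mean and standard deviation are estimated from the same sample, an extreme value inflates both. In very small samples this can keep even an obvious anomaly below the threshold.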

Anomaly Detection Shiny App

The Anomaly Detection App was designed to help visualize how different z-score thresholds change which points are classified as statistical outliers.

By adjusting the slider, the user sees in real time which data points would be considered outliers. Any point with a z-score greater than the slider's value is highlighted on the chart.
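
The behavior described above could be sketched roughly as follows. This is not the original app's source: the helper flag_outliers, the input and output ids (threshold, chart), the slider range, the simulated data, and the use of the absolute z-score are all assumptions made for illustration. It requires the shiny package.

```r
library(shiny)

# Hypothetical helper: TRUE where the absolute z-score exceeds the threshold.
flag_outliers <- function(z, threshold) {
  abs(z) > threshold
}

# Simulated data standing in for the app's real data set.
events <- rnorm(200, mean = 100, sd = 5)
z <- as.vector(scale(events))

ui <- fluidPage(
  titlePanel("Anomaly Detection"),
  sliderInput("threshold", "Z-score threshold",
              min = 0, max = 4, value = 3, step = 0.1),
  plotOutput("chart")
)

server <- function(input, output) {
  # The plot is redrawn whenever the slider value changes.
  output$chart <- renderPlot({
    is_outlier <- flag_outliers(z, input$threshold)
    plot(events, pch = 19,
         col = ifelse(is_outlier, "red", "grey40"),
         xlab = "Observation", ylab = "Value")
  })
}

app <- shinyApp(ui = ui, server = server)
# To launch the app interactively: runApp(app)
```

Keeping the flagging logic in a plain function separates it from the user interface, so it can be checked without starting the app.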

References

  1. Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey". ACM Computing Surveys 41 (3): 1–58. doi:10.1145/1541880.1541882.

  2. Whitney, T. "Anomaly Detection." Web. 27 Dec. 2015.
