Sigrid Keydana, Trivadis
2017/09/16
An outlier is… an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism [3]
—> we need a probabilistic approach
Enter: generative stochastic models
Maximize:
\( P(X) = \int P(X|z;\theta) P(z) dz \)
How do we get a mapping from easy-to-sample-from \( z \)'s to the empirical output (\( X \))?
How do we make sure our \( z \)'s are in the appropriate range to yield the \( X \)'s?
Sample from \( P(z) = \mathcal N(z|0,I) \)
Let the network learn the decoder: \( P(X|z;\theta) = \mathcal N(X|f(z,\theta), \sigma^2 I) \)
Let the network learn the encoder: \( Q(z|X) \)
Maximize the variational lower bound
\[ E_{z \sim Q}[\log P(X|z)] - \mathcal D[Q(z|X)\,\|\,P(z)] \]
where the terms are
1) the reconstruction probability
2) the Kullback-Leibler divergence between the approximate posterior and the prior for \( z \)
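With a Gaussian encoder \( Q(z|X) = \mathcal N(z|\mu(X), \sigma^2(X)\,I) \) and the standard normal prior, the KL term has a well-known closed form (this is the expression the kl_loss code further below computes):
\[ \mathcal D[Q(z|X)\,\|\,P(z)] = -\frac{1}{2} \sum_{j} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right) \]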
Generate stuff (e.g., images)
VAE models the distribution, not the values
anomalies are seen as coming from a different process / distribution, so…
we can diagnose as anomalies those cases with high reconstruction error / low reconstruction probability [5]
MNIST, what else ;-)
fraud detection
network intrusion detection
Keras, used from R, via the bindings provided by RStudio:
DL4J:
MNIST handwritten digits database
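For context, a possible setup for the code below; library loading and MNIST preprocessing follow the standard keras-for-R pattern, while the hyperparameter values are illustrative assumptions, not necessarily the ones used here:
library(keras)
K <- backend()                 # low-level backend, used as K$... below
# hyperparameters (illustrative values)
batch_size       <- 100L
original_dim     <- 784L       # 28 x 28 MNIST pixels, flattened
intermediate_dim <- 256L
latent_dim       <- 2L         # 2-d latent space, mainly for plotting
epsilon_std      <- 1.0
learning_rate    <- 0.001
# MNIST data, scaled to [0, 1] and flattened to vectors of length 784
mnist   <- dataset_mnist()
x_train <- array_reshape(mnist$train$x / 255, c(nrow(mnist$train$x), original_dim))
x_test  <- array_reshape(mnist$test$x / 255, c(nrow(mnist$test$x), original_dim))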
# input layer
x <- layer_input(batch_shape = c(batch_size, original_dim))
# hidden intermediate, lower-res
h <- layer_dense(x, intermediate_dim, activation = "relu")
# latent var 1, 2-dim (mainly for plotting!): mean
z_mean <- layer_dense(h, latent_dim)
# latent var 2, 2-dim: log variance
z_log_var <- layer_dense(h, latent_dim)
# reparameterization trick: z = mean + sd * epsilon, with sd = exp(log_var / 2)
sampling <- function(arg){
  z_mean <- arg[, 1:latent_dim]
  z_log_var <- arg[, (latent_dim + 1):(2 * latent_dim)]
  epsilon <- K$random_normal(
    shape = c(batch_size, latent_dim),
    mean = 0,
    stddev = epsilon_std
  )
  z_mean + K$exp(z_log_var / 2) * epsilon
}
# latent vars are sampled: nondeterministic!
z <- layer_concatenate(list(z_mean, z_log_var)) %>%
layer_lambda(sampling)
# hidden intermediate, higher-res
decoder_h <- layer_dense(units = intermediate_dim, activation = "relu")
# decoder for the mean, high-res again
decoder_mean <- layer_dense(units = original_dim, activation = "sigmoid")
h_decoded <- decoder_h(z)
x_decoded_mean <- decoder_mean(h_decoded)
# the complete model, from input to decoded output
vae <- keras_model(x, x_decoded_mean)
# cross-entropy: the reconstruction part of the loss function
xent_loss <- function(target, reconstruction) {
as.double(original_dim) * loss_binary_crossentropy(target, reconstruction)
}
# Kullback-Leibler divergence: the regularization part of the loss function
kl_loss <- function(target, reconstruction) {
-0.5*K$mean(1 + z_log_var - K$square(z_mean) - K$exp(z_log_var), axis = -1L)
}
# total VAE loss: reconstruction term + KL regularization term
vae_loss <- function(x, x_decoded_mean){
  xent_loss <- original_dim * loss_binary_crossentropy(x, x_decoded_mean)
  kl_loss <- -0.5 * K$mean(1 + z_log_var - K$square(z_mean) - K$exp(z_log_var), axis = -1L)
  xent_loss + kl_loss
}
# compile the model with (hopefully) adequate optimizer and learning rate
vae %>% compile(
  optimizer = optimizer_rmsprop(lr = learning_rate),
  loss = vae_loss,
  metrics = c(xent_loss, kl_loss)
)
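A minimal sketch of how the compiled model could then be trained and used for anomaly scoring; the number of epochs, the 99% threshold, and the x_train / x_test names are assumptions for illustration:
# train as an autoencoder: the target is the input itself
vae %>% fit(
  x_train, x_train,
  epochs = 50,
  batch_size = batch_size,
  validation_data = list(x_test, x_test)
)
# per-case reconstruction error (MSE between input and reconstruction)
train_mse <- rowMeans((x_train - predict(vae, x_train, batch_size = batch_size))^2)
test_mse  <- rowMeans((x_test - predict(vae, x_test, batch_size = batch_size))^2)
# flag cases whose error exceeds, e.g., the 99% quantile of the training errors
threshold <- quantile(train_mse, 0.99)
anomalies <- which(test_mse > threshold)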
Anomaly detection using reconstruction probability
How about real datasets with different types of variables of different scale?
How to best preprocess the data?
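One possible preprocessing sketch: min-max scale numeric columns to [0, 1] (matching the sigmoid output layer and binary cross-entropy) and one-hot encode categorical columns; the data frame df and its columns are hypothetical:
# min-max scale numeric columns to [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
numeric_cols <- sapply(df, is.numeric)
df[numeric_cols] <- lapply(df[numeric_cols], min_max)
# one-hot encode factor columns (e.g., protocol / service in network data)
x <- model.matrix(~ . - 1, data = df)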
Reconstruction error (MSE, in this case):
Problem: small sample sizes (non-fraud: 6013, fraud: 180)
# hidden intermediate layer, optionally with batch normalization
h <- layer_dense(x, intermediate_dim)
if (use_batch_normalization) h <- h %>% layer_batch_normalization()
# ReLU activation plus L1/L2 activity regularization
h <- h %>% layer_activation("relu") %>% layer_activity_regularization(l1 = l1, l2 = l2)
Reconstruction error (MSE):
| type               | number of cases | MSE  |
|--------------------|-----------------|------|
| normal (no attack) | 56,000          | 40.4 |
| analysis           | 2,000           | 81.1 |
| DoS                | 12,264          | 55.3 |
| exploits           | 33,393          | 50.4 |
| fuzzers            | 18,184          | 37.8 |
| generic            | 40,000          | 48.8 |
| reconnaissance     | 10,491          | 20.5 |
| shellcode          | 1,133           | 19.3 |
| worms              | 130             | 38.6 |
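Per-type figures like these could be computed roughly as follows, assuming a fitted model vae, a preprocessed test matrix x_test, and a label vector attack_type with one label per row (names hypothetical):
# per-case reconstruction error
mse <- rowMeans((x_test - predict(vae, x_test, batch_size = batch_size))^2)
# average reconstruction error per traffic type
aggregate(mse, by = list(type = attack_type), FUN = mean)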
Thanks for your attention!
[1] Asimov Institute, The Neural Network Zoo.
[2] Keras blog, Building autoencoders in Keras.
[3] Hawkins, D. (1980), Identification of outliers. Chapman and Hall, London.
[4] Doersch, C., Tutorial on Variational Autoencoders.
[5] An, J. & Cho, S., Variational Autoencoder based Anomaly Detection using Reconstruction Probability.
[6] Torgo, L. (2017), Data Mining with R, 2nd ed.
[7] Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), 2015. IEEE, 2015.
[8] Moustafa, Nour, and Jill Slay. "The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set." Information Security Journal: A Global Perspective (2016): 1-14.