This experiment uses the complete dataset: all attributes and all instances. There are two plots, one for each method (the paper proposes two approaches). The blue line marks a possible threshold between inliers and outliers; it can be moved depending on whether false positives (FP) or false negatives (FN) are the greater concern. Everything to the left of the blue line is treated as an inlier, and everything to the right as a potential outlier.
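
The effect of moving the threshold can be sketched as follows. This is only an illustration: the scores are simulated stand-ins for `bpMADs[,2]`, the helper `flag_outliers` is hypothetical, and it assumes that higher scores are more anomalous.

```r
# Simulated scores standing in for bpMADs[,2]; NOT the real data.
set.seed(1)
scores <- c(rnorm(95, mean = 0.5, sd = 0.5),  # bulk of the data
            rnorm(5,  mean = 3.0, sd = 0.3))  # a few high-scoring points

# Hypothetical helper: TRUE where the score lies to the right of the threshold.
# Assumes that a higher score means "more likely an outlier".
flag_outliers <- function(scores, threshold) {
  scores > threshold
}

# Count flagged observations for several candidate thresholds.
thresholds <- c(1.5, 2, 2.5)
n_flagged <- sapply(thresholds, function(t) sum(flag_outliers(scores, t)))
names(n_flagged) <- thresholds
print(n_flagged)
```

Moving the line to the right flags fewer points (fewer FP, possibly more FN); moving it to the left does the opposite.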

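# Histogram of the EDVV scores (second column of bpMADs)
hist(bpMADs[,2],xlab="Scores",
     main="ensemble of detectors with variability votes (EDVV)")
# Blue vertical line: candidate inlier/outlier threshold at score 2
abline(v=2,col="blue",lwd=3)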

[Figure: histogram of EDVV scores with the blue threshold line at 2]

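# Histogram of the EDCV scores (second column of bpCORs)
hist(bpCORs[,2],xlab="Scores",
     main="ensemble of detectors with correlated votes (EDCV)")
# Blue vertical line: candidate inlier/outlier threshold at score 4
abline(v=4,col="blue",lwd=3)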

[Figure: histogram of EDCV scores with the blue threshold line at 4]

Note that these are only preliminary results, based on the data provided as of today (30/10/2014). I will also convert the binary data (0,1) to a possibly more convenient format using IDF and see what happens.
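
One possible form of that conversion is sketched below. It assumes that IDF here means inverse document frequency and uses the common definition log(N / df); the exact weighting is not fixed yet, and the matrix `X` and the helper `idf_weights` are hypothetical.

```r
# Hypothetical binary matrix (rows = instances, columns = attributes);
# NOT the real data.
X <- matrix(c(1, 0, 0,
              1, 1, 0,
              1, 0, 0,
              1, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

# Assumed IDF definition: log(N / df), where N is the number of instances
# and df is the number of instances in which the attribute equals 1.
# Attributes that never occur get weight 0.
idf_weights <- function(X) {
  df <- colSums(X != 0)
  ifelse(df > 0, log(nrow(X) / df), 0)
}

w <- idf_weights(X)

# Scale each column by its weight: a 1 becomes the attribute's IDF value,
# a 0 stays 0.
X_idf <- sweep(X, 2, w, `*`)
```

With this definition, an attribute that is present in every instance (column `a` here) gets weight 0 and no longer contributes, while rare attributes get the largest weights.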

An important caveat: to see whether the methods really work on this type of data, I will need the labels. With these labels I will be able to report results in the form of ROC curves and AUC values.
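
Once labels are available, the evaluation could look roughly like the sketch below. It uses base R only; the scores and labels are made up, the helpers `auc_rank` and `roc_points` are hypothetical, and it assumes that labels are coded 1 = outlier, 0 = inlier and that higher scores are more anomalous.

```r
# Made-up scores and labels (1 = outlier, 0 = inlier); NOT the real data.
scores <- c(0.2, 0.5, 0.9, 1.4, 2.1, 2.8, 3.5, 4.2)
labels <- c(0,   0,   0,   1,   0,   0,   1,   1)

# AUC via the rank-sum (Mann-Whitney) identity: the fraction of
# (outlier, inlier) pairs in which the outlier has the higher score.
# rank() assigns average ranks to ties.
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# ROC points: lower the threshold one observation at a time, starting from
# the highest score. Assumes no tied scores (ties would need grouping).
roc_points <- function(scores, labels) {
  ord <- order(scores, decreasing = TRUE)
  lab <- labels[ord]
  data.frame(fpr = c(0, cumsum(lab == 0) / sum(lab == 0)),
             tpr = c(0, cumsum(lab == 1) / sum(lab == 1)))
}

auc <- auc_rank(scores, labels)
roc <- roc_points(scores, labels)
```

The curve itself can then be drawn with `plot(roc$fpr, roc$tpr, type = "s")`.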

Additional graphics

Density plots

# Kernel density estimate of the EDVV scores
plot(density(bpMADs[,2]),xlab="Scores",
     main="Density (EDVV)")

[Figure: density plot of EDVV scores]

# Kernel density estimate of the EDCV scores
plot(density(bpCORs[,2]),xlab="Scores",
     main="Density (EDCV)")

[Figure: density plot of EDCV scores]