We use the complete dataset for this experiment, that is, all attributes and all instances. There are two plots, one for each method (the paper proposes two approaches). The blue line marks a possible threshold separating inliers from outliers; it can be moved depending on the desired trade-off between false positives (FP) and false negatives (FN). Everything to the left of the blue line is an inlier and everything to the right is a potential outlier.
hist(bpMADs[,2],xlab="Scores",
main="ensemble of detectors with variability votes (EDVV)")
abline(v=2,col="blue",lwd=3)
hist(bpCORs[,2],xlab="Scores",
main="ensemble of detectors with correlated votes (EDCV)")
abline(v=4,col="blue",lwd=3)
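The effect of moving the threshold can be checked by counting how many instances fall to the right of the line. This is a minimal sketch, assuming the scores sit in the second column of the score matrix as in `bpMADs[,2]`. The `scores` vector below is simulated stand-in data and `count_flagged` is a hypothetical helper, not part of the paper's method:

```r
# Simulated stand-in for bpMADs[,2]; replace with the real score column.
set.seed(1)
scores <- c(rnorm(95, mean = 0, sd = 1), rnorm(5, mean = 5, sd = 1))

# Hypothetical helper: number of instances scoring above a threshold,
# i.e. to the right of the blue line.
count_flagged <- function(scores, threshold) {
  sum(scores > threshold, na.rm = TRUE)
}

# Try several candidate thresholds and tabulate the flagged counts.
thresholds <- c(1, 2, 3, 4)
flagged <- sapply(thresholds, function(t) count_flagged(scores, t))
print(data.frame(threshold = thresholds, flagged = flagged))
```

Raising the threshold can only shrink the flagged set, so it trades possible false positives for possible false negatives.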
Note that these are only preliminary results based on the data provided up to today (30/10/2014). I will also convert the binary data (0,1) to a possibly more convenient format using IDF (inverse document frequency) and see what happens.
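One possible reading of the IDF conversion, sketched under assumptions: each binary attribute is weighted by log(N / df), where N is the number of instances and df is the number of instances in which the attribute equals 1. The matrix `X` below is toy data and `idf_weight` is a hypothetical helper; the exact IDF variant to use is still open.

```r
# Toy binary matrix: rows are instances, columns are attributes.
X <- matrix(c(1, 0, 0,
              1, 1, 0,
              1, 0, 0,
              1, 1, 1),
            ncol = 3, byrow = TRUE)

# Hypothetical helper: replace each 1 with the IDF of its attribute.
idf_weight <- function(X) {
  n   <- nrow(X)
  df  <- colSums(X != 0)                  # instances where the attribute is present
  idf <- ifelse(df > 0, log(n / df), 0)   # attributes never present get weight 0
  sweep(X, 2, idf, `*`)                   # scale each column by its IDF
}

X_idf <- idf_weight(X)
print(X_idf)
```

With this weighting an attribute present in every instance gets weight 0, so rare attributes contribute the most to the scores.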
An important consideration: to really assess whether the methods work on this type of data I will need the labels. With these labels I will be able to report results as ROC curves and AUC values.
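Once labels are available, the AUC can be computed directly from the scores. This is a minimal sketch in base R, assuming labels coded 1 for outlier and 0 for inlier and that higher scores mean more outlying. `auc_rank` is a hypothetical helper using the rank-sum (Mann-Whitney) formulation, and the data are simulated:

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formulation.
auc_rank <- function(scores, labels) {
  r     <- rank(scores)              # average ranks, so ties count as 0.5
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated stand-in data; replace with the real scores and labels.
set.seed(1)
labels <- c(rep(0, 95), rep(1, 5))
scores <- c(rnorm(95, mean = 0), rnorm(5, mean = 3))
print(auc_rank(scores, labels))
```

The value is the probability that a randomly chosen outlier scores higher than a randomly chosen inlier, so 0.5 corresponds to random ranking and 1 to perfect separation.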
plot(density(bpMADs[,2]),xlab="Scores",
main="Density (EDVV)")
plot(density(bpCORs[,2]),xlab="Scores",
main="Density (EDCV)")