Goal
Reduce the number of False Positives generated by the NN classifier by using a client profile.
Approach for training the classifier
Features
The client profile is computed over a time window of 1 hour, using the following features:
1. Total number of requests
2. Ratio of NX requests (answer=NX)
3. Ratio of MX requests (query_type="MX")
4. Ratio of reverse requests (answer=in.arpa)
5. Ratio of failed requests (answer=SERVFAIL)
6. Ratio of the total number of domains that resolved to more than one IP: for each query, count the distinct IP addresses in the answer_ip field of its responses, and sum these counts over the queries with more than one IP
7. Ratio of the average number of domains that resolved to more than one IP: as in feature 6, but average the counts over the queries with more than one IP instead of summing them
8. Ratio of the total number of IP addresses that correspond to more than one domain name: for each answer_ip value, count the distinct queries that returned it, and sum these counts over the IPs with more than one query
9. Ratio of the average number of IP addresses that correspond to more than one domain name: as in feature 8, but average the counts over the IPs with more than one query instead of summing them
Note: in all cases, the ratio is calculated over the total number of requests (feature 1).
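As an illustration, the features above could be computed for one client and one 1-hour window roughly as in the base-R sketch below. It rests on assumptions: the input is a data frame with the fields named in the text (`query`, `query_type`, `answer`, `answer_ip`, with `NA` in `answer_ip` when no IP was returned), the helpers `multi_counts()` and `compute_profile()` are hypothetical, and the output column names are borrowed from the variable-importance table later in this document; mapping `samedomain`/`sameip` to features 6-9 is a guess, not something the source confirms.

```r
# Hypothetical helper: count distinct values per group, keep only the groups with
# more than one distinct value, and return the sum and the mean of those counts.
multi_counts <- function(values, groups) {
  if (length(values) == 0) return(c(tot = 0, avg = 0))
  counts <- tapply(values, groups, function(v) length(unique(v)))
  multi <- counts[counts > 1]
  if (length(multi) == 0) return(c(tot = 0, avg = 0))
  c(tot = sum(multi), avg = mean(multi))
}

# Hypothetical profile builder for the requests of one client in one 1-hour window.
compute_profile <- function(req) {
  n <- nrow(req)                                   # feature 1, the denominator of every ratio
  ok <- req[!is.na(req$answer_ip), ]               # requests that returned an IP
  dom <- multi_counts(ok$answer_ip, ok$query)      # queries resolving to > 1 distinct IP
  ips <- multi_counts(ok$query, ok$answer_ip)      # IPs returned for > 1 distinct query
  data.frame(
    tot_requests         = n,
    ratio_nx             = sum(req$answer == "NX", na.rm = TRUE) / n,
    ratio_mx             = sum(req$query_type == "MX", na.rm = TRUE) / n,
    ratio_reverse        = sum(req$answer == "in.arpa", na.rm = TRUE) / n,  # literal rule from the text
    ratio_fail           = sum(req$answer == "SERVFAIL", na.rm = TRUE) / n,
    ratio_tot_samedomain = dom[["tot"]] / n,
    ratio_avg_samedomain = dom[["avg"]] / n,
    ratio_tot_sameip     = ips[["tot"]] / n,
    ratio_avg_sameip     = ips[["avg"]] / n
  )
}

# Toy window with five requests (made-up data)
req <- data.frame(
  query      = c("a.com", "a.com", "b.com", "x.com", "y.com"),
  query_type = c("A", "A", "A", "MX", "A"),
  answer     = c("NOERROR", "NOERROR", "NOERROR", "NX", "SERVFAIL"),
  answer_ip  = c("1.1.1.1", "2.2.2.2", "1.1.1.1", NA, NA),
  stringsAsFactors = FALSE)
p <- compute_profile(req)
p
```

Here `a.com` resolves to two IPs and `1.1.1.1` is returned for two domains, so features 6-9 each equal 2/5 in the toy window.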
Training approach
The dataset contains only those client profiles with at least one request detected as DGA by the NN; the profiles among them that are labeled as Normal therefore contain False Positives. The resulting dataset contains 225 client profiles.
The table below shows the ground-truth label (GroundTruthLabel) and the NN label (profiles with at least one NN detection are labeled as DGA). For simplicity, a third label called profile is added, which aggregates all GroundTruthLabel values other than Normal into a single Malicious class.
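A minimal sketch of how the profile label could be derived. The helper `make_profile_label()` and the non-Normal label values in the example are hypothetical. The level order puts Malicious first, which matches the printed confusion matrices; this matters because caret's `twoClassSummary` treats the first factor level as the event when computing ROC, sensitivity and specificity.

```r
# Collapse every GroundTruthLabel other than "Normal" into "Malicious";
# Malicious is the first level, i.e. the event class for twoClassSummary.
make_profile_label <- function(ground_truth) {
  factor(ifelse(ground_truth == "Normal", "Normal", "Malicious"),
         levels = c("Malicious", "Normal"))
}

# Hypothetical ground-truth labels, for illustration only
profile <- make_profile_label(c("Normal", "DGA", "Normal", "Malware"))
table(profile)
```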
Splitting into training and testing sets
We used 2x5 cross-validation (5 folds, repeated twice). The ROC metric (area under the ROC curve) was used to select the best model.
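library(caret)
# 2x5 cross-validation: 5 folds repeated twice. Note that `repeats` only takes
# effect with method = "repeatedcv"; with method = "cv" it is ignored.
ctrl_fast <- trainControl(method = "repeatedcv",
                          number = 5,
                          repeats = 2,
                          summaryFunction = twoClassSummary,  # reports ROC, Sens, Spec
                          classProbs = TRUE,                  # class probabilities, needed for ROC
                          verboseIter = TRUE,
                          allowParallel = TRUE)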
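To make the 2x5 scheme concrete, the base-R sketch below builds the 10 train/hold-out splits it implies. caret builds its folds internally (stratified by class); this unstratified helper, `repeated_cv_folds()`, is hypothetical and only illustrates that every row is held out exactly once per repeat.

```r
# Hypothetical sketch of 2x5 repeated cross-validation: for each repeat,
# shuffle the rows and assign them to k folds of (almost) equal size.
repeated_cv_folds <- function(n, k = 5, repeats = 2, seed = 42) {
  set.seed(seed)
  splits <- list()
  for (r in seq_len(repeats)) {
    fold_id <- sample(rep(seq_len(k), length.out = n))  # random fold for every row
    for (f in seq_len(k)) {
      splits[[sprintf("Fold%d.Rep%d", f, r)]] <- list(
        train   = which(fold_id != f),   # rows used to fit the model
        holdout = which(fold_id == f))   # rows used to compute ROC
    }
  }
  splits
}

splits <- repeated_cv_folds(n = 100)
length(splits)  # 10 fits per candidate tuning value
```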
70% of the dataset was used for training and 30% for testing.
The distribution of the profile labels in the two resulting datasets (training and testing) is shown below.
library(dplyr)
# Number of profiles per class in each split
data_train %>% group_by(profile) %>% summarise(total = n())
data_test %>% group_by(profile) %>% summarise(total = n())
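The code for the 70/30 split itself is not shown. A class-stratified split, which keeps the Malicious/Normal proportions similar in both sets, could look like the base-R sketch below; caret's `createDataPartition()` performs a stratified split of this kind, but `stratified_split()` and its rounding choice (`ceiling`) are assumptions, not the author's code.

```r
# Hypothetical stratified split: within each class, draw ceiling(p * n_class)
# rows for training; all remaining rows go to testing.
stratified_split <- function(labels, p = 0.7, seed = 123) {
  set.seed(seed)
  by_class <- split(seq_along(labels), labels)
  train_idx <- unlist(lapply(by_class, function(i)
    i[sample.int(length(i), ceiling(p * length(i)))]), use.names = FALSE)
  list(train = sort(train_idx), test = setdiff(seq_along(labels), train_idx))
}

# Made-up labels: 30 Malicious and 70 Normal profiles
labels <- factor(rep(c("Malicious", "Normal"), times = c(30, 70)))
s <- stratified_split(labels)
table(labels[s$train])
table(labels[s$test])
```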
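A random forest classifier was trained, using the ROC metric to select the best model. Prior to training, the training data were randomly up-sampled so that both classes have the same number of profiles (caret applies this sampling inside each resampling iteration).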
# Randomly up-sample the minority class (applied within each resample)
ctrl_fast$sampling <- "up"
# Random forest tuned over up to 5 candidate mtry values, selected by ROC AUC;
# train_formula is the model formula (its definition is not shown in this section)
rfFit <- train(train_formula,
               data = data_train,
               method = "rf",
               tuneLength = 5,
               metric = "ROC",
               trControl = ctrl_fast)
rfFit
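Random up-sampling, as described above, resamples the smaller class with replacement until both classes have the same size. The base-R sketch below shows the idea; `up_sample()` is a hypothetical helper (caret's own `upSample()` and `sampling = "up"` serve the same purpose), and the toy data are made up.

```r
# Hypothetical random up-sampling: every class is resampled (with replacement)
# up to the size of the largest class; the original rows are always kept.
up_sample <- function(df, class_col, seed = 1) {
  set.seed(seed)
  by_class <- split(seq_len(nrow(df)), df[[class_col]])
  target <- max(lengths(by_class))
  idx <- unlist(lapply(by_class, function(i) {
    extra <- target - length(i)                            # rows missing in this class
    c(i, i[sample.int(length(i), extra, replace = TRUE)])  # duplicate random rows
  }), use.names = FALSE)
  df[idx, , drop = FALSE]
}

train <- data.frame(x = 1:8,
                    profile = factor(c(rep("Malicious", 3), rep("Normal", 5))))
balanced <- up_sample(train, "profile")
table(balanced$profile)  # Malicious 5, Normal 5
```

Because the duplicated rows add no new information, up-sampling only rebalances the classes seen by the learner; it is applied to the training data only, never to the test set.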
The out-of-bag (OOB) confusion matrix of the final model on the (up-sampled) training set is shown below.
rfFit$finalModel
Call:
randomForest(x = x, y = y, mtry = param$mtry)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 4.04%
Confusion matrix:
          Malicious Normal class.error
Malicious        94      5  0.05050505
Normal            3     96  0.03030303
Most relevant features
The most relevant features used by the random forest classifier are listed below.
varImp(rfFit, scale = F)
rf variable importance

                     Overall
tot_requests          19.030
ratio_avg_samedomain  18.481
ratio_detected        12.318
ratio_tot_samedomain   8.645
ratio_nx               8.596
ratio_reverse          8.122
ratio_fail             6.902
ratio_tot_sameip       4.929
ratio_avg_sameip       4.738
ratio_mx               4.383
ROC Curve Analysis:
The resulting ROC curve shows that thresholds between 0.8 and 0.9 on the predicted probability of Malicious are the best options for correctly classifying most of the Normal profiles (i.e. the NN False Positives) while keeping a good True Positive rate (i.e. profiles labeled as Malicious).
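library(ggplot2)
library(plotROC)  # geom_roc()
# predsrprofilerobsamp: test-set class probabilities, predict(rfFit, data_test, type = 'prob')
# m = predicted probability of Malicious; d = true class, with Normal (negative) as first level
ggplot(cbind(predsrprofilerobsamp, class = data_test$profile),
       aes(m = Malicious,
           d = factor(class, levels = c("Normal", "Malicious")))) +
  geom_roc(hjust = -0.4, vjust = 1.5, colour = 'orange') +
  theme_bw()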
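The trade-off behind the choice of threshold can be seen by sweeping the cut-off on the Malicious probability and recording sensitivity and specificity at each value. The probabilities and labels below are made up, and `threshold_table()` is a hypothetical helper, not part of caret or plotROC.

```r
# Sensitivity and specificity of the rule "Malicious if P(Malicious) >= t"
# for several thresholds t.
threshold_table <- function(prob, truth, thresholds) {
  do.call(rbind, lapply(thresholds, function(t) {
    pred_mal <- prob >= t
    data.frame(
      threshold   = t,
      sensitivity = sum(pred_mal & truth == "Malicious") / sum(truth == "Malicious"),
      specificity = sum(!pred_mal & truth == "Normal") / sum(truth == "Normal"))
  }))
}

# Hypothetical predicted probabilities and true classes
prob  <- c(0.95, 0.92, 0.85, 0.60, 0.30, 0.88, 0.20, 0.10)
truth <- c(rep("Malicious", 5), rep("Normal", 3))
tt <- threshold_table(prob, truth, c(0.5, 0.8, 0.9))
tt
```

Raising the threshold moves profiles from Malicious to Normal, so specificity can only go up and sensitivity can only go down; in this toy data the Normal profile with probability 0.88 is only rescued at 0.9.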

Evaluation on the testing set
The Confusion matrix for the resulting model on the Testing set is shown below:
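# Class probabilities on the test set
predsrprofilerobsamp <- predict(rfFit, data_test, type = 'prob')
# Label a profile as Malicious only when P(Malicious) >= 0.9; confusionMatrix() needs factors
predsrfsamp <- factor(ifelse(predsrprofilerobsamp$Malicious >= 0.9, 'Malicious', 'Normal'),
                      levels = levels(data_test$profile))
cm <- confusionMatrix(predsrfsamp, data_test$profile, positive = "Malicious")
cm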
Confusion Matrix and Statistics
          Reference
Prediction Malicious Normal
  Malicious        20      0
  Normal            5     42
Accuracy : 0.9254
95% CI : (0.8344, 0.9753)
No Information Rate : 0.6269
P-Value [Acc > NIR] : 2.134e-08
Kappa : 0.8337
Mcnemar's Test P-Value : 0.07364
Sensitivity : 0.8000
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 0.8936
Prevalence : 0.3731
Detection Rate : 0.2985
Detection Prevalence : 0.2985
Balanced Accuracy : 0.9000
'Positive' Class : Malicious
With a specificity of 1, all Normal profiles in the test set were correctly classified. The sensitivity was 0.8, which means that 5 of the 25 Malicious profiles were missed (classified as Normal).
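The main statistics reported by `confusionMatrix()` can be recomputed by hand from the four cells of the test confusion matrix, which makes clear what each number measures. The sketch below uses only base R; variable names are illustrative.

```r
# Test-set confusion matrix from the output above (rows = prediction, columns = reference)
cm_tab <- matrix(c(20, 5, 0, 42), nrow = 2,
                 dimnames = list(Prediction = c("Malicious", "Normal"),
                                 Reference  = c("Malicious", "Normal")))
tp <- cm_tab["Malicious", "Malicious"]; fn <- cm_tab["Normal", "Malicious"]
fp <- cm_tab["Malicious", "Normal"];    tn <- cm_tab["Normal", "Normal"]
n <- sum(cm_tab)

sensitivity <- tp / (tp + fn)   # 20 / 25
specificity <- tn / (tn + fp)   # 42 / 42
accuracy    <- (tp + tn) / n    # 62 / 67
npv         <- tn / (tn + fn)   # 42 / 47
# Cohen's kappa: observed agreement corrected for chance agreement
p_exp <- (sum(cm_tab[1, ]) * sum(cm_tab[, 1]) + sum(cm_tab[2, ]) * sum(cm_tab[, 2])) / n^2
kappa_stat <- (accuracy - p_exp) / (1 - p_exp)
# McNemar's test only compares the two off-diagonal cells (0 vs. 5 errors)
mcnemar <- mcnemar.test(cm_tab)

round(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy,
        npv = npv, kappa = kappa_stat, mcnemar_p = mcnemar$p.value), 4)
```

Since McNemar's test uses only the disagreements (5 missed Malicious profiles vs. 0 misclassified Normal profiles), its p-value describes whether the two error types are balanced, not whether the classifier is better than chance.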
Distribution of False and True Positives
library(lattice)  # histogram()
# Attach the predicted class and class probabilities to each test profile
data_test_predic <- cbind(data_test, profilepredclass = predsrfsamp, profilepredprob = predsrprofilerobsamp)
# Normal profiles classified as Malicious (empty on this test set, since specificity = 1)
profile_incorrect <- data_test_predic %>% filter(profile == 'Normal' & profilepredclass == 'Malicious')
# Normal profiles correctly classified as Normal
profile_correct <- data_test_predic %>% filter(profile == 'Normal' & profilepredclass == 'Normal')
histogram(~ratio_avg_samedomain, data = profile_correct, main = "Distribution of Avg_samedomain for True positives")

#histogram(~ratio_avg_samedomain, data = profile_incorrect, main = "Distribution of Avg_samedomain for False positives")
histogram(~tot_requests, data = profile_correct, main = "Distribution of Total Requests for True positives")

#histogram(~tot_requests, data = profile_incorrect, main = "Distribution of Total Requests for False positives")
Per-request evaluation
Despite the good results of the client profile classifier shown in the previous sections, it must also be analyzed at the level of individual DNS requests. Therefore, a per-request evaluation was performed, following this approach:
- Only the profiles in the test set were considered.
- For each profile, all of its DNS requests were selected and labeled as follows:
  - For a profile labeled as Normal, all DNS requests are considered Normal.
  - For a profile labeled as DGA, only the requests labeled as DGA by the NN are labeled; the remaining requests are left unlabeled and considered Background.
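The labeling rule above can be written as a one-line condition. In this base-R sketch, each request is assumed to carry the ground-truth label of its profile (`profile_label`, a hypothetical column) and the NN output `dga.class` (1 = DGA); unlabeled Background requests get `NA`, which is how the code below filters them out with `!is.na(GroundTruthLabel)`.

```r
# Hypothetical labeling rule: keep the profile's label for every request of a
# Normal profile, and only for NN-detected requests of a DGA profile.
label_requests <- function(profile_label, dga_class) {
  keep <- profile_label == "Normal" | dga_class == 1
  ifelse(keep, profile_label, NA_character_)   # NA = Background (unlabeled)
}

requests <- data.frame(
  profile_label = c("Normal", "Normal", "DGA", "DGA", "DGA"),
  dga.class     = c(0, 1, 1, 0, 0),
  stringsAsFactors = FALSE)
requests$GroundTruthLabel <- label_requests(requests$profile_label, requests$dga.class)
requests
```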
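# Sanity check: the number of labeled requests per profile should equal its tot_requests
a <- request_labeled %>% group_by(client, profilenum) %>% summarise(n = n())
b <- data_test %>% group_by(client, profilenum) %>% summarise(n = sum(tot_requests))
# Join on the profile key instead of relying on row order; returned rows are mismatches
inner_join(a, b, by = c("client", "profilenum"), suffix = c(".a", ".b")) %>% filter(n.a != n.b)
# Exploratory checks on the NN detections among Normal requests
legit_labeled %>% filter(dga.class == 1)
request_labeled %>% filter(dga.class == 1 & GroundTruthLabel == 'Normal') %>% group_by(profilepredclass) %>% summarise(n = n())
request_labeled %>% filter(dga.class == 1 & GroundTruthLabel == 'Normal') %>% group_by(client, profilenum) %>% summarise(n = n())
dga_data_test %>% filter(profilenum == 0)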
Number of labeled requests: 7688. The labels of the requests are distributed as shown in the table below.
nlabels<-request_labeled%>% filter(!is.na(GroundTruthLabel)) %>% group_by(GroundTruthLabel) %>% summarise(total=n())
nlabels
As can be seen, there are 7609 Normal requests and 79 Not Normal requests.
NN per-request analysis
The table below shows the distribution of GroundTruthLabel among the requests classified as DGA by the NN.
nndetected<-request_labeled %>% filter(!is.na(GroundTruthLabel) & dga.class==1) %>% group_by(GroundTruthLabel, dga.class) %>% summarise(nntotal=n())
nndetected
In this case, the NN classifier produces 3031 False Positive requests (Normal requests classified as DGA).
Client profile per-request analysis
The table below shows the distribution of GroundTruthLabel among the requests whose profile was classified as Malicious by the profile classifier.
profiledetected<-request_labeled %>% filter(!is.na(GroundTruthLabel) & profilepredclass=='Malicious') %>% group_by(GroundTruthLabel, profilepredclass) %>% summarise(profiletotal=n())
profiledetected
In this case, the profile classifier produces 2655 False Positive requests, i.e. it reduces the number of False Positives by 376 (3031 - 2655).
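The two False Positive counts follow the filters used above: a Normal request is an NN False Positive when `dga.class == 1`, and a profile False Positive when its profile is predicted Malicious. The base-R sketch below applies the same filters to a made-up request table (`count_fp()` is a hypothetical helper) and then redoes the arithmetic on the reported totals, which gives a relative reduction of about 12%.

```r
# Hypothetical request-level table with the columns used in the filters above:
# GroundTruthLabel (NA = Background), dga.class (NN output, 1 = DGA) and
# profilepredclass (decision of the profile classifier for the request's profile).
count_fp <- function(req) {
  normal <- !is.na(req$GroundTruthLabel) & req$GroundTruthLabel == "Normal"
  nn_fp <- sum(normal & req$dga.class == 1)                        # NN false positives
  profile_fp <- sum(normal & req$profilepredclass == "Malicious")  # profile false positives
  c(nn_fp = nn_fp, profile_fp = profile_fp, reduction = nn_fp - profile_fp,
    relative_reduction = (nn_fp - profile_fp) / nn_fp)
}

req <- data.frame(
  GroundTruthLabel = c("Normal", "Normal", "Normal", "DGA", NA),
  dga.class        = c(1, 1, 0, 1, 0),
  profilepredclass = c("Normal", "Malicious", "Normal", "Malicious", "Malicious"),
  stringsAsFactors = FALSE)
count_fp(req)

# The same arithmetic on the totals reported above
reported_reduction <- 3031 - 2655
reported_reduction                    # 376
round(reported_reduction / 3031, 3)   # 0.124
```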
These results are shown in the bar plot below. For each ground-truth group (Normal and Not Normal), gray shows the total number of labeled requests, blue the number of requests detected as DGA by the NN, and orange the number of requests classified as Malicious by the client profile.
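# Join per-label totals with NN and profile detections by label
# (safer than cbind(), which assumes identical row order and row counts)
results <- nlabels %>%
  left_join(select(ungroup(nndetected), GroundTruthLabel, nntotal), by = "GroundTruthLabel") %>%
  left_join(select(ungroup(profiledetected), GroundTruthLabel, profiletotal), by = "GroundTruthLabel") %>%
  mutate(nntotal = coalesce(nntotal, 0L), profiletotal = coalesce(profiletotal, 0L),
         GroundTruthLabel = ifelse(GroundTruthLabel == 'Normal', 'Normal', 'Not Normal'))
# Overlaid bars: gray = labeled requests, blue = NN detections, orange = profile detections
ggplot(as.data.frame(results)) +
  geom_col(aes(x = GroundTruthLabel, y = total), fill = 'lightgray') +
  geom_col(aes(x = GroundTruthLabel, y = nntotal), fill = 'skyblue', alpha = 0.5) +
  geom_col(aes(x = GroundTruthLabel, y = profiletotal), fill = 'orange', alpha = 0.7) +
  theme_bw()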

Conclusions
In the first part of the evaluation, which considered only client profiles, the classifier correctly identified all the FPs (i.e. Normal profiles incorrectly detected by the NN), with a specificity of 1 and a sensitivity of 0.8. The McNemar's test p-value was 0.07364; note that this test only compares the two error types (5 missed Malicious profiles vs. 0 misclassified Normal profiles), while the accuracy is significantly higher than the no-information rate (P-Value [Acc > NIR] = 2.134e-08). Even so, the test set contains only 67 profiles, so more labeled profiles are probably needed to confirm these results.
Regarding the per-request evaluation, the client profile reduced the FPs of the NN by only about 12% (376 of 3031). In the Normal profiles detected as DGA by the NN, the profile classifier reduced the FPs considerably. The problem lies in the Malicious profiles: a malicious profile may contain only a few malicious requests mixed with many normal requests. In those cases, the profile classifier classifies the whole profile, and therefore all of its requests, as Malicious, resulting in poor per-request performance.
This problem appears to be a difficult one, since a profile is labeled as Malicious if it contains at least one confirmed DGA request, regardless of how many thousands of normal requests it also contains. Moreover, the selected features allow the random forest to correctly detect the presence of this small malicious behavior in a profile, but they carry no information about the remaining normal requests.
Given that DGA activity may account for only a small part of a client's traffic within a 1-hour window, a possible solution could be to use smaller time windows.
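A toy illustration of why smaller windows could help: if a window is labeled Malicious whenever it contains at least one DGA request, the number of normal requests swept into Malicious windows shrinks as the window gets shorter. The request times, the two DGA positions and the helper `normal_in_malicious_windows()` are all hypothetical; the sketch ignores the opposite effect that shorter windows contain fewer requests, which makes the ratio features noisier.

```r
# Count the normal requests that fall inside windows containing >= 1 DGA request,
# for a given window length in seconds.
normal_in_malicious_windows <- function(ts, is_dga, window_sec) {
  win <- floor(ts / window_sec)   # window index of each request
  bad <- unique(win[is_dga])      # windows with at least one DGA request
  sum(!is_dga & win %in% bad)     # normal requests inside those windows
}

ts <- seq(0, 3599, by = 36)              # 100 requests spread over one hour (seconds)
is_dga <- seq_along(ts) %in% c(11, 12)   # two DGA requests, at 360 s and 396 s
normal_flagged <- sapply(c(`1h` = 3600, `10min` = 600, `5min` = 300),
                         function(w) normal_in_malicious_windows(ts, is_dga, w))
normal_flagged  # 1h = 98, 10min = 15, 5min = 6
```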
Saving the model
