Evaluate the prediction based on the minimal number of YES votes needed to consider a question bug-covering. For a more detailed explanation, please see the previous analysis.
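I chose knn.cv (k-nearest neighbors with cross-validation) to reduce the risk of a lucky split between the training and testing sets.
Cross-validation was performed with the leave-one-out scheme: each question is classified by a model built from all the other questions.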
# Build the model: leave-one-out k-NN with k = 3 neighbors
library(class)  # provides knn.cv
fitModel.cv <- knn.cv(trainingData, trainingData$bugCovering, k = 3, l = 0, prob = FALSE, use.all = TRUE)
I also ran the model with k = 3, 5, 7, and 9; all four values produced similar results.
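The comparison across values of k can be sketched as follows. This is a minimal sketch rather than the original analysis code: the helper `loocvErrorByK` and the toy data are hypothetical, and it assumes numeric predictor columns and an installed `class` package.

```r
library(class)  # provides knn.cv (leave-one-out k-NN)

# Hypothetical helper: leave-one-out misclassification rate for each k.
loocvErrorByK <- function(features, labels, ks = c(3, 5, 7, 9)) {
  errors <- sapply(ks, function(k) {
    predicted <- knn.cv(features, labels, k = k, l = 0, prob = FALSE, use.all = TRUE)
    mean(as.character(predicted) != as.character(labels))
  })
  names(errors) <- paste0("k=", ks)
  errors
}

# Toy data for illustration only (not the study's data).
set.seed(1)
toyFeatures <- data.frame(votes = c(rnorm(20, mean = 2), rnorm(20, mean = 10)))
toyLabels <- factor(rep(c(FALSE, TRUE), each = 20))

errors <- loocvErrorByK(toyFeatures, toyLabels)
print(errors)
```

The sketch passes only predictor columns as `features`. Including the class label among the features would leak the answer into the distance computation, so it is worth checking which columns reach `knn.cv`.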
##
##
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table:  129
##
##
##                          | fitModel.cv.df[, 1]
## trainingData$bugCovering |     FALSE |      TRUE | Row Total |
## -------------------------|-----------|-----------|-----------|
##                    FALSE |       103 |         1 |       104 |
##                          |     0.990 |     0.010 |     0.806 |
##                          |     0.981 |     0.042 |           |
##                          |     0.798 |     0.008 |           |
## -------------------------|-----------|-----------|-----------|
##                     TRUE |         2 |        23 |        25 |
##                          |     0.080 |     0.920 |     0.194 |
##                          |     0.019 |     0.958 |           |
##                          |     0.016 |     0.178 |           |
## -------------------------|-----------|-----------|-----------|
##             Column Total |       105 |        24 |       129 |
##                          |     0.814 |     0.186 |           |
## -------------------------|-----------|-----------|-----------|
##
##
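The cross-table can be condensed into the usual summary metrics. The sketch below re-enters the four cell counts by hand (rows are the actual classes, columns the predicted ones) and treats TRUE as the positive class; the variable names are illustrative and do not come from the original script.

```r
# Counts copied from the cross-table above (rows = actual, columns = predicted).
confusion <- matrix(c(103,  1,
                        2, 23),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(actual = c("FALSE", "TRUE"),
                                    predicted = c("FALSE", "TRUE")))

truePositives  <- confusion["TRUE", "TRUE"]
falsePositives <- confusion["FALSE", "TRUE"]
falseNegatives <- confusion["TRUE", "FALSE"]

accuracy  <- sum(diag(confusion)) / sum(confusion)             # 126 / 129
precision <- truePositives / (truePositives + falsePositives)  # 23 / 24
recall    <- truePositives / (truePositives + falseNegatives)  # 23 / 25

round(c(accuracy = accuracy, precision = precision, recall = recall), 3)
```

Precision (23/24, about 0.958) and recall (23/25 = 0.920) match the column and row proportions in the TRUE/TRUE cell of the table; accuracy is 126/129, about 0.977.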
Find the minimal threshold vote value that would have predicted the same bug-covering questions
Mean threshold vote of the questions categorized as bug-covering:
## [1] 9.416667
Minimal threshold vote of the questions categorized as bug-covering:
## [1] 6
From the distribution of threshold vote values, we can see that a question's threshold vote has to be greater than or equal to 6 (six) for the question to be predicted as bug-covering.
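The mean and minimum reported above, and the threshold rule derived from them, can be computed along these lines. This is a sketch under stated assumptions: the column names `thresholdVote` and `predictedBugCovering` and the toy values are hypothetical, since the original data frame is not shown here.

```r
# Toy data for illustration only; column names and values are hypothetical.
toyQuestions <- data.frame(
  thresholdVote = c(1, 3, 5, 6, 8, 12),
  predictedBugCovering = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Threshold votes of the questions the model categorized as bug-covering.
votesOfPredicted <- toyQuestions$thresholdVote[toyQuestions$predictedBugCovering]
meanThreshold <- mean(votesOfPredicted)
minThreshold <- min(votesOfPredicted)

# Simple rule: flag a question when its threshold vote reaches the minimum.
ruleFlags <- toyQuestions$thresholdVote >= minThreshold

# TRUE only if the rule reproduces the model's predictions exactly.
ruleMatchesModel <- identical(ruleFlags, toyQuestions$predictedBugCovering)
ruleMatchesModel
```

The final check matters because the minimum alone is a necessary condition: the rule matches the model only if no question predicted as not bug-covering also has a threshold vote at or above that minimum.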