We evaluate the prediction based on the difference between the number of YES votes and the number of NO votes; for a more detailed explanation, please see the previous analysis.
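For reference, the vote-difference metric can be derived along these lines (a sketch only; the yesVotes and noVotes column names are illustrative and not taken from the original data set):

# Sketch: majority vote expressed as the difference between YES and NO votes per question
trainingData$majorityVote <- trainingData$yesVotes - trainingData$noVotes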
I chose knn.cv (cross-validation) to minimize the risk of a lucky split into training and testing sets.
Cross-validation was performed by leaving one out.
# Build the k-NN model with leave-one-out cross-validation (knn.cv from the class package)
library(class)
fitModel.cv <- knn.cv(trainingData, trainingData$bugCovering, k = 3, l = 0, prob = FALSE, use.all = TRUE)
I also ran the model with different values of k (3, 5, 7, and 9), which produced similar results.
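The confusion matrix below cross-tabulates the actual bug-covering labels against the cross-validated predictions. It can be produced along these lines (a sketch, assuming the predictions are wrapped in a data frame named fitModel.cv.df, as the column header in the output suggests):

# Sketch: cross-tabulate actual labels against cross-validated predictions
library(gmodels)
fitModel.cv.df <- data.frame(fitModel.cv)
CrossTable(trainingData$bugCovering, fitModel.cv.df[, 1], prop.chisq = FALSE)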
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 129
##
##
## | fitModel.cv.df[, 1]
## trainingData$bugCovering | FALSE | TRUE | Row Total |
## -------------------------|-----------|-----------|-----------|
## FALSE | 97 | 7 | 104 |
## | 0.933 | 0.067 | 0.806 |
## | 0.915 | 0.304 | |
## | 0.752 | 0.054 | |
## -------------------------|-----------|-----------|-----------|
## TRUE | 9 | 16 | 25 |
## | 0.360 | 0.640 | 0.194 |
## | 0.085 | 0.696 | |
## | 0.070 | 0.124 | |
## -------------------------|-----------|-----------|-----------|
## Column Total | 106 | 23 | 129 |
## | 0.822 | 0.178 | |
## -------------------------|-----------|-----------|-----------|
##
##
Next, we determine the minimal majority-vote value that would have predicted the same bug-covering questions.
Mean majority vote of the questions categorized as bug covering:
## [1] 3.130435
Minimal majority vote of the questions categorized as bug covering:
## [1] -2
From the distribution of majority-vote values, we can see that the majority-vote metric has to be greater than or equal to -2 (minus two) in order to predict bug-covering questions.
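The mean and minimal majority-vote values reported above could be computed along these lines (a sketch, reusing the illustrative majorityVote column and the fitModel.cv.df predictions introduced earlier):

# Sketch: summarize majority-vote values of the questions the model classified as bug covering
predictedBugCovering <- trainingData$majorityVote[fitModel.cv.df[, 1] == TRUE]
mean(predictedBugCovering)  # mean majority vote of predicted bug-covering questions
min(predictedBugCovering)   # minimal majority vote, i.e., the implied prediction threshold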