The goal of the study

Evaluate the prediction based on the difference between the number of YES votes and the number of NO votes. For a more detailed explanation, please see the previous analysis.
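As a minimal sketch of how this metric can be computed, the snippet below subtracts NO votes from YES votes per question. The data frame and its column names (`questionID`, `yesVotes`, `noVotes`, `majorityVote`) are hypothetical and the counts are made up for illustration; they are not taken from the study data.

```r
# Hypothetical answer counts per question; names and values are illustrative only.
answers <- data.frame(
  questionID = c(1, 2, 3),
  yesVotes   = c(12, 4, 9),
  noVotes    = c(8, 10, 9)
)

# Majority vote = number of YES votes minus number of NO votes.
# Positive values mean more YES than NO, negative values the opposite.
answers$majorityVote <- answers$yesVotes - answers$noVotes
print(answers)
```

A value of zero means the YES and NO answers are tied for that question.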

The study has two goals:
  • Train a machine learning algorithm that predicts whether a code fragment is related to a failure or not. For that, I originally devised different metrics. The metric explored in this study is the majority vote between YES and NO answers.
  • Building the model

    I chose knn.cv (k-nearest neighbours with cross-validation) to reduce the risk of a lucky split between training and testing sets.

    Cross-validation was performed with the leave-one-out scheme.

    # Build the model: leave-one-out cross-validated k-NN (knn.cv comes from the class package)
    library(class)
    fitModel.cv <- knn.cv(trainingData, trainingData$bugCovering, k = 3, l = 0, prob = FALSE, use.all = TRUE)

    I also ran the model with different values of k (3, 5, 7, 9), which produced similar results.
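The comparison across values of k could be sketched as follows. Because `trainingData` is not available here, the built-in `iris` data set is used purely as a stand-in, so the accuracies it prints say nothing about the study's results. The variable names `features`, `labels`, and `loo_accuracy` are my own.

```r
library(class)  # provides knn.cv

set.seed(1)  # knn.cv breaks ties at random, so fix the seed for repeatability

# Stand-in data: iris replaces trainingData, which is not available here.
features <- scale(iris[, 1:4])  # predictors only, without the class label
labels   <- iris$Species

# Leave-one-out accuracy for each candidate k.
ks <- c(3, 5, 7, 9)
loo_accuracy <- sapply(ks, function(k) {
  predicted <- knn.cv(features, labels, k = k)
  mean(predicted == labels)
})
names(loo_accuracy) <- ks
print(loo_accuracy)
```

Note that the sketch passes only the predictor columns to `knn.cv` and keeps the class label separate, so the label cannot influence the distance computation.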

    Testing the model

    ## 
    ##  
    ##    Cell Contents
    ## |-------------------------|
    ## |                       N |
    ## |           N / Row Total |
    ## |           N / Col Total |
    ## |         N / Table Total |
    ## |-------------------------|
    ## 
    ##  
    ## Total Observations in Table:  129 
    ## 
    ##  
    ##                          | fitModel.cv.df[, 1] 
    ## trainingData$bugCovering |     FALSE |      TRUE | Row Total | 
    ## -------------------------|-----------|-----------|-----------|
    ##                    FALSE |        97 |         7 |       104 | 
    ##                          |     0.933 |     0.067 |     0.806 | 
    ##                          |     0.915 |     0.304 |           | 
    ##                          |     0.752 |     0.054 |           | 
    ## -------------------------|-----------|-----------|-----------|
    ##                     TRUE |         9 |        16 |        25 | 
    ##                          |     0.360 |     0.640 |     0.194 | 
    ##                          |     0.085 |     0.696 |           | 
    ##                          |     0.070 |     0.124 |           | 
    ## -------------------------|-----------|-----------|-----------|
    ##             Column Total |       106 |        23 |       129 | 
    ##                          |     0.822 |     0.178 |           | 
    ## -------------------------|-----------|-----------|-----------|
    ## 
    ## 
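To make the table easier to read, the usual summary measures can be derived from its four cell counts. The sketch below copies the counts from the cross table above (rows are the actual `bugCovering` values, columns are the predictions); the variable names are my own.

```r
# Counts copied from the cross table above (rows: actual, columns: predicted).
confusion <- matrix(
  c(97,  7,
     9, 16),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("FALSE", "TRUE"), predicted = c("FALSE", "TRUE"))
)

tp <- confusion["TRUE",  "TRUE"]   # bug-covering, predicted bug-covering
fp <- confusion["FALSE", "TRUE"]   # not bug-covering, predicted bug-covering
fn <- confusion["TRUE",  "FALSE"]  # bug-covering, predicted not bug-covering
tn <- confusion["FALSE", "FALSE"]  # not bug-covering, predicted not bug-covering

accuracy  <- (tp + tn) / sum(confusion)  # 113 / 129
precision <- tp / (tp + fp)              # 16 / 23
recall    <- tp / (tp + fn)              # 16 / 25

print(round(c(accuracy = accuracy, precision = precision, recall = recall), 3))
```

Precision and recall correspond to the 0.696 column proportion and the 0.640 row proportion already shown in the TRUE cell of the table.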

    Estimating the metric

    Find the minimum majority vote value that would have predicted the same bug-covering questions.

    Mean majority vote of the questions categorized as bug covering:

    ## [1] 3.130435

    Minimum majority vote of the questions categorized as bug covering:

    ## [1] -2
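The two values above can be obtained by restricting the majority vote to the questions the model categorized as bug covering. The following is a sketch on toy data; the data frame `questions` and its columns `majorityVote` and `predictedBugCovering` are hypothetical names, and the values are invented for illustration.

```r
# Toy data; column names and values are hypothetical, not the study data.
questions <- data.frame(
  majorityVote         = c(5, 3, -2, 4, -6, -1, 7),
  predictedBugCovering = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Majority votes of the questions the model categorized as bug covering.
predicted_positive <- questions$majorityVote[questions$predictedBugCovering]

mean_vote <- mean(predicted_positive)  # mean majority vote of predicted positives
min_vote  <- min(predicted_positive)   # smallest majority vote among them

# Using the minimum as a threshold flags every predicted positive,
# but it may also flag questions the model did not categorize as bug covering.
flagged <- questions$majorityVote >= min_vote
print(table(flagged = flagged, predicted = questions$predictedBugCovering))
```

In this toy data the threshold also flags one question that the model predicted as not bug covering, which shows that the minimum is a lower bound rather than an exact decision rule.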

    Plot metric distribution

    From the distribution of majority vote values, we can see that the majority vote has to be greater than or equal to -2 (minus two) in order to predict bug-covering questions.