The goal of the study

Evaluate a prediction based on the minimal number of YES votes needed to consider a question bug-covering. For a more detailed explanation, please see the previous analysis.

The study has two goals:
  • Train a machine learning algorithm that predicts whether a code fragment is related to a failure or not. For that, I originally devised different metrics. The metric explored in this study is the threshold vote, that is, a threshold on the number of YES answers.
  • Building the model

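    I chose knn.cv (k-nearest neighbours with cross-validation) to reduce the risk that a lucky split into training and testing sets inflates the results.

    Cross-validation was performed leave-one-out: each observation is classified using all the other observations.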

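    # Build the model: leave-one-out cross-validated k-NN (knn.cv comes from the 'class' package)
    library(class)
    fitModel.cv <- knn.cv(trainingData, trainingData$bugCovering, k = 3, l = 0, prob = FALSE, use.all = TRUE)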

    I also ran the model with different values of k (3, 5, 7, 9), which produced similar results.
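As a rough illustration of the comparison across k, the sketch below runs leave-one-out k-NN for each value on made-up data. The data frame `toy`, its column `thresholdVote`, and the Poisson-generated values are assumptions for the example, not the study's data. The label column is kept out of the predictors here; that is a choice made for this sketch.

```r
library(class)  # provides knn.cv

set.seed(42)

# Hypothetical stand-in data (not the study's): one numeric metric and a logical label.
toy <- data.frame(
  thresholdVote = c(rpois(45, lambda = 2), rpois(15, lambda = 9)),
  bugCovering   = rep(c(FALSE, TRUE), times = c(45, 15))
)

features <- toy["thresholdVote"]      # predictors only; the label column is left out
labels   <- factor(toy$bugCovering)

kValues <- c(3, 5, 7, 9)

# Leave-one-out accuracy: each row is classified from its k nearest other rows.
looAccuracy <- sapply(kValues, function(k) {
  predicted <- knn.cv(features, labels, k = k, l = 0, prob = FALSE, use.all = TRUE)
  mean(as.character(predicted) == as.character(labels))
})
names(looAccuracy) <- paste0("k=", kValues)
print(looAccuracy)
```

If the accuracies stay close to each other across k, as reported above for the real data, the choice of k is unlikely to drive the conclusions.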

    Testing the model

    ## 
    ##  
    ##    Cell Contents
    ## |-------------------------|
    ## |                       N |
    ## |           N / Row Total |
    ## |           N / Col Total |
    ## |         N / Table Total |
    ## |-------------------------|
    ## 
    ##  
    ## Total Observations in Table:  129 
    ## 
    ##  
    ##                          | fitModel.cv.df[, 1] 
    ## trainingData$bugCovering |     FALSE |      TRUE | Row Total | 
    ## -------------------------|-----------|-----------|-----------|
    ##                    FALSE |       103 |         1 |       104 | 
    ##                          |     0.990 |     0.010 |     0.806 | 
    ##                          |     0.981 |     0.042 |           | 
    ##                          |     0.798 |     0.008 |           | 
    ## -------------------------|-----------|-----------|-----------|
    ##                     TRUE |         2 |        23 |        25 | 
    ##                          |     0.080 |     0.920 |     0.194 | 
    ##                          |     0.019 |     0.958 |           | 
    ##                          |     0.016 |     0.178 |           | 
    ## -------------------------|-----------|-----------|-----------|
    ##             Column Total |       105 |        24 |       129 | 
    ##                          |     0.814 |     0.186 |           | 
    ## -------------------------|-----------|-----------|-----------|
    ## 
    ## 
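The cross table can be condensed into the usual summary measures. The sketch below copies the four cell counts from the table (rows are the actual labels, columns the predicted ones) and derives accuracy, precision and recall from them; the variable names are mine.

```r
# Cell counts copied from the cross table above (rows = actual, columns = predicted).
confusion <- matrix(
  c(103,  1,
      2, 23),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("FALSE", "TRUE"), predicted = c("FALSE", "TRUE"))
)

tp <- confusion["TRUE",  "TRUE"]   # bug-covering, predicted bug-covering
fp <- confusion["FALSE", "TRUE"]   # not bug-covering, predicted bug-covering
fn <- confusion["TRUE",  "FALSE"]  # bug-covering, missed by the model
tn <- confusion["FALSE", "FALSE"]  # not bug-covering, predicted as such

accuracy  <- (tp + tn) / sum(confusion)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

print(round(c(accuracy = accuracy, precision = precision, recall = recall), 3))
```

With these counts, accuracy is 126/129 (about 0.977), precision is 23/24 (about 0.958) and recall is 23/25 (0.920), which matches the column and row proportions printed in the TRUE cell of the table.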

    Estimating the metric

    Find the minimal threshold vote value that would have predicted the same bug-covering questions.

    Mean threshold vote of the questions categorized as bug-covering:

    ## [1] 9.416667

    Minimal threshold vote of the questions categorized as bug-covering:

    ## [1] 6
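A minimal sketch of how the two figures above can be obtained, and of how to check that a plain cut-off reproduces the model's selection. The data frame `questions`, its columns and its values are hypothetical and chosen only for illustration.

```r
# Hypothetical example data (not the study's): one row per question.
questions <- data.frame(
  thresholdVote        = c(2, 7, 12, 4, 8, 1),
  predictedBugCovering = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Threshold votes of the questions the model categorized as bug-covering.
flagged <- questions$thresholdVote[questions$predictedBugCovering]

meanVote <- mean(flagged)
minVote  <- min(flagged)

# Does the rule "thresholdVote >= minVote" select exactly the same questions?
sameSelection <- identical(questions$thresholdVote >= minVote,
                           questions$predictedBugCovering)

print(c(mean = meanVote, min = minVote))
print(sameSelection)
```

The last check matters: the minimum is a faithful substitute for the model only when no question categorized as not bug-covering reaches that vote count.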

    Plot metric distribution

    From the distribution of threshold vote values, we can see that the threshold vote has to be greater than or equal to 6 (six) for a question to be predicted as bug-covering.
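One way such a distribution could be drawn is sketched below with base R graphics. The data are simulated (the `votes` data frame and the Poisson parameters are assumptions), so the figure it produces only illustrates the idea and is not the study's plot.

```r
set.seed(1)

# Simulated votes (not the study's data): bug-covering questions get at least 6.
votes <- data.frame(
  thresholdVote = c(rpois(40, lambda = 2), 6 + rpois(12, lambda = 3)),
  bugCovering   = rep(c(FALSE, TRUE), times = c(40, 12))
)

# Smallest vote count observed among bug-covering questions.
cutoff <- min(votes$thresholdVote[votes$bugCovering])

# One bar per integer vote count, with a dashed line just below the cut-off.
breaks <- seq(-0.5, max(votes$thresholdVote) + 0.5, by = 1)
hist(votes$thresholdVote, breaks = breaks,
     main = "Distribution of threshold vote (simulated)",
     xlab = "Threshold vote")
abline(v = cutoff - 0.5, lty = 2)

# Cross-check: how the cut-off splits the two classes.
print(table(bugCovering = votes$bugCovering,
            atOrAboveCutoff = votes$thresholdVote >= cutoff))
```

Any simulated non-bug-covering question to the right of the dashed line would be a false positive of the cut-off rule, which is what the cross-check table makes visible.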