Evaluation of Models

Heart Disease KNN

Class Split

## 
##    0    1 
## 45.5 54.5

Confusion Matrix

Despite fairly high accuracy (0.836), we observe an imbalance between sensitivity and specificity, with specificity (0.767) considerably lower than sensitivity (0.903).

## Confusion Matrix and Statistics
## 
##           Actual
## Prediction  0  1
##          0 23  3
##          1  7 28
##                                           
##                Accuracy : 0.8361          
##                  95% CI : (0.7191, 0.9185)
##     No Information Rate : 0.5082          
##     P-Value [Acc > NIR] : 9.418e-08       
##                                           
##                   Kappa : 0.6713          
##                                           
##  Mcnemar's Test P-Value : 0.3428          
##                                           
##             Sensitivity : 0.9032          
##             Specificity : 0.7667          
##          Pos Pred Value : 0.8000          
##          Neg Pred Value : 0.8846          
##              Prevalence : 0.5082          
##          Detection Rate : 0.4590          
##    Detection Prevalence : 0.5738          
##       Balanced Accuracy : 0.8349          
##                                           
##        'Positive' Class : 1               
##
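Every statistic in the caret output comes directly from the four cell counts. The sketch below recomputes the main ones from the heart-disease matrix in base R. The variable names are ours and are only for illustration.

```r
# Cell counts copied from the heart-disease confusion matrix above
# (positive class = "1"; rows = prediction, columns = actual).
tp <- 28; fn <- 3   # actual 1: predicted 1 / predicted 0
tn <- 23; fp <- 7   # actual 0: predicted 0 / predicted 1

sensitivity <- tp / (tp + fn)   # share of actual 1s the model catches
specificity <- tn / (tn + fp)   # share of actual 0s the model catches
ppv         <- tp / (tp + fp)   # precision for class 1
npv         <- tn / (tn + fn)   # precision for class 0
balanced_accuracy <- (sensitivity + specificity) / 2

round(c(sensitivity = sensitivity, specificity = specificity,
        ppv = ppv, npv = npv, balanced_accuracy = balanced_accuracy), 4)
```

After rounding to four decimals, the results agree with the Sensitivity, Specificity, Pos Pred Value, Neg Pred Value and Balanced Accuracy rows above.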

F1 Score

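The F1 score is the harmonic mean of precision and recall, so it indicates how well the model balances the two. However, the value printed below is the F1 score for class 0. From the confusion matrix, that score is 2 × 23 / (2 × 23 + 3 + 7) = 46/56 = 0.8214. For the positive class 1, the F1 score is 2 × 28 / (2 × 28 + 7 + 3) = 56/66 ≈ 0.848. Both values are reasonably close to 1, which points to a fairly good balance between precision and recall. Even so, the model still misclassifies 10 of the 61 test cases.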

## [1] 0.8214286
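F1 changes depending on which class is treated as positive, so it helps to compute it both ways. The base-R helper below does this with the heart-disease counts. The name `f1_for_class` is ours.

```r
# F1 = 2*TP / (2*TP + FP + FN), for whichever class is treated as "positive".
f1_for_class <- function(tp, fp, fn) 2 * tp / (2 * tp + fp + fn)

# Heart-disease counts from the confusion matrix above.
f1_class1 <- f1_for_class(tp = 28, fp = 7, fn = 3)  # positive class "1": 56/66
f1_class0 <- f1_for_class(tp = 23, fp = 3, fn = 7)  # class "0" as positive: 46/56

c(class1 = f1_class1, class0 = f1_class0)
```

`f1_class0` reproduces the printed 0.8214286. We have not confirmed how the printed value was produced. If the F1 function that was used accepts a positive-class argument, setting it to "1" explicitly should avoid this mix-up.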

ROC/AUC

## 'data.frame':    61 obs. of  3 variables:
##  $ pred_class: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ pred_prob : num  0.556 0.778 0.778 0.889 1 ...
##  $ target    : num  2 2 2 2 2 2 2 2 2 2 ...

## [[1]]
## [1] 0.9247312
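AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The base-R sketch below implements that rank-based (Mann-Whitney) calculation. It is an illustration, not the ROCR code used in the lab, and the function name and toy data are hypothetical.

```r
# Rank-based AUC: ties between a positive and a negative count as one half,
# because rank() assigns average ranks to tied values.
auc_rank <- function(prob, label, positive = 1) {
  is_pos <- label == positive
  n_pos  <- sum(is_pos)
  n_neg  <- sum(!is_pos)
  r <- rank(prob)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical toy data, not taken from the lab.
toy_prob  <- c(0.8, 0.6, 0.4, 0.2)
toy_label <- c(1,   0,   1,   0)
auc_rank(toy_prob, toy_label)  # 3 of 4 positive/negative pairs ordered correctly: 0.75
```

The `positive` argument matters for this data. The str() output above shows `target` stored as 1/2, so `positive = 2` would be needed there.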

Prediction probability breakdown

DT::datatable(heart_eval2)

Log Loss

## [1] -0.138411

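Log loss can never be negative, so this value is an error and does not reflect the true log loss. We were unable to confirm the cause. One possible explanation is visible in the str() output above: `target` is coded 1/2 rather than 0/1, and the log-loss formula is only valid for 0/1 outcomes.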
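The minimal base-R sketch below shows how a miscoded outcome can produce an impossible log loss. The `log_loss()` helper, its `eps` clipping value and the example probabilities are our own assumptions, not the function used in the lab.

```r
# Binary log loss: -mean(y*log(p) + (1 - y)*log(1 - p)), valid only for y in {0, 1}.
# Probabilities are clipped away from 0 and 1 so log() never returns -Inf
# (KNN vote shares of exactly 0 or 1 would otherwise break it).
log_loss <- function(prob, y, eps = 1e-15) {
  p <- pmin(pmax(prob, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

prob <- c(0.9, 0.2)        # hypothetical predicted P(class 1)
log_loss(prob, c(1, 0))    # 0/1 coding: about 0.164
log_loss(prob, c(2, 1))    # same outcomes coded 1/2: about -0.241, an impossible value

# A factor with levels "0" and "1" becomes 1/2 under as.numeric(),
# because as.numeric() returns the level codes, not the labels.
y_factor <- factor(c("1", "0"))
as.numeric(y_factor)                # 2 1
as.numeric(as.character(y_factor))  # 1 0
```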

Evaluation Analysis (Part 2)

The confusion matrix above shows that the model's true negative rate (specificity, 0.767) is considerably lower than its true positive rate (sensitivity, 0.903). The table lists targets and predictions side by side. In it, the misclassified negative cases (actual 0 but predicted 1, which are false positives in standard terminology) all have similar prediction probabilities, and many of them are 43% or 28%. The true negative rate may be lower because the training set contained a higher proportion of positive cases, which would leave the model too few negative rows to learn from. This is a shaky assumption, though, because the base rate for the positive class is only slightly skewed at 54.5%.

We are trying to predict a medical condition, so we prefer a model with higher specificity in order to minimize the number of false positives. Many of these misclassified cases have prediction probabilities between 28% and 43%, so it makes sense to raise the classification threshold. Raising it will produce more false negatives and lower sensitivity. Given the real-life implications of the model being studied, we consider this an acceptable compromise.
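Sweeping the threshold makes this trade-off concrete. The base-R sketch below reports sensitivity and specificity at each cutoff, using the same "probability > threshold" rule as `adjust_thres()`. The function name and toy data are hypothetical.

```r
# Sensitivity and specificity at several cutoffs.
# prob: predicted probability of class 1; actual: 0/1 outcomes.
sweep_thresholds <- function(prob, actual, thresholds) {
  t(sapply(thresholds, function(th) {
    pred <- as.integer(prob > th)
    c(threshold   = th,
      sensitivity = sum(pred == 1 & actual == 1) / sum(actual == 1),
      specificity = sum(pred == 0 & actual == 0) / sum(actual == 0))
  }))
}

# Hypothetical toy data, not from the heart data set.
toy_prob   <- c(0.9, 0.7, 0.6, 0.4, 0.7, 0.6, 0.3, 0.1)
toy_actual <- c(1,   1,   1,   1,   0,   0,   0,   0)
sweep_thresholds(toy_prob, toy_actual, c(0.5, 0.65))
```

On this toy data, raising the cutoff from 0.5 to 0.65 lifts specificity from 0.50 to 0.75, while sensitivity falls from 0.75 to 0.50.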

Threshold Change (Part 3)

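library(caret)  # provides confusionMatrix()

# Reclassify predicted probabilities at a custom threshold and evaluate them.
# pred_prob: probability of the positive class "1"
# threshold: cutoff above which a case is labeled "1"
# actual:    true outcomes (levels "0" and "1")
adjust_thres <- function(pred_prob, threshold, actual) {
  # fix the levels so both classes exist even if one is never predicted
  pred_class <- factor(ifelse(pred_prob > threshold, 1, 0), levels = c(0, 1))
  confusionMatrix(pred_class, factor(actual, levels = c(0, 1)), positive = "1",
                  dnn = c("Prediction", "Actual"), mode = "everything")
}

adjust_thres(heart_eval_prob$`1`, 0.64, heart_test$num)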

## Confusion Matrix and Statistics
## 
##           Actual
## Prediction  0  1
##          0 27  6
##          1  3 25
##                                           
##                Accuracy : 0.8525          
##                  95% CI : (0.7383, 0.9302)
##     No Information Rate : 0.5082          
##     P-Value [Acc > NIR] : 1.821e-08       
##                                           
##                   Kappa : 0.7053          
##                                           
##  Mcnemar's Test P-Value : 0.505           
##                                           
##             Sensitivity : 0.8065          
##             Specificity : 0.9000          
##          Pos Pred Value : 0.8929          
##          Neg Pred Value : 0.8182          
##               Precision : 0.8929          
##                  Recall : 0.8065          
##                      F1 : 0.8475          
##              Prevalence : 0.5082          
##          Detection Rate : 0.4098          
##    Detection Prevalence : 0.4590          
##       Balanced Accuracy : 0.8532          
##                                           
##        'Positive' Class : 1               
##

# raising the threshold to 0.64 improves specificity (0.767 to 0.900) and slightly raises accuracy (0.836 to 0.852), but lowers sensitivity (0.903 to 0.806)

Motorcycle Market KNN

Class Split

## 
##    0    1 
## 73.6 26.4

Confusion Matrix

## Confusion Matrix and Statistics
## 
##           Actual
## Prediction   0   1
##          0 137  17
##          1   7  27
##                                          
##                Accuracy : 0.8723         
##                  95% CI : (0.816, 0.9165)
##     No Information Rate : 0.766          
##     P-Value [Acc > NIR] : 0.000182       
##                                          
##                   Kappa : 0.6134         
##                                          
##  Mcnemar's Test P-Value : 0.066193       
##                                          
##             Sensitivity : 0.6136         
##             Specificity : 0.9514         
##          Pos Pred Value : 0.7941         
##          Neg Pred Value : 0.8896         
##              Prevalence : 0.2340         
##          Detection Rate : 0.1436         
##    Detection Prevalence : 0.1809         
##       Balanced Accuracy : 0.7825         
##                                          
##        'Positive' Class : 1              
##

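Despite fairly high accuracy (0.872), we observe an imbalance between sensitivity and specificity. Here sensitivity (0.614) is considerably lower than specificity (0.951). This is a major problem given the objective of our model. The accuracy is also only moderately above the no-information rate of 0.766.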
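Accuracy alone hides the weak sensitivity here. A model that always predicts "0" would already be right 144 of 188 times, which is the No Information Rate of 0.766. The base-R sketch below compares accuracy with that baseline. Based on our reading of caret, we assume its "P-Value [Acc > NIR]" row comes from the same one-sided exact binomial test.

```r
# Motorcycle counts from the confusion matrix above (matrix() fills by column).
cm <- matrix(c(137, 7, 17, 27), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Actual = c("0", "1")))

n           <- sum(cm)
accuracy    <- sum(diag(cm)) / n              # 164 / 188
nir         <- max(colSums(cm)) / n           # always predict "0": 144 / 188
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # 27 / 44

# One-sided exact binomial test: is accuracy greater than the NIR?
p_value <- binom.test(sum(diag(cm)), n, p = nir, alternative = "greater")$p.value

round(c(accuracy = accuracy, nir = nir, sensitivity = sensitivity, p_value = p_value), 4)
```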

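F1 Score

The value printed below is the F1 score for class 0: 2 × 137 / (2 × 137 + 17 + 7) = 274/298 ≈ 0.919. For the positive class 1, the F1 score is only 2 × 27 / (2 × 27 + 7 + 17) = 54/78 ≈ 0.692, which better reflects the weak sensitivity.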

## [1] 0.9194631

ROC/AUC

## 'data.frame':    188 obs. of  3 variables:
##  $ pred_class: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ pred_prob : num  0 0.333 0.3 0 0.1 ...
##  $ target    : num  0 1 1 0 0 0 0 0 1 0 ...

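AUC Performance

The AUC printed below is identical to the heart-disease AUC (0.9247312). This suggests the performance object for the wrong model may have been printed, so the value should be recomputed from the motorcycle predictions before it is relied on.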

print(tree_perf_AUC2@y.values)

## [[1]]
## [1] 0.9247312

Prediction probability table

DT::datatable(Bike_eval2)

Log Loss

## [1] 0.4685741

Part 2B

The target variable for the Bike KNN model originally had 4 levels rather than two. Collapsing it left the training set heavily weighted toward zeros, which is the most likely source of the low sensitivity of 0.6136. The prediction probabilities of the false negatives, which are often close to 0.555, make this especially clear. Sampling the training rows so that the two classes are balanced (a 50% base rate for the positive class) could partially alleviate the problem. Lowering the decision threshold to 0.3 could also help.
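One way to reach the 50% base rate suggested above is to downsample the majority class (here "0", which makes up 73.6% of the data). The base-R sketch below does this. The function name and toy data are hypothetical. Downsampling discards rows, and oversampling the minority class is the usual alternative (caret's `downSample()` and `upSample()` helpers cover both). Only the training set should be rebalanced. The test set should keep its natural class mix so that the evaluation metrics stay meaningful.

```r
# Downsample the majority class so the training set has a 50% base rate.
# df: data frame; target: name of a 0/1 column.
balance_by_downsampling <- function(df, target, seed = 1) {
  set.seed(seed)  # note: this resets the global random-number stream
  idx_pos <- which(df[[target]] == 1)
  idx_neg <- which(df[[target]] == 0)
  n_keep  <- min(length(idx_pos), length(idx_neg))
  # indexing through sample.int() avoids sample()'s special case for length-1 vectors
  keep <- c(idx_pos[sample.int(length(idx_pos), n_keep)],
            idx_neg[sample.int(length(idx_neg), n_keep)])
  df[sort(keep), , drop = FALSE]
}

# Hypothetical training data with a 25% positive rate.
toy_train <- data.frame(x = 1:20, y = c(rep(1, 5), rep(0, 15)))
balanced  <- balance_by_downsampling(toy_train, "y")
table(balanced$y)  # 5 rows of each class
```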

Part 3B

## Confusion Matrix and Statistics
## 
##           Actual
## Prediction   0   1
##          0 121   8
##          1  23  36
##                                           
##                Accuracy : 0.8351          
##                  95% CI : (0.7742, 0.8851)
##     No Information Rate : 0.766           
##     P-Value [Acc > NIR] : 0.01324         
##                                           
##                   Kappa : 0.5888          
##                                           
##  Mcnemar's Test P-Value : 0.01192         
##                                           
##             Sensitivity : 0.8182          
##             Specificity : 0.8403          
##          Pos Pred Value : 0.6102          
##          Neg Pred Value : 0.9380          
##               Precision : 0.6102          
##                  Recall : 0.8182          
##                      F1 : 0.6990          
##              Prevalence : 0.2340          
##          Detection Rate : 0.1915          
##    Detection Prevalence : 0.3138          
##       Balanced Accuracy : 0.8292          
##                                           
##        'Positive' Class : 1               
##

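The threshold is lowered to 0.24. This raises sensitivity from 0.614 to 0.818, at the cost of specificity (0.951 to 0.840) and accuracy (0.872 to 0.835).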

Part 4

Overall, our findings indicate that both models classify better than random guessing: in each case, accuracy is significantly higher than the no-information rate. However, class imbalance was an issue in both datasets, with one class over-represented and the other under-represented. Future work should therefore balance the datasets more carefully before pursuing further iterations. The threshold value strongly affects overall accuracy, specificity and sensitivity, so the first iteration of a model is almost never a fair estimate of what is truly happening in the data. Most models also rely on real-world data that can drift. For these reasons, building a useful model starts well before the code is written: it starts, and continues, with efficient data sourcing that is balanced among classes.

ML Evaluation Lab

William Cull, Jay Ralyea, John Hope

4/14/2021
