library(caret)  # confusionMatrix() comes from the caret package
# predicted and actual survival labels as factors
pred <- as.factor(preds$Prediction..isAlive.)
actual <- as.factor(preds$isAlive)
# cross-tabulate the predictions against the reference labels
conf_matrix <- confusionMatrix(pred, actual)
conf_matrix
## Confusion Matrix and Statistics
##
##           Reference
## Prediction  No Yes
##        No  227  82
##        Yes  81 527
##
## Accuracy : 0.8222
## 95% CI : (0.7959, 0.8465)
## No Information Rate : 0.6641
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.6019
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.7370
## Specificity : 0.8654
## Pos Pred Value : 0.7346
## Neg Pred Value : 0.8668
## Prevalence : 0.3359
## Detection Rate : 0.2475
## Detection Prevalence : 0.3370
## Balanced Accuracy : 0.8012
##
## 'Positive' Class : No
##
Here are, in our opinion, the most significant statistics from the confusion matrix of our prediction model (a decision tree):
The accuracy of the prediction was 82.22%, which is not ideal but can still be considered high.
The no-information rate tells us how well a classifier would do by always predicting the majority class; since our accuracy (82.22%) is clearly greater than this rate (66.41%, p < 2e-16), the classifier does better than that baseline.
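As a quick sanity check, the following sketch recomputes the accuracy and the no-information rate from the counts in the matrix above (the table tab is rebuilt here by hand from the printed output, so its numbers are assumptions taken from that output):
tab <- matrix(c(227, 81, 82, 527), nrow = 2,
              dimnames = list(Prediction = c("No", "Yes"),
                              Reference  = c("No", "Yes")))
accuracy <- sum(diag(tab)) / sum(tab)   # (227 + 527) / 917, about 0.8222
nir <- max(colSums(tab)) / sum(tab)     # majority-class share, about 0.6641
# one-sided binomial test of accuracy against the no-information rate,
# which is what caret reports as "P-Value [Acc > NIR]"
binom.test(sum(diag(tab)), sum(tab), p = nir, alternative = "greater")$p.value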
The kappa statistic measures how much the classification results improve over the agreement expected by chance. A value of 0.602 still counts as moderate agreement, but values from 0.61 upward are usually considered substantial, so the agreement here is almost substantial.
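For reference, Cohen's kappa can be recomputed from the same tab matrix defined in the sketch above, as observed agreement corrected for chance agreement:
po <- sum(diag(tab)) / sum(tab)                      # observed agreement
pe <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # agreement expected by chance
(po - pe) / (1 - pe)                                 # about 0.602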
Sensitivity is the number of correct positive predictions divided by the total number of actual positives; with "No" as the positive class, our model correctly identifies 73.70% of the characters who are not alive.
Specificity is the analogous measure for the negative class: the proportion of actual negatives that are correctly predicted. Here it is 86.54%, so the model recognizes the "Yes" (alive) characters even more reliably.
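Both values can be read directly off the 2x2 table; the sketch below reuses the tab matrix from above and treats "No" as the positive class, as caret does in the output:
sens <- tab["No", "No"] / sum(tab[, "No"])     # 227 / 308, about 0.7370
spec <- tab["Yes", "Yes"] / sum(tab[, "Yes"])  # 527 / 609, about 0.8654
c(sensitivity = sens, specificity = spec)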
From this confusion matrix it is easy to see that most of the predictions made by the model were accurate, but the number of wrong predictions is still noticeable. For now, most of the characters in the data set are alive and will probably survive in the near future.