In this paper we work through a model that predicts the sex of penguins and analyse its performance metrics.
The null error rate tells us how often we would be wrong if we always predicted the majority class. It gives us a baseline against which to judge a model's accuracy: if a model's accuracy is lower than or close to the null error rate, the model is not doing better than simply guessing the majority class. Knowing the majority class also tells us how imbalanced the dataset is, which helps us decide which other performance metrics might be more useful.
Formula: 1 - (majority class count / total count)
We first load the data from GitHub, calculate the class totals to find the majority class, and then calculate the null error rate.
## # A tibble: 6 × 3
##   .pred_female .pred_class sex
##          <dbl> <chr>       <chr>
## 1        0.992 female      female
## 2        0.954 female      female
## 3        0.985 female      female
## 4        0.187 male        female
## 5        0.995 female      female
## 6        1.00  female      female
## Total females: 39
## Total males: 54
## Null Error Rate: 0.4193548
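As a rough illustration of this step (a minimal sketch, not the exact code used to produce the output above), the null error rate can be computed from the class counts; `preds` is assumed to be the predictions tibble shown above, with a character `sex` column.

```r
# Minimal sketch: null error rate from the class counts.
# Assumes `preds` is the predictions tibble above, with a `sex` column.
library(dplyr)

class_counts <- preds %>% count(sex)    # rows per class: 39 female, 54 male

majority_count <- max(class_counts$n)   # 54 (male is the majority class)
total_count    <- sum(class_counts$n)   # 93 penguins in total

null_error_rate <- 1 - majority_count / total_count
null_error_rate                         # 1 - 54/93 = 0.4193548
```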
This tells us that if we create a predictor that just picks the majority class, male, every time, it would be wrong 41.9% of the time.
We use confusion matrices to tabulate true and false positives and negatives. In this scenario we treat a sex (or prediction) of female as the positive class, and a sex (or prediction) of male as the negative class.
At the default threshold of 0.5:
##         Predicted
## Actual   Male Female
##   Male     51      3
##   Female    3     36
## Accuracy: 0.9354839, Precision: 0.9230769, Recall: 0.9230769, F1: 0.9230769
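A minimal sketch of how such a confusion matrix and its metrics could be computed at an arbitrary threshold is shown below. The helper name `confusion_at` is hypothetical, and it assumes `preds` holds the `.pred_female` and `sex` columns shown earlier, with female treated as the positive class.

```r
# Hypothetical helper: confusion matrix and metrics at a chosen threshold,
# with "female" as the positive class.
confusion_at <- function(preds, threshold = 0.5) {
  predicted <- ifelse(preds$.pred_female > threshold, "female", "male")
  actual    <- preds$sex

  tp <- sum(predicted == "female" & actual == "female")  # true positives
  fp <- sum(predicted == "female" & actual == "male")    # false positives
  tn <- sum(predicted == "male"   & actual == "male")    # true negatives
  fn <- sum(predicted == "male"   & actual == "female")  # false negatives

  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)

  list(
    matrix    = matrix(c(tn, fn, fp, tp), nrow = 2,
                       dimnames = list(Actual    = c("Male", "Female"),
                                       Predicted = c("Male", "Female"))),
    accuracy  = (tp + tn) / length(actual),
    precision = precision,
    recall    = recall,
    f1        = 2 * precision * recall / (precision + recall)
  )
}

confusion_at(preds, threshold = 0.5)
```

Changing the threshold argument reproduces the 0.2 and 0.8 cases below.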
Here we alter the predictor to predict female when .pred_female is over 0.2 instead of 0.5:
##         Predicted
## Actual   Male Female
##   Male     48      6
##   Female    2     37
## Accuracy: 0.9139785, Precision: 0.8604651, Recall: 0.9487179, F1: 0.902439
Here we alter the predictor to predict female when .pred_female is over 0.8 instead of 0.5:
##         Predicted
## Actual   Male Female
##   Male     52      2
##   Female    3     36
## Accuracy: 0.9462366, Precision: 0.9473684, Recall: 0.9230769, F1: 0.9350649
Now we can compare the effect the different threshold values have on the metrics:
##   Threshold  Accuracy Precision    Recall  F1_score
## 1       0.5 0.9354839 0.9230769 0.9230769 0.9230769
## 2       0.2 0.9139785 0.8604651 0.9487179 0.9024390
## 3       0.8 0.9462366 0.9473684 0.9230769 0.9350649
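One way the comparison table above could be assembled is by looping over the thresholds with the hypothetical confusion_at() helper sketched earlier:

```r
# Sketch: collect the metrics for each threshold into a single data frame.
thresholds <- c(0.5, 0.2, 0.8)

comparison <- do.call(rbind, lapply(thresholds, function(t) {
  m <- confusion_at(preds, threshold = t)
  data.frame(Threshold = t,
             Accuracy  = m$accuracy,
             Precision = m$precision,
             Recall    = m$recall,
             F1_score  = m$f1)
}))

comparison
```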
At the 0.2 threshold we notice that in this dataset we get more false positives, with penguins being classified as female when they are actually male. But we also get more true positives, with females correctly predicted as female. This lower threshold would be a good choice when a false positive is less costly than a false negative, for example in a health-condition predictor where we would rather flag someone incorrectly than predict that someone with the condition does not have it.
At the 0.8 threshold we get more true negatives, with males accurately predicted as male. Combined with a similar number of true positives, this gives us the highest accuracy of the three thresholds. Because this dataset is reasonably balanced, accuracy is a good measure of the quality of the predictor. This higher threshold would also be beneficial in situations where predicting true negatives is more important than finding true positives.