Week 2B Classification Metrics Code Base submission
Author
Long Lin
Week 2B Classification Metrics: Overview
For this assignment, I use the provided data to calculate the null error rate and plot the distribution of the actual class, and I explain why the null error rate is important to know when evaluating models.
I then calculate the True Positives, False Positives, True Negatives, and False Negatives at probability thresholds of 0.2, 0.5, and 0.8, and present the results in confusion matrices.
Next, I calculate the accuracy, precision, recall, and F1 score for each threshold and display them in a clear table.
Finally, I give real-world scenarios where a 0.2 threshold is preferable and where a 0.8 threshold is preferable.
Data Source: https://raw.githubusercontent.com/acatlin/data/refs/heads/master/penguin_predictions.csv
Calculating Null Error Rate
The null error rate is a baseline metric that indicates how often you would be wrong if you always predicted the majority class; equivalently, it is 1 minus the proportion of the majority class.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the data from the source URL above
df <- read_csv("https://raw.githubusercontent.com/acatlin/data/refs/heads/master/penguin_predictions.csv")

# Count observations in each class
class_counts <- table(df$sex)

# Find the majority class count and total count
majority_count <- max(class_counts)
total_count <- sum(class_counts)

# Calculate the proportion of the majority class
majority_proportion <- majority_count / total_count

# The null error rate is 1 minus the majority proportion
null_error_rate <- 1 - majority_proportion

# Print the result
print(paste("Null Error Rate:", null_error_rate))
[1] "Null Error Rate: 0.419354838709677"
library(ggplot2)

ggplot(df, aes(x = sex, fill = sex)) +
  geom_bar() +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.5) +
  labs(title = "Distribution of Sex", x = "Sex", y = "Count") +
  theme_minimal()
Here is a plot showing the actual distribution of the sex class.
Knowing the null error rate is important when evaluating models because it gives you a baseline to compare your model against: if a model's accuracy does not beat always predicting the majority class, the model is just guessing rather than actually learning patterns.
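For example, here is a minimal sketch comparing the model's accuracy at the default 0.5 cutoff against the majority-class baseline (using the df and null_error_rate computed above; the variable names below are illustrative):

predicted_sex <- ifelse(df$.pred_female >= 0.5, "female", "male")
model_accuracy <- mean(predicted_sex == df$sex)
baseline_accuracy <- 1 - null_error_rate

print(paste("Model accuracy:", round(model_accuracy, 3)))
print(paste("Majority-class baseline accuracy:", round(baseline_accuracy, 3)))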
Calculations for 0.2 threshold
First we calculate the True Positives, True Negatives, False Positives, and False Negatives for the 0.2 threshold.
# Defined threshold
threshold <- 0.2

# Actual classes encoded as 1 (female) and 0 (male)
actual_values <- ifelse(df$sex == "female", 1, 0)

# Function to calculate True Positives
get_true_positives <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  sum(preds == 1 & y == 1)
}

# Function to calculate True Negatives
get_true_negatives <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  sum(preds == 0 & y == 0)
}

# Function to calculate False Positives
get_false_positives <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  sum(preds == 1 & y == 0)
}

# Function to calculate False Negatives
get_false_negatives <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  sum(preds == 0 & y == 1)
}

# Counts for the specified threshold
true_positives_02 <- get_true_positives(df$.pred_female, actual_values, threshold)
true_negatives_02 <- get_true_negatives(df$.pred_female, actual_values, threshold)
false_positives_02 <- get_false_positives(df$.pred_female, actual_values, threshold)
false_negatives_02 <- get_false_negatives(df$.pred_female, actual_values, threshold)

# Create vectors for items and values
item_names <- c("True Positives", "True Negatives", "False Positives", "False Negatives")
item_values <- c(true_positives_02, true_negatives_02, false_positives_02, false_negatives_02)

# Combine them into a data frame and display the table
item_table <- data.frame(Item = item_names, Value = item_values)
print(item_table)
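The same four counts can also be read from a single confusion table. Here is a compact alternative sketch using base R's table() (the variable names are illustrative, assuming df as loaded above):

actual_values_02 <- ifelse(df$sex == "female", 1, 0)
predicted_values_02 <- ifelse(df$.pred_female >= 0.2, 1, 0)

# Rows are predicted values, columns are actual values
confusion_matrix_02 <- table(predicted_values_02, actual_values_02)
print(confusion_matrix_02)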
Confusion Matrix for 0.8 threshold

Next we build the confusion matrix for the 0.8 threshold.

actual_values <- ifelse(df$sex == "female", 1, 0)
predicted_probs <- df$.pred_female

# Defined threshold
threshold <- 0.8

# Convert probabilities to binary class predictions using the threshold
# Values >= threshold become 1, otherwise 0
predicted_values <- ifelse(predicted_probs >= threshold, 1, 0)

# Create the confusion table
confusion_matrix_08 <- table(predicted_values, actual_values)
print(confusion_matrix_08)
actual_values
predicted_values 0 1
0 52 3
1 2 36
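The metric calculations below use the four counts for the 0.8 threshold. One way to obtain them is to read them straight off the confusion matrix printed above; a minimal sketch (indexing the table by its "0"/"1" dimnames, where rows are predicted values and columns are actual values):

true_positives_08 <- confusion_matrix_08["1", "1"]   # predicted 1, actual 1
true_negatives_08 <- confusion_matrix_08["0", "0"]   # predicted 0, actual 0
false_positives_08 <- confusion_matrix_08["1", "0"]  # predicted 1, actual 0
false_negatives_08 <- confusion_matrix_08["0", "1"]  # predicted 0, actual 1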
Calculating Performance Metrics for 0.8 threshold
Calculating the accuracy, precision, recall, and F1 score for the 0.8 threshold.
# Number of correct predictions
correct_predictions_08 <- true_positives_08 + true_negatives_08

# Total number of observations
total_observations_08 <- true_positives_08 + true_negatives_08 + false_positives_08 + false_negatives_08

# Calculate accuracy
accuracy_08 <- correct_predictions_08 / total_observations_08

# Calculate the precision
precision_08 <- true_positives_08 / (true_positives_08 + false_positives_08)

# Calculate the recall
recall_08 <- true_positives_08 / (true_positives_08 + false_negatives_08)

# Calculate the F1 score
f1_score_08 <- 2 * (precision_08 * recall_08) / (precision_08 + recall_08)

# Create vectors for performance metrics
performance_metrics_names <- c("Accuracy", "Precision", "Recall", "F1 Score")
performance_metrics_values <- c(accuracy_08, precision_08, recall_08, f1_score_08)

# Combine them into a data frame and display the table
performance_metrics_table <- data.frame("Performance Metric" = performance_metrics_names, Value = performance_metrics_values)
print(performance_metrics_table)
Performance.Metric Value
1 Accuracy 0.9462366
2 Precision 0.9473684
3 Recall 0.9230769
4 F1 Score 0.9350649
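To compare cutoffs side by side, the same arithmetic can be wrapped in one helper and applied to all three thresholds from the overview. This is a minimal sketch (the helper name get_metrics is illustrative, assuming df as loaded above):

get_metrics <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  tp <- sum(preds == 1 & y == 1)
  tn <- sum(preds == 0 & y == 0)
  fp <- sum(preds == 1 & y == 0)
  fn <- sum(preds == 0 & y == 1)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  data.frame(
    Threshold = t,
    Accuracy = (tp + tn) / (tp + tn + fp + fn),
    Precision = precision,
    Recall = recall,
    F1 = 2 * precision * recall / (precision + recall)
  )
}

actual_values <- ifelse(df$sex == "female", 1, 0)
do.call(rbind, lapply(c(0.2, 0.5, 0.8), function(t) get_metrics(df$.pred_female, actual_values, t)))

Lowering the threshold trades precision for recall, which is the trade-off behind the use cases below.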
Threshold Use Cases
One real-world scenario where a 0.2 threshold is preferable is cancer screening, because missing a positive case (a false negative) is far more dangerous than flagging a healthy patient for follow-up testing (a false positive).
One real-world scenario where a 0.8 threshold is preferable is credit card fraud detection, because credit card companies want to keep false positives low: customers quickly grow frustrated when slightly unusual but legitimate purchases are constantly declined.