Week 2B Classification Metrics Code Base submission
Author
Long Lin
Week 2B Classification Metrics: Overview
For this assignment, I use the provided data to calculate the null error rate and plot the distribution of the actual class, and I explain why the null error rate is important to know when evaluating models.
I then calculate the True Positives, False Positives, True Negatives, and False Negatives at probability thresholds of 0.2, 0.5, and 0.8, and present the results in confusion matrices.
Next, I calculate the accuracy, precision, recall, and F1 score for each threshold and display them in a clear table.
Finally, I give real-world scenarios where a 0.2 threshold is preferable and where a 0.8 threshold is preferable.
Data Source: https://raw.githubusercontent.com/acatlin/data/refs/heads/master/penguin_predictions.csv
Calculating Null Error Rate
The null error rate is a baseline metric that indicates how often you would be wrong if you always predicted the majority class; equivalently, it is 1 minus the proportion of the majority class.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the data from the source URL above
df <- read_csv("https://raw.githubusercontent.com/acatlin/data/refs/heads/master/penguin_predictions.csv")

# Count observations in each class
class_counts <- table(df$sex)

# Find the majority class count and total count
majority_count <- max(class_counts)
total_count <- sum(class_counts)

# Calculate the proportion of the majority class
majority_proportion <- majority_count / total_count

# The null error rate is 1 minus the majority proportion
null_error_rate <- 1 - majority_proportion

# Print the result
print(paste("Null Error Rate:", null_error_rate))
[1] "Null Error Rate: 0.419354838709677"
library(ggplot2)

ggplot(df, aes(x = sex, fill = sex)) +
  geom_bar() +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.5) +
  labs(title = "Distribution of Sex", x = "Sex", y = "Count") +
  theme_minimal()
Here is a plot showing the actual distribution of the sex class.
Knowing the null error rate is important when evaluating models because it gives you a baseline to compare your model against: if a model's accuracy does not beat always predicting the majority class, the model is just guessing rather than actually learning patterns.
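For example, here is a minimal sketch comparing the model's accuracy at the default 0.5 cutoff against the majority-class baseline (using the df and null_error_rate computed above; the variable names below are illustrative):

predicted_sex <- ifelse(df$.pred_female >= 0.5, "female", "male")
model_accuracy <- mean(predicted_sex == df$sex)
baseline_accuracy <- 1 - null_error_rate

print(paste("Model accuracy:", round(model_accuracy, 3)))
print(paste("Majority-class baseline accuracy:", round(baseline_accuracy, 3)))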
Calculations for 0.2 threshold
First we calculate the True Positives, True Negatives, False Positives, and False Negatives for the 0.2 threshold.
# Defined threshold
threshold <- 0.2

# Actual classes encoded as 1 (female) and 0 (male)
actual_values <- ifelse(df$sex == "female", 1, 0)

# Function to calculate True Positives
get_true_positives <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  sum(preds == 1 & y == 1)
}

# Function to calculate True Negatives
get_true_negatives <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  sum(preds == 0 & y == 0)
}

# Function to calculate False Positives
get_false_positives <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  sum(preds == 1 & y == 0)
}

# Function to calculate False Negatives
get_false_negatives <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  sum(preds == 0 & y == 1)
}

# Counts for the specified threshold
true_positives_02 <- get_true_positives(df$.pred_female, actual_values, threshold)
true_negatives_02 <- get_true_negatives(df$.pred_female, actual_values, threshold)
false_positives_02 <- get_false_positives(df$.pred_female, actual_values, threshold)
false_negatives_02 <- get_false_negatives(df$.pred_female, actual_values, threshold)

# Create vectors for items and values
item_names <- c("True Positives", "True Negatives", "False Positives", "False Negatives")
item_values <- c(true_positives_02, true_negatives_02, false_positives_02, false_negatives_02)

# Combine them into a data frame and display the table
item_table <- data.frame(Item = item_names, Value = item_values)
print(item_table)
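The same four counts can also be read from a single confusion table. Here is a compact alternative sketch using base R's table() (the variable names are illustrative, assuming df as loaded above):

actual_values_02 <- ifelse(df$sex == "female", 1, 0)
predicted_values_02 <- ifelse(df$.pred_female >= 0.2, 1, 0)

# Rows are predicted values, columns are actual values
confusion_matrix_02 <- table(predicted_values_02, actual_values_02)
print(confusion_matrix_02)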
Confusion Matrix for 0.8 threshold

Next we build the confusion matrix for the 0.8 threshold.

actual_values <- ifelse(df$sex == "female", 1, 0)
predicted_probs <- df$.pred_female

# Defined threshold
threshold <- 0.8

# Convert probabilities to binary class predictions using the threshold
# Values >= threshold become 1, otherwise 0
predicted_values <- ifelse(predicted_probs >= threshold, 1, 0)

# Create the confusion table
confusion_matrix_08 <- table(predicted_values, actual_values)
print(confusion_matrix_08)
actual_values
predicted_values 0 1
0 52 3
1 2 36
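The metric calculations below use the four counts for the 0.8 threshold. One way to obtain them is to read them straight off the confusion matrix printed above; a minimal sketch (indexing the table by its "0"/"1" dimnames, where rows are predicted values and columns are actual values):

true_positives_08 <- confusion_matrix_08["1", "1"]   # predicted 1, actual 1
true_negatives_08 <- confusion_matrix_08["0", "0"]   # predicted 0, actual 0
false_positives_08 <- confusion_matrix_08["1", "0"]  # predicted 1, actual 0
false_negatives_08 <- confusion_matrix_08["0", "1"]  # predicted 0, actual 1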
Calculating Performance Metrics for 0.8 threshold
Calculating the accuracy, precision, recall, and F1 score for the 0.8 threshold.
# Number of correct predictions
correct_predictions_08 <- true_positives_08 + true_negatives_08

# Total number of observations
total_observations_08 <- true_positives_08 + true_negatives_08 + false_positives_08 + false_negatives_08

# Calculate accuracy
accuracy_08 <- correct_predictions_08 / total_observations_08

# Calculate the precision
precision_08 <- true_positives_08 / (true_positives_08 + false_positives_08)

# Calculate the recall
recall_08 <- true_positives_08 / (true_positives_08 + false_negatives_08)

# Calculate the F1 score
f1_score_08 <- 2 * (precision_08 * recall_08) / (precision_08 + recall_08)

# Create vectors for performance metrics
performance_metrics_names <- c("Accuracy", "Precision", "Recall", "F1 Score")
performance_metrics_values <- c(accuracy_08, precision_08, recall_08, f1_score_08)

# Combine them into a data frame and display the table
performance_metrics_table <- data.frame("Performance Metric" = performance_metrics_names, Value = performance_metrics_values)
print(performance_metrics_table)
Performance.Metric Value
1 Accuracy 0.9462366
2 Precision 0.9473684
3 Recall 0.9230769
4 F1 Score 0.9350649
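To compare cutoffs side by side, the same arithmetic can be wrapped in one helper and applied to all three thresholds from the overview. This is a minimal sketch (the helper name get_metrics is illustrative, assuming df as loaded above):

get_metrics <- function(p, y, t) {
  preds <- ifelse(p >= t, 1, 0)
  tp <- sum(preds == 1 & y == 1)
  tn <- sum(preds == 0 & y == 0)
  fp <- sum(preds == 1 & y == 0)
  fn <- sum(preds == 0 & y == 1)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  data.frame(
    Threshold = t,
    Accuracy = (tp + tn) / (tp + tn + fp + fn),
    Precision = precision,
    Recall = recall,
    F1 = 2 * precision * recall / (precision + recall)
  )
}

actual_values <- ifelse(df$sex == "female", 1, 0)
do.call(rbind, lapply(c(0.2, 0.5, 0.8), function(t) get_metrics(df$.pred_female, actual_values, t)))

Lowering the threshold trades precision for recall, which is the trade-off behind the use cases below.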
Threshold Use Cases
One real-world scenario where a 0.2 threshold is preferable is cancer screening, because missing a positive case (a false negative) is far more dangerous than flagging a healthy patient for follow-up testing (a false positive).
One real-world scenario where a 0.8 threshold is preferable is credit card fraud detection, because credit card companies want to keep false positives low: customers quickly grow frustrated when slightly unusual but legitimate purchases are constantly declined.