This assignment focuses on evaluating the performance of a binary classification model. Using predicted probabilities and known class labels, the goal is to understand how different decision thresholds influence model errors and performance metrics.
Classification metrics
Approach
To evaluate the model, I will work with a provided dataset containing model-predicted probabilities and the true class labels. My plan is to first examine the distribution of the actual class labels and establish a baseline with the null error rate, the error rate obtained by always predicting the majority class (for example, if 60% of the labels belong to one class, always predicting that class is wrong 40% of the time). I will then convert the predicted probabilities into class labels at several decision thresholds, construct the resulting confusion matrices, and derive performance metrics from them to understand how the threshold choice shifts the balance between false positives and false negatives.
Base Code
library(tidyverse)
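The chunk that read the data was not preserved in the rendered output; a minimal sketch of that step, assuming the predictions are stored in a CSV file (the file name predictions.csv is hypothetical):

# Hypothetical file name; the original path was not shown in the output.
preds <- read_csv("predictions.csv")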
Rows: 93 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): .pred_class, sex
dbl (1): .pred_female
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
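With the data loaded, the plan above can be carried out with ordinary tidyverse verbs. A minimal sketch, assuming the data frame is named preds as above, that the two levels of sex are "female" and "male", and that "female" is treated as the positive class:

# Baseline: the null error rate is the error from always predicting
# the majority class of `sex`.
preds %>%
  count(sex) %>%
  mutate(prop = n / sum(n))
null_error_rate <- 1 - max(table(preds$sex)) / nrow(preds)

# Turn predicted probabilities into class labels at a chosen threshold,
# then cross-tabulate against the truth to form a confusion matrix.
threshold <- 0.5
preds_t <- preds %>%
  mutate(pred = if_else(.pred_female >= threshold, "female", "male"))
conf_mat <- table(predicted = preds_t$pred, actual = preds_t$sex)
conf_mat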
This analysis demonstrates that classification model performance depends strongly on the chosen decision threshold. By examining the null error rate, confusion matrices, and performance metrics, it becomes clear that accuracy alone is not sufficient to evaluate a model. Lower thresholds increase recall by identifying more positive cases, while higher thresholds improve precision by reducing false positives. As shown in this assignment, there is no single “best” threshold; instead, the appropriate threshold should be selected based on the specific costs and priorities of different types of classification errors.
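To make the stated trade-off concrete, the same sketch can sweep the threshold and recompute accuracy, precision, and recall at each value (same assumptions as above: a preds data frame with "female" as the positive class):

# Metrics at a single threshold, with "female" as the positive class.
metrics_at <- function(threshold) {
  pred <- if_else(preds$.pred_female >= threshold, "female", "male")
  tp <- sum(pred == "female" & preds$sex == "female")
  fp <- sum(pred == "female" & preds$sex == "male")
  fn <- sum(pred == "male" & preds$sex == "female")
  tn <- sum(pred == "male" & preds$sex == "male")
  tibble(
    threshold = threshold,
    accuracy  = (tp + tn) / nrow(preds),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn)
  )
}

# Lower thresholds label more cases "female", raising recall at the
# cost of precision; higher thresholds do the reverse.
map_dfr(seq(0.1, 0.9, by = 0.1), metrics_at)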