In this homework assignment, you will work through various classification metrics. You will be asked to create functions in R to carry out the various calculations. You will also investigate some functions in packages that will let you obtain the equivalent results. Finally, you will create graphical output that also can be used to evaluate the output of classification models, such as binary logistic regression.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read.csv('https://raw.githubusercontent.com/ddebonis47/classwork/refs/heads/main/classification-output-data.csv')
table(data$class, data$scored.class)
##
## 0 1
## 0 119 5
## 1 30 27
We can determine a few things from the confusion matrix of our data (rows are the actual class, columns are the predicted class). The majority of the predictions were correct: of the 149 observations predicted to be in class zero, 119 actually belonged to that class, and of the 32 predicted to be in class one, 27 did. Of the 181 total observations, 35 were misclassified (a type I or type II error).
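As a quick sanity check, the individual cells of the table can be pulled out by name. The objects cm_tab, TP, TN, FP, and FN below are temporary names used only for this check, not part of the assignment's workflow.
cm_tab <- table(data$class, data$scored.class)
TP <- cm_tab["1", "1"]  # 27 true positives
TN <- cm_tab["0", "0"]  # 119 true negatives
FP <- cm_tab["0", "1"]  # 5 false positives
FN <- cm_tab["1", "0"]  # 30 false negatives
c(total = sum(cm_tab), errors = FP + FN)  # 181 observations, 35 misclassified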
The formula for accuracy is the number of correct predictions divided by the total number of predictions made: (TP + TN) / (TP + FP + TN + FN).
get_accuracy <- function(data, actual_col = "class", predicted_col = "scored.class") {
  actual <- data[[actual_col]]
  predicted <- data[[predicted_col]]
  mean(actual == predicted)
}
accuracy <- get_accuracy(data)
print(accuracy)
## [1] 0.8066298
The accuracy here is approximately 0.81.
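As a hand check against the confusion matrix counts, the same value comes directly from the correct predictions over the total observations:
(119 + 27) / 181  # approximately 0.8066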
The classification error rate is the complement of accuracy: the ratio of incorrect predictions over the total number of predictions made.
df <- data
get_classification_error <- function(df, actual_col = "class", predicted_col = "scored.class") {
  actual <- df[[actual_col]]
  predicted <- df[[predicted_col]]
  error_rate <- mean(actual != predicted)
  return(error_rate)
}
error_rate <- get_classification_error(df)
print(error_rate)
## [1] 0.1933702
As expected, the value produced is equal to 1 minus the accuracy of 0.8066.
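A one-line consistency check, using the accuracy and error_rate objects computed above:
accuracy + error_rate  # should equal 1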
Precision tells us what proportion of the observations predicted to be in the positive class actually belong to it: TP / (TP + FP).
get_precision <- function(df, actual_col = "class", predicted_col = "scored.class", positive_class = 1) {
  actual <- df[[actual_col]]
  predicted <- df[[predicted_col]]
  # True Positives
  TP <- sum(predicted == positive_class & actual == positive_class)
  # False Positives
  FP <- sum(predicted == positive_class & actual != positive_class)
  # Avoid division by zero
  precision <- ifelse((TP + FP) == 0, NA, TP / (TP + FP))
  return(precision)
}
get_precision(df)
## [1] 0.84375
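In fraction form, this is the 27 true positives over the 32 observations predicted to be positive:
27 / (27 + 5)  # 0.84375, matching get_precision(df)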
Sensitivity (also called recall) measures what proportion of the actual positives were predicted to be positive: TP / (TP + FN).
get_sensitivity <- function(df, actual_col = "class", predicted_col = "scored.class", positive_class = 1) {
  actual <- df[[actual_col]]
  predicted <- df[[predicted_col]]
  # True Positives (TP): actual = positive and predicted = positive
  TP <- sum(actual == positive_class & predicted == positive_class)
  # False Negatives (FN): actual = positive but predicted = negative
  FN <- sum(actual == positive_class & predicted != positive_class)
  # Avoid division by zero
  sensitivity <- ifelse((TP + FN) == 0, NA, TP / (TP + FN))
  return(sensitivity)
}
get_sensitivity(df)
## [1] 0.4736842
As a fraction, this is 27/(27 + 30).
Specificity is the counterpart to sensitivity: it looks at the true negatives out of all observations that were actually negative. In other words, it measures what proportion of the actual negatives were predicted to be negative: TN / (TN + FP).
get_specificity <- function(df, actual_col = "class", predicted_col = "scored.class", positive_class = 1) {
  actual <- df[[actual_col]]
  predicted <- df[[predicted_col]]
  # Identify negatives (anything not the positive_class)
  negative_class <- setdiff(unique(actual), positive_class)
  if (length(negative_class) == 0) {
    stop("No negative class found. Check your data or positive_class setting.")
  }
  negative_class <- negative_class[1] # In case multiple
  # True Negatives (TN): actual = negative and predicted = negative
  TN <- sum(actual == negative_class & predicted == negative_class)
  # False Positives (FP): actual = negative but predicted = positive
  FP <- sum(actual == negative_class & predicted == positive_class)
  # Avoid division by zero
  specificity <- ifelse((TN + FP) == 0, NA, TN / (TN + FP))
  return(specificity)
}
get_specificity(df)
## [1] 0.9596774
In fraction form, this is equal to 119/124.
The F1 score is the harmonic mean of precision and recall, a useful metric when both are important.
get_f1 <- function(df, actual_col = "class", predicted_col = "scored.class", positive_class = 1) {
  actual <- df[[actual_col]]
  predicted <- df[[predicted_col]]
  # True Positives (TP)
  TP <- sum(predicted == positive_class & actual == positive_class)
  # False Positives (FP)
  FP <- sum(predicted == positive_class & actual != positive_class)
  # False Negatives (FN)
  FN <- sum(predicted != positive_class & actual == positive_class)
  # Precision and Recall
  precision <- ifelse((TP + FP) == 0, NA, TP / (TP + FP))
  recall <- ifelse((TP + FN) == 0, NA, TP / (TP + FN))
  # F1 Score
  if (is.na(precision) || is.na(recall) || (precision + recall) == 0) {
    f1 <- NA
  } else {
    f1 <- 2 * (precision * recall) / (precision + recall)
  }
  return(f1)
}
get_f1(df)
## [1] 0.6067416
The F1 score is bounded between 0 and 1; if every observation is predicted correctly, the F1 score is one. More specifically, the F1 score is built from precision, TP / (TP + FP), and recall, TP / (TP + FN). In each case the numerator also appears in the denominator, so both quantities are proportions between zero and one, and the harmonic mean of two values between zero and one is itself between zero and one.
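A quick simulation illustrates the bound; p and r below are arbitrary hypothetical precision/recall pairs, not values from the dataset.
set.seed(123)
p <- runif(10000)  # hypothetical precision values in (0, 1)
r <- runif(10000)  # hypothetical recall values in (0, 1)
f1_vals <- 2 * p * r / (p + r)
range(f1_vals)  # both endpoints stay strictly between 0 and 1
Next, the ROC curve. The get_roc() function below sweeps classification thresholds from 0 to 1 in steps of 0.01, computes the true positive rate and false positive rate at each threshold, plots the curve, and approximates the AUC with the trapezoidal rule.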
get_roc <- function(df, actual_col = "class", prob_col = "scored.probability", positive_class = 1) {
  actual <- df[[actual_col]]
  prob <- df[[prob_col]]
  thresholds <- seq(0, 1, by = 0.01)
  tpr <- numeric(length(thresholds)) # True Positive Rate (Sensitivity)
  fpr <- numeric(length(thresholds)) # False Positive Rate (1 - Specificity)
  for (i in seq_along(thresholds)) {
    t <- thresholds[i]
    predicted <- ifelse(prob >= t, positive_class, 1 - positive_class)
    TP <- sum(predicted == positive_class & actual == positive_class)
    FP <- sum(predicted == positive_class & actual != positive_class)
    TN <- sum(predicted != positive_class & actual != positive_class)
    FN <- sum(predicted != positive_class & actual == positive_class)
    tpr[i] <- ifelse((TP + FN) == 0, 0, TP / (TP + FN))
    fpr[i] <- ifelse((FP + TN) == 0, 0, FP / (FP + TN))
  }
  # Sort points by FPR for plotting and AUC calculation
  order_idx <- order(fpr)
  fpr <- fpr[order_idx]
  tpr <- tpr[order_idx]
  # Compute AUC via trapezoidal rule
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
  # Plot ROC
  library(ggplot2)
  roc_plot <- ggplot(data.frame(FPR = fpr, TPR = tpr), aes(x = FPR, y = TPR)) +
    geom_line(color = "blue", linewidth = 1) +
    geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
    labs(title = paste("ROC Curve (AUC =", round(auc, 3), ")"),
         x = "False Positive Rate (1 - Specificity)",
         y = "True Positive Rate (Sensitivity)") +
    theme_minimal()
  return(list(plot = roc_plot, auc = auc))
}
roc_result <- get_roc(df)
roc_result$plot # shows the ROC curve
roc_result$auc
## [1] 0.8484012
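The AUC above depends slightly on the fixed 0.01 threshold grid. As a sketch (get_auc_exact is a hypothetical helper, not part of the assignment), the same trapezoidal calculation can be run with every observed probability used as a threshold, which is closer to what dedicated ROC packages do and should land very close to the pROC value computed later in this assignment.
get_auc_exact <- function(df, actual_col = "class",
                          prob_col = "scored.probability", positive_class = 1) {
  actual <- df[[actual_col]]
  prob <- df[[prob_col]]
  # Every observed probability becomes a threshold, plus the two endpoints
  thresholds <- sort(unique(c(0, prob, 1)))
  tpr <- fpr <- numeric(length(thresholds))
  for (i in seq_along(thresholds)) {
    predicted_pos <- prob >= thresholds[i]
    tpr[i] <- sum(predicted_pos & actual == positive_class) / sum(actual == positive_class)
    fpr[i] <- sum(predicted_pos & actual != positive_class) / sum(actual != positive_class)
  }
  # Sort by FPR (ties broken by TPR) and integrate with the trapezoidal rule
  ord <- order(fpr, tpr)
  sum(diff(fpr[ord]) * (head(tpr[ord], -1) + tail(tpr[ord], -1)) / 2)
}
get_auc_exact(df)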
I have been generating these values as I worked through the assignment, but I will collect the results together here:
# Collect all of the metrics computed above for the df data frame
metrics <- list(
  accuracy = get_accuracy(df),
  error_rate = get_classification_error(df),
  precision = get_precision(df),
  sensitivity = get_sensitivity(df),
  specificity = get_specificity(df),
  f1_score = get_f1(df),
  roc = get_roc(df)
)
# View the numeric metrics
metrics[c("accuracy", "error_rate", "precision", "sensitivity", "specificity", "f1_score")]
## $accuracy
## [1] 0.8066298
##
## $error_rate
## [1] 0.1933702
##
## $precision
## [1] 0.84375
##
## $sensitivity
## [1] 0.4736842
##
## $specificity
## [1] 0.9596774
##
## $f1_score
## [1] 0.6067416
# View the ROC curve
metrics$roc$plot
# View the AUC
metrics$roc$auc
## [1] 0.8484012
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
Under the caret package, it is important for the class variables to be factors.
# Make sure the columns are factors
df$class <- factor(df$class, levels = c(0,1))
df$scored.class <- factor(df$scored.class, levels = c(0,1))
# Compute confusion matrix
cm <- confusionMatrix(data = df$scored.class, reference = df$class, positive = "1")
cm
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 119 30
## 1 5 27
##
## Accuracy : 0.8066
## 95% CI : (0.7415, 0.8615)
## No Information Rate : 0.6851
## P-Value [Acc > NIR] : 0.0001712
##
## Kappa : 0.4916
##
## Mcnemar's Test P-Value : 4.976e-05
##
## Sensitivity : 0.4737
## Specificity : 0.9597
## Pos Pred Value : 0.8438
## Neg Pred Value : 0.7987
## Prevalence : 0.3149
## Detection Rate : 0.1492
## Detection Prevalence : 0.1768
## Balanced Accuracy : 0.7167
##
## 'Positive' Class : 1
##
The values I calculated earlier match the ones reported here. There are other statistics in this summary, but the metrics in common agree exactly.
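caret also exports the individual metrics as standalone helpers. Assuming the usual factor-based interface of sensitivity(), specificity(), and posPredValue(), the calls below should reproduce the hand-rolled results above (a spot check, not part of the original workflow).
sensitivity(df$scored.class, df$class, positive = "1")   # should match 0.4737
specificity(df$scored.class, df$class, negative = "0")   # should match 0.9597
posPredValue(df$scored.class, df$class, positive = "1")  # should match 0.8438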
Step 13 - Replicating with the pROC package
library(pROC)
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
# Convert class back to numeric (the 0/1 factor levels become 1 and 2, which pROC treats as control and case)
df$class <- as.numeric(df$class)
# Generate ROC object
roc_obj <- roc(response = df$class, predictor = df$scored.probability)
## Setting levels: control = 1, case = 2
## Setting direction: controls < cases
# Plot ROC curve
plot(roc_obj, col = "blue", lwd = 2, main = "ROC Curve (pROC)")
abline(a = 0, b = 1, lty = 2, col = "red") # random guess line
# Compute AUC
auc_value <- auc(roc_obj)
auc_value
## Area under the curve: 0.8503
There are some differences between this graph and the previous one. By default, the pROC plot puts specificity on the x-axis running from 1 down to 0 rather than the false positive rate, so the curve appears on a different scale, but the same information is being presented. The area under the curve also differs very slightly between the two approaches, 0.848 vs. 0.850, most likely because the hand-rolled function evaluates thresholds on a fixed 0.01 grid while pROC uses every observed probability as a threshold.
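For a pROC plot on the same axes as the hand-built ggplot version, the legacy.axes argument of pROC's plot method puts 1 - specificity on the x-axis running from 0 to 1 (a sketch using the roc_obj created above).
# Re-plot with 1 - specificity on the x-axis so it matches the ggplot version
plot(roc_obj, col = "blue", lwd = 2, legacy.axes = TRUE,
     main = "ROC Curve (pROC, legacy axes)")
abline(a = 0, b = 1, lty = 2, col = "red")  # random guess line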