This report analyzes the Leave-One-Out Cross-Validation (LOOCV) results of a 3-class classification model. The primary objective is to move beyond the default prediction (simply choosing the class with the highest probability) and determine optimal probability thresholds for classifying subjects into Benign, Cancer, and Normal categories.
We will evaluate three distinct scenarios to understand the trade-offs between them:

1. **Baseline:** assign each subject to the class with the highest predicted probability.
2. **Balanced thresholds:** choose a per-class threshold with Youden's J statistic to balance sensitivity and specificity.
3. **High sensitivity:** lower the ‘Cancer’ threshold so that cancer cases are missed as rarely as possible.
The analysis will conclude with a visual representation of the model’s discriminative ability via ROC curves and provide a final recommendation.
First, we load the R packages required for our analysis.
# pROC: For ROC curve analysis and calculating optimal thresholds.
library(pROC)
# caret: For creating confusion matrices and calculating performance metrics.
library(caret)
# ggplot2: For creating high-quality, customizable visualizations.
library(ggplot2)
# tidyr & dplyr: For data manipulation and preparation.
library(tidyr)
library(dplyr)
# knitr: For creating formatted tables.
library(knitr)
# kableExtra: For advanced table styling.
library(kableExtra)
Next, we load the LOOCV results from the CSV file.
loocv_data <- read.csv("LOOCV_Data - LOOCV probabilities.csv")
class_names <- c("Benign", "Cancer", "Normal")
prob_cols <- c("Prob_Benign", "Prob_Cancer", "Prob_Normal")
loocv_data$True_Label <- factor(loocv_data$True_Label, levels = class_names)
To keep our analysis clean and reproducible, we define two helper functions.
This function calculates and displays performance metrics as
beautifully formatted tables using kableExtra.
calculate_global_performance <- function(cm, scenario_name) {
cat("\n### Performance Metrics for:", scenario_name, "\n")
# Overall Statistics
cat("\n**Overall Statistics:**\n")
kable(t(as.data.frame(cm$overall)), caption = "Overall Model Performance", booktabs = TRUE, digits = 4) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE, position = "left") %>%
print()
# Class-Specific Statistics
cat("\n**Class-Specific Statistics:**\n")
kable(cm$byClass, caption = "Performance by Class", booktabs = TRUE, digits = 4) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE, position = "left") %>%
print()
}
This function visualizes a confusion matrix as a heatmap.
plot_confusion_matrix <- function(cm, plot_title) {
# Convert the confusion matrix to a tidy data frame
cm_table <- as.data.frame(cm$table)
plot <- ggplot(cm_table, aes(x = Reference, y = Prediction, fill = Freq)) +
geom_tile(color = "black") +
geom_text(aes(label = Freq), vjust = 1, size = 6) +
scale_fill_gradient(low = "white", high = "#009194") +
labs(title = plot_title, x = "True Class", y = "Predicted Class") +
theme_minimal(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "none"
)
print(plot)
}
In this baseline scenario, each subject is assigned to the class with the highest predicted probability. This is the standard approach and serves as our benchmark.
baseline_predictions <- factor(loocv_data$Pred_Label, levels = class_names)
cm_baseline <- confusionMatrix(baseline_predictions, loocv_data$True_Label)
calculate_global_performance(cm_baseline, "Baseline (Highest Probability)")
| | Accuracy | Kappa | AccuracyLower | AccuracyUpper | AccuracyNull | AccuracyPValue | McnemarPValue |
|---|---|---|---|---|---|---|---|
| cm$overall | 0.8582 | 0.7792 | 0.8253 | 0.887 | 0.4368 | 0 | NaN |
| | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value | Precision | Recall | F1 | Prevalence | Detection Rate | Detection Prevalence | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Class: Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2874 | 0.2874 | 0.2874 | 1.0000 |
| Class: Cancer | 0.8991 | 0.8265 | 0.8008 | 0.9135 | 0.8008 | 0.8991 | 0.8471 | 0.4368 | 0.3927 | 0.4904 | 0.8628 |
| Class: Normal | 0.6458 | 0.9392 | 0.8017 | 0.8744 | 0.8017 | 0.6458 | 0.7154 | 0.2759 | 0.1782 | 0.2222 | 0.7925 |
plot_confusion_matrix(cm_baseline, "Confusion Matrix: Baseline Scenario")
Analysis of Baseline: The overall accuracy is high (85.8%), but sensitivity for ‘Normal’ is only 64.6%, and the heatmap shows that the misclassifications are concentrated between ‘Cancer’ and ‘Normal’. Our goal is to see whether adjusting the probability thresholds can improve on this.
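To make these confusions explicit in numbers, the following optional sketch tabulates the off-diagonal cells of the baseline confusion matrix; it only reuses the cm_baseline object and the packages loaded earlier.

# Optional sketch: list the baseline misclassifications by cell count,
# reusing the caret confusion matrix computed above.
baseline_errors <- as.data.frame(cm_baseline$table) %>%
  filter(Prediction != Reference) %>%
  arrange(desc(Freq))
kable(baseline_errors, caption = "Misclassification counts (Baseline)", booktabs = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE)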
This data-driven approach uses Youden’s J statistic (J = sensitivity + specificity - 1) to find, for each class in a one-vs-all setting, the threshold that optimally balances sensitivity and specificity.
roc_curves <- list()
optimal_thresholds <- c()
cat("--- Calculating Optimal Thresholds (Youden's J-Index) ---\n")
— Calculating Optimal Thresholds (Youden’s J-Index) —
for(i in 1:length(class_names)) {
class <- class_names[i]
response <- ifelse(loocv_data$True_Label == class, 1, 0)
roc_obj <- roc(response, loocv_data[[prob_cols[i]]], quiet = TRUE)
roc_curves[[class]] <- roc_obj
best_thresh <- coords(roc_obj, "best", ret = "threshold", best.method="youden")$threshold
optimal_thresholds[class] <- best_thresh[1]
cat(paste("Optimal threshold for", class, "is:", round(optimal_thresholds[class], 4)), "\n\n")
}
Optimal threshold for Benign is: 0.53
Optimal threshold for Cancer is: 0.5767
Optimal threshold for Normal is: 0.32
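As an optional sanity check, the short sketch below recomputes Youden's J manually for the ‘Cancer’ ROC curve and locates its maximum, which should reproduce the threshold reported above; it uses only the roc_curves list created in the loop.

# Sanity check: recompute J = sensitivity + specificity - 1 over all candidate
# thresholds for 'Cancer' and locate its maximum.
cancer_coords <- coords(roc_curves[["Cancer"]], x = "all",
                        ret = c("threshold", "sensitivity", "specificity"))
cancer_coords$J <- cancer_coords$sensitivity + cancer_coords$specificity - 1
cancer_coords[which.max(cancer_coords$J), ]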
predictions_roc <- apply(loocv_data[, prob_cols], 1, function(probs) {
  # prob_cols and optimal_thresholds are in the same class order (Benign, Cancer, Normal)
  passed_threshold <- probs >= optimal_thresholds
  # Exactly one class clears its threshold: predict that class
  if (sum(passed_threshold) == 1) return(names(optimal_thresholds)[which(passed_threshold)])
  # Several classes clear their thresholds: predict the passing class with the highest probability
  if (sum(passed_threshold) > 1) return(class_names[passed_threshold][which.max(probs[passed_threshold])])
  # No class clears its threshold: fall back to the highest probability overall
  return(class_names[which.max(probs)])
})
predictions_roc <- factor(predictions_roc, levels = class_names)
cm_roc <- confusionMatrix(predictions_roc, loocv_data$True_Label)
calculate_global_performance(cm_roc, "Optimal ROC Thresholds (Youden's J)")
| | Accuracy | Kappa | AccuracyLower | AccuracyUpper | AccuracyNull | AccuracyPValue | McnemarPValue |
|---|---|---|---|---|---|---|---|
| cm$overall | 0.9011 | 0.8514 | 0.8703 | 0.9267 | 0.4043 | 0 | NaN |
| | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value | Precision | Recall | F1 | Prevalence | Detection Rate | Detection Prevalence | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Class: Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3226 | 0.3226 | 0.3226 | 1.0000 |
| Class: Cancer | 0.8032 | 0.9675 | 0.9438 | 0.8787 | 0.9438 | 0.8032 | 0.8678 | 0.4043 | 0.3247 | 0.3441 | 0.8854 |
| Class: Normal | 0.9291 | 0.8905 | 0.7613 | 0.9710 | 0.7613 | 0.9291 | 0.8369 | 0.2731 | 0.2538 | 0.3333 | 0.9098 |
plot_confusion_matrix(cm_roc, "Confusion Matrix: Balanced (Youden's J) Scenario")
Analysis of Scenario 2: The balanced thresholds raise overall accuracy from 85.8% to 90.1% and lift ‘Normal’ sensitivity from 64.6% to 92.9%, at the cost of somewhat lower ‘Cancer’ sensitivity (80.3% vs. 89.9%). The confusion matrix heatmap shows fewer errors overall, indicating a better balance in correctly identifying all classes.
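For a direct side-by-side view of this improvement, the short sketch below collects the overall accuracy and Kappa from the two confusionMatrix objects computed so far; no new data or models are involved.

# Compare the baseline and Youden-threshold scenarios on overall metrics.
accuracy_comparison <- data.frame(
  Scenario = c("Baseline (highest probability)", "Optimal ROC thresholds (Youden's J)"),
  Accuracy = c(cm_baseline$overall["Accuracy"], cm_roc$overall["Accuracy"]),
  Kappa = c(cm_baseline$overall["Kappa"], cm_roc$overall["Kappa"])
)
kable(accuracy_comparison, digits = 4, booktabs = TRUE,
      caption = "Overall performance by scenario") %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE)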
This scenario prioritizes minimizing false negatives for the ‘Cancer’ class, a critical requirement in a clinical screening context.
high_sens_thresholds <- optimal_thresholds
cancer_roc <- roc_curves[['Cancer']]
sens_coords <- coords(cancer_roc, x = "all", ret = c("threshold", "sensitivity"))
# Note: pROC's coords() includes -Inf as a boundary threshold, so taking the
# minimum threshold that reaches >= 95% sensitivity returns -Inf here, i.e.
# effectively no probability cutoff for the 'Cancer' class.
high_sens_thresh_val <- min(sens_coords$threshold[sens_coords$sensitivity >= 0.95])
high_sens_thresholds['Cancer'] <- high_sens_thresh_val
cat(paste("\n--- High Sensitivity Scenario ---\nNew threshold for Cancer (>=95% sensitivity) is:", round(high_sens_thresholds['Cancer'], 4), "\n"))
— High Sensitivity Scenario — New threshold for Cancer (>=95% sensitivity) is: -Inf
predictions_high_sens <- apply(loocv_data[, prob_cols], 1, function(probs) {
  passed_threshold <- probs >= high_sens_thresholds
  if (sum(passed_threshold) == 1) return(names(high_sens_thresholds)[which(passed_threshold)])
  # As above, return the class name (not the probability column name) when several classes pass
  if (sum(passed_threshold) > 1) return(class_names[passed_threshold][which.max(probs[passed_threshold])])
  return(class_names[which.max(probs)])
})
predictions_high_sens <- factor(predictions_high_sens, levels = class_names)
cm_high_sens <- confusionMatrix(predictions_high_sens, loocv_data$True_Label)
calculate_global_performance(cm_high_sens, "High Sensitivity for 'Cancer'")
| | Accuracy | Kappa | AccuracyLower | AccuracyUpper | AccuracyNull | AccuracyPValue | McnemarPValue |
|---|---|---|---|---|---|---|---|
| cm$overall | 0.9438 | 0 | 0.8959 | 0.974 | 0.9438 | 0.5874 | NaN |
| | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value | Precision | Recall | F1 | Prevalence | Detection Rate | Detection Prevalence | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Class: Benign | NA | 1 | NA | NA | NA | NA | NA | 0.0000 | 0.0000 | 0 | NA |
| Class: Cancer | 1 | 0 | 0.9438 | NaN | 0.9438 | 1 | 0.9711 | 0.9438 | 0.9438 | 1 | 0.5 |
| Class: Normal | 0 | 1 | NaN | 0.9438 | NA | 0 | NA | 0.0562 | 0.0000 | 0 | 0.5 |
plot_confusion_matrix(cm_high_sens, "Confusion Matrix: High 'Cancer' Sensitivity Scenario")
Analysis of Scenario 3: As intended, sensitivity for ‘Cancer’ is now maximal. However, because the selected threshold is -Inf, virtually every case is flagged as ‘Cancer’: specificity for ‘Cancer’ falls to 0 and no ‘Benign’ or ‘Normal’ cases are identified. Accepting more false positives is common for a first-line screening tool, but a cutoff this permissive removes all discrimination, so a less aggressive sensitivity target would be needed in practice.
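To summarize this trade-off for the class of interest, the sketch below pulls the ‘Cancer’ row of the byClass matrix from each of the three confusionMatrix objects; it simply re-displays values already reported in the tables above.

# Sensitivity/specificity trade-off for 'Cancer' across the three scenarios.
cancer_tradeoff <- rbind(
  Baseline = cm_baseline$byClass["Class: Cancer", c("Sensitivity", "Specificity")],
  `Youden balanced` = cm_roc$byClass["Class: Cancer", c("Sensitivity", "Specificity")],
  `High sensitivity` = cm_high_sens$byClass["Class: Cancer", c("Sensitivity", "Specificity")]
)
kable(cancer_tradeoff, digits = 4, booktabs = TRUE,
      caption = "Trade-off for the 'Cancer' class by scenario") %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE)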
These plots show the distribution of predicted probabilities for each class, separated by the true label. They help us understand the model’s certainty and where misclassifications occur due to overlapping distributions. The vertical lines represent the optimal thresholds calculated in Scenario 2.
# Prepare data for plotting
plot_data_long <- loocv_data %>%
select(True_Label, all_of(prob_cols)) %>%
pivot_longer(
cols = all_of(prob_cols),
names_to = "Predicted_Class_Prob",
values_to = "Probability"
) %>%
mutate(Predicted_Class_Prob = gsub("Prob_", "", Predicted_Class_Prob))
# Create threshold data for vlines
threshold_df <- data.frame(
Predicted_Class_Prob = names(optimal_thresholds),
thresh = optimal_thresholds
)
# Plot
ggplot(plot_data_long, aes(x = Probability, fill = True_Label)) +
geom_density(alpha = 0.6) +
facet_wrap(~ Predicted_Class_Prob, scales = "free_y") +
geom_vline(data = threshold_df, aes(xintercept = thresh), linetype = "dashed", color = "black", size = 1) +
labs(
title = "Probability Distributions by True Class",
x = "Predicted Probability",
y = "Density",
fill = "True Class"
) +
theme_bw(base_size = 14) +
theme(plot.title = element_text(hjust = 0.5, face = "bold"), legend.position = "bottom") +
scale_fill_brewer(palette = "Set1")
Analysis of Distribution Plots: The plots show good separation for ‘Benign’ probabilities. However, there is a noticeable overlap in the distributions for ‘Cancer’ and ‘Normal’ probabilities, which explains why the model sometimes confuses these two classes. The dashed lines show the balanced thresholds, positioned to effectively separate the dominant peak for each class.
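The overlap described above can also be expressed numerically. The optional sketch below summarizes the median predicted ‘Cancer’ and ‘Normal’ probabilities within each true class, using the probability columns already present in loocv_data.

# Median predicted probabilities by true class, as a numerical complement to
# the density plots.
loocv_data %>%
  group_by(True_Label) %>%
  summarise(
    median_Prob_Cancer = median(Prob_Cancer),
    median_Prob_Normal = median(Prob_Normal),
    .groups = "drop"
  ) %>%
  kable(digits = 3, booktabs = TRUE,
        caption = "Median predicted probabilities by true class") %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE)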
The ROC curve visualizes the trade-off between sensitivity and specificity and the AUC provides a single metric of the model’s discriminative ability for each class in a one-vs-all context.
roc_data_list <- lapply(names(roc_curves), function(class_name) {
roc_obj <- roc_curves[[class_name]]
data.frame(
FPR = 1 - roc_obj$specificities,  # false positive rate (1 - specificity)
Sensitivity = roc_obj$sensitivities,
Class = class_name,
AUC = round(auc(roc_obj), 3)
)
})
roc_data_combined <- do.call(rbind, roc_data_list)
roc_data_combined$Label <- paste0(roc_data_combined$Class, " (AUC = ", roc_data_combined$AUC, ")")
roc_plot <- ggplot(roc_data_combined, aes(x = FPR, y = Sensitivity, color = Label)) +
geom_line(size = 1.2) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray50") +
labs(
title = "Multi-Class ROC Curves (One-vs-All)",
x = "1 - Specificity (False Positive Rate)",
y = "Sensitivity (True Positive Rate)",
color = "Class Performance"
) +
theme_minimal(base_size = 15) +
theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5, face = "bold")) +
scale_color_brewer(palette = "Set1")
print(roc_plot)
Analysis of ROC Plot: The high AUC values for all classes (especially Benign and Cancer) confirm the model has strong discriminative power.
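If confidence intervals around the AUC estimates are desired, pROC's ci.auc() can be applied to the same one-vs-all ROC objects; the sketch below prints DeLong 95% intervals (the package default) for each class.

# Optional: 95% confidence intervals for each one-vs-all AUC (DeLong method).
for (class in class_names) {
  auc_ci <- ci.auc(roc_curves[[class]])
  cat(sprintf("%s: AUC = %.3f (95%% CI %.3f-%.3f)\n",
              class, auc_ci[2], auc_ci[1], auc_ci[3]))
}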
This analysis demonstrates that by tuning probability thresholds, we can tailor the model’s behavior to meet specific objectives.
The final choice of thresholds depends entirely on the end-use case of the model. This analysis provides the data to make an informed, evidence-based decision.
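For reference, the sketch below gathers the thresholds derived in Scenarios 2 and 3 into a single table; it only re-displays values already computed above.

# Candidate decision thresholds by scenario, for the final recommendation.
threshold_summary <- data.frame(
  Class = class_names,
  Balanced_Youden = round(unname(optimal_thresholds[class_names]), 4),
  High_Sensitivity = round(unname(high_sens_thresholds[class_names]), 4)
)
kable(threshold_summary, row.names = FALSE, booktabs = TRUE,
      caption = "Candidate thresholds by scenario") %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE)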