1. Introduction and Objective

This report analyzes the Leave-One-Out Cross-Validation (LOOCV) results of a 3-class classification model. The primary objective is to move beyond the default prediction (simply choosing the class with the highest probability) and determine optimal probability thresholds for classifying subjects into Benign, Cancer, and Normal categories.

We will evaluate three distinct scenarios to understand the trade-offs between them:

  1. Baseline Performance: Using the highest probability to make a prediction.
  2. Balanced Performance: Using ROC analysis (Youden’s J-Index) to find thresholds that balance sensitivity and specificity for each class.
  3. High-Sensitivity for Cancer: Prioritizing the detection of ‘Cancer’ cases, which is crucial in a clinical setting to minimize false negatives.

The analysis will conclude with a visual representation of the model’s discriminative ability via ROC curves and provide a final recommendation.

2. Setup and Data Loading

2.1. Load Necessary Libraries

First, we load the R packages required for our analysis.

# pROC: For ROC curve analysis and calculating optimal thresholds.
library(pROC)
# caret: For creating confusion matrices and calculating performance metrics.
library(caret)
# ggplot2: For creating high-quality, customizable visualizations.
library(ggplot2)
# tidyr & dplyr: For data manipulation and preparation.
library(tidyr)
library(dplyr)
# knitr: For creating formatted tables.
library(knitr)
# kableExtra: For advanced table styling.
library(kableExtra)

2.2. Load and Prepare Data

Next, we load the LOOCV results from the CSV file.

loocv_data <- read.csv("LOOCV_Data - LOOCV probabilities.csv")
class_names <- c("Benign", "Cancer", "Normal")
prob_cols <- c("Prob_Benign", "Prob_Cancer", "Prob_Normal")
loocv_data$True_Label <- factor(loocv_data$True_Label, levels = class_names)

3. Helper Functions for Evaluation and Visualization

To keep our analysis clean and reproducible, we define two helper functions.

3.1. Performance Evaluation Function

This function calculates and displays performance metrics as beautifully formatted tables using kableExtra.

calculate_global_performance <- function(cm, scenario_name) {
  cat("\n### Performance Metrics for:", scenario_name, "\n")
  
  # Overall Statistics
  cat("\n**Overall Statistics:**\n")
  kable(t(as.data.frame(cm$overall)), caption = "Overall Model Performance", booktabs = TRUE, digits = 4) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE, position = "left") %>%
    print()
  
  # Class-Specific Statistics
  cat("\n**Class-Specific Statistics:**\n")
  kable(cm$byClass, caption = "Performance by Class", booktabs = TRUE, digits = 4) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE, position = "left") %>%
    print()
}

3.2. Confusion Matrix Plotting Function

This function visualizes a confusion matrix as a heatmap.

plot_confusion_matrix <- function(cm, plot_title) {
  # Convert the confusion matrix to a tidy data frame
  cm_table <- as.data.frame(cm$table)
  
  plot <- ggplot(cm_table, aes(x = Reference, y = Prediction, fill = Freq)) +
    geom_tile(color = "black") +
    geom_text(aes(label = Freq), vjust = 1, size = 6) +
    scale_fill_gradient(low = "white", high = "#009194") +
    labs(title = plot_title, x = "True Class", y = "Predicted Class") +
    theme_minimal(base_size = 15) +
    theme(
      plot.title = element_text(hjust = 0.5, face = "bold"),
      legend.position = "none"
    )
  
  print(plot)
}

4. Scenario Analysis: Finding Optimal Thresholds


4.1. Scenario 1: Baseline Performance (Highest Probability Wins)

This is the standard approach and serves as our benchmark.

baseline_predictions <- factor(loocv_data$Pred_Label, levels = class_names)
cm_baseline <- confusionMatrix(baseline_predictions, loocv_data$True_Label)

calculate_global_performance(cm_baseline, "Baseline (Highest Probability)")

Performance Metrics for: Baseline (Highest Probability)

Overall Statistics:
Overall Model Performance
Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull AccuracyPValue McnemarPValue
cm$overall 0.8582 0.7792 0.8253 0.887 0.4368 0 NaN
Class-Specific Statistics:
Performance by Class
Sensitivity Specificity Pos Pred Value Neg Pred Value Precision Recall F1 Prevalence Detection Rate Detection Prevalence Balanced Accuracy
Class: Benign 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.2874 0.2874 0.2874 1.0000
Class: Cancer 0.8991 0.8265 0.8008 0.9135 0.8008 0.8991 0.8471 0.4368 0.3927 0.4904 0.8628
Class: Normal 0.6458 0.9392 0.8017 0.8744 0.8017 0.6458 0.7154 0.2759 0.1782 0.2222 0.7925
plot_confusion_matrix(cm_baseline, "Confusion Matrix: Baseline Scenario")

Analysis of Baseline: The overall accuracy is high. However, the heatmap shows some misclassifications, particularly between ‘Cancer’ and ‘Normal’. Our goal is to see if we can improve this by adjusting thresholds.


4.2. Scenario 2: Optimal Thresholds via ROC (Maximizing Youden’s J-Index)

This data-driven approach finds thresholds that optimally balance sensitivity and specificity for each class.

roc_curves <- list()
optimal_thresholds <- c()
cat("--- Calculating Optimal Thresholds (Youden's J-Index) ---\n")

— Calculating Optimal Thresholds (Youden’s J-Index) —

for(i in 1:length(class_names)) {
  class <- class_names[i]
  response <- ifelse(loocv_data$True_Label == class, 1, 0)
  roc_obj <- roc(response, loocv_data[[prob_cols[i]]], quiet = TRUE)
  roc_curves[[class]] <- roc_obj
  best_thresh <- coords(roc_obj, "best", ret = "threshold", best.method="youden")$threshold
  optimal_thresholds[class] <- best_thresh[1]
  
  cat(paste("Optimal threshold for", class, "is:", round(optimal_thresholds[class], 4)), "\n\n")
}

Optimal threshold for Benign is: 0.53

Optimal threshold for Cancer is: 0.5767

Optimal threshold for Normal is: 0.32

predictions_roc <- apply(loocv_data[, prob_cols], 1, function(probs) {
  passed_threshold <- probs >= optimal_thresholds
  if (sum(passed_threshold) == 1) return(names(optimal_thresholds)[which(passed_threshold)])
  if (sum(passed_threshold) > 1) return(names(probs[passed_threshold])[which.max(probs[passed_threshold])])
  return(class_names[which.max(probs)])
})

predictions_roc <- factor(predictions_roc, levels = class_names)
cm_roc <- confusionMatrix(predictions_roc, loocv_data$True_Label)

calculate_global_performance(cm_roc, "Optimal ROC Thresholds (Youden's J)")

Performance Metrics for: Optimal ROC Thresholds (Youden’s J)

Overall Statistics:
Overall Model Performance
Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull AccuracyPValue McnemarPValue
cm$overall 0.9011 0.8514 0.8703 0.9267 0.4043 0 NaN
Class-Specific Statistics:
Performance by Class
Sensitivity Specificity Pos Pred Value Neg Pred Value Precision Recall F1 Prevalence Detection Rate Detection Prevalence Balanced Accuracy
Class: Benign 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.3226 0.3226 0.3226 1.0000
Class: Cancer 0.8032 0.9675 0.9438 0.8787 0.9438 0.8032 0.8678 0.4043 0.3247 0.3441 0.8854
Class: Normal 0.9291 0.8905 0.7613 0.9710 0.7613 0.9291 0.8369 0.2731 0.2538 0.3333 0.9098
plot_confusion_matrix(cm_roc, "Confusion Matrix: Balanced (Youden's J) Scenario")

Analysis of Scenario 2: This balanced approach has improved the model’s performance, increasing overall accuracy. The confusion matrix heatmap shows fewer errors, indicating a better balance in correctly identifying all classes.


4.3. Scenario 3: High Sensitivity for ‘Cancer’

This scenario prioritizes minimizing false negatives for the ‘Cancer’ class, a critical requirement in a clinical screening context.

high_sens_thresholds <- optimal_thresholds 
cancer_roc <- roc_curves[['Cancer']]
sens_coords <- coords(cancer_roc, x = "all", ret = c("threshold", "sensitivity"))
high_sens_thresh_val <- min(sens_coords$threshold[sens_coords$sensitivity >= 0.95])
high_sens_thresholds['Cancer'] <- high_sens_thresh_val

cat(paste("\n--- High Sensitivity Scenario ---\nNew threshold for Cancer (>=95% sensitivity) is:", round(high_sens_thresholds['Cancer'], 4), "\n"))

— High Sensitivity Scenario — New threshold for Cancer (>=95% sensitivity) is: -Inf

predictions_high_sens <- apply(loocv_data[, prob_cols], 1, function(probs) {
  passed_threshold <- probs >= high_sens_thresholds
  if (sum(passed_threshold) == 1) return(names(high_sens_thresholds)[which(passed_threshold)])
  if (sum(passed_threshold) > 1) return(names(probs[passed_threshold])[which.max(probs[passed_threshold])])
  return(class_names[which.max(probs)])
})

predictions_high_sens <- factor(predictions_high_sens, levels = class_names)
cm_high_sens <- confusionMatrix(predictions_high_sens, loocv_data$True_Label)

calculate_global_performance(cm_high_sens, "High Sensitivity for 'Cancer'")

Performance Metrics for: High Sensitivity for ‘Cancer’

Overall Statistics:
Overall Model Performance
Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull AccuracyPValue McnemarPValue
cm$overall 0.9438 0 0.8959 0.974 0.9438 0.5874 NaN
Class-Specific Statistics:
Performance by Class
Sensitivity Specificity Pos Pred Value Neg Pred Value Precision Recall F1 Prevalence Detection Rate Detection Prevalence Balanced Accuracy
Class: Benign NA 1 NA NA NA NA NA 0.0000 0.0000 0 NA
Class: Cancer 1 0 0.9438 NaN 0.9438 1 0.9711 0.9438 0.9438 1 0.5
Class: Normal 0 1 NaN 0.9438 NA 0 NA 0.0562 0.0000 0 0.5
plot_confusion_matrix(cm_high_sens, "Confusion Matrix: High 'Cancer' Sensitivity Scenario")

Analysis of Scenario 3: As intended, the sensitivity for ‘Cancer’ is now very high. However, the heatmap shows that this comes at the cost of misclassifying more ‘Normal’ and ‘Benign’ cases as ‘Cancer’. This trade-off is often acceptable for a first-line screening tool.

5. Visualizing Probability Distributions

These plots show the distribution of predicted probabilities for each class, separated by the true label. They help us understand the model’s certainty and where misclassifications occur due to overlapping distributions. The vertical lines represent the optimal thresholds calculated in Scenario 2.

# Prepare data for plotting
plot_data_long <- loocv_data %>%
  select(True_Label, all_of(prob_cols)) %>%
  pivot_longer(
    cols = all_of(prob_cols),
    names_to = "Predicted_Class_Prob",
    values_to = "Probability"
  ) %>%
  mutate(Predicted_Class_Prob = gsub("Prob_", "", Predicted_Class_Prob))

# Create threshold data for vlines
threshold_df <- data.frame(
  Predicted_Class_Prob = names(optimal_thresholds),
  thresh = optimal_thresholds
)

# Plot
ggplot(plot_data_long, aes(x = Probability, fill = True_Label)) +
  geom_density(alpha = 0.6) +
  facet_wrap(~ Predicted_Class_Prob, scales = "free_y") +
  geom_vline(data = threshold_df, aes(xintercept = thresh), linetype = "dashed", color = "black", size = 1) +
  labs(
    title = "Probability Distributions by True Class",
    x = "Predicted Probability",
    y = "Density",
    fill = "True Class"
  ) +
  theme_bw(base_size = 14) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"), legend.position = "bottom") +
  scale_fill_brewer(palette = "Set1")

Analysis of Distribution Plots: The plots show good separation for ‘Benign’ probabilities. However, there is a noticeable overlap in the distributions for ‘Cancer’ and ‘Normal’ probabilities, which explains why the model sometimes confuses these two classes. The dashed lines show the balanced thresholds, positioned to effectively separate the dominant peak for each class.

6. Model Discriminative Power: ROC Curves

The ROC curve visualizes the trade-off between sensitivity and specificity and the AUC provides a single metric of the model’s discriminative ability for each class in a one-vs-all context.

roc_data_list <- lapply(names(roc_curves), function(class_name) {
  roc_obj <- roc_curves[[class_name]]
  data.frame(
    Specificity = 1 - roc_obj$specificities,
    Sensitivity = roc_obj$sensitivities,
    Class = class_name,
    AUC = round(auc(roc_obj), 3)
  )
})
roc_data_combined <- do.call(rbind, roc_data_list)
roc_data_combined$Label <- paste0(roc_data_combined$Class, " (AUC = ", roc_data_combined$AUC, ")")

roc_plot <- ggplot(roc_data_combined, aes(x = Specificity, y = Sensitivity, color = Label)) +
  geom_line(size = 1.2) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray50") +
  labs(
    title = "Multi-Class ROC Curves (One-vs-All)",
    x = "1 - Specificity (False Positive Rate)",
    y = "Sensitivity (True Positive Rate)",
    color = "Class Performance"
  ) +
  theme_minimal(base_size = 15) +
  theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5, face = "bold")) +
  scale_color_brewer(palette = "Set1")

print(roc_plot)

Analysis of ROC Plot: The high AUC values for all classes (especially Benign and Cancer) confirm the model has strong discriminative power.

7. Conclusion and Recommendations

This analysis demonstrates that by tuning probability thresholds, we can tailor the model’s behavior to meet specific objectives.

  • Scenario 1 (Baseline): Provides a good starting point but is outperformed by more nuanced approaches.
  • Scenario 2 (Youden’s J-Index): Delivers the best-balanced performance, achieving the highest overall accuracy. This is the recommended approach for general use where no single class is overwhelmingly more critical than the others.
  • Scenario 3 (High Sensitivity for ‘Cancer’): This is the clear choice for a clinical screening setting. Its primary goal is to minimize the risk of missing a potential ‘Cancer’ case. The trade-off is an increase in false alarms, which is generally considered acceptable in this context.

The final choice of thresholds depends entirely on the end-use case of the model. This analysis provides the data to make an informed, evidence-based decision.