Introduction

As a researcher working in the field of radiology, I rely on sensitivity and specificity as two of the most important values for evaluating diagnostic imaging modalities. After all, it is important to know whether a specific type of scan (MRI, CT, ultrasound) is appropriate for detecting the condition in question.

In this tutorial, we’ll go through the steps to calculate sensitivity and specificity in R to evaluate the performance of a diagnostic model. For our example, we’ll use the Wisconsin Breast Cancer dataset, loaded here via the mlbench package and originally from the UCI Machine Learning Repository. This dataset is widely used for demonstrating machine learning techniques in medical diagnostics, particularly for predicting whether a breast mass is malignant or benign based on nine cytological characteristics scored from fine needle aspirate samples.
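Before diving in, it helps to recall the definitions: sensitivity = TP / (TP + FN), the fraction of diseased cases the test flags, and specificity = TN / (TN + FP), the fraction of healthy cases it clears. As a quick illustrative sketch (the counts here are made up for demonstration, not taken from the dataset):

# Illustrative helper: sensitivity and specificity from confusion-matrix counts
sens_spec = function(tp, fn, tn, fp) {
  c(sensitivity = tp / (tp + fn),  # TP / (TP + FN)
    specificity = tn / (tn + fp))  # TN / (TN + FP)
}

sens_spec(tp = 45, fn = 5, tn = 90, fp = 10)  # sensitivity 0.9, specificity 0.9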

Importing the Data and Building the Model

library(caTools)
library(ROCR)
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(mlbench)
# Importing and cleaning the data

data(BreastCancer)
bc_data = BreastCancer

# Drop the non-predictive Id column (subsetting by name is safer than -which())
bc_data = bc_data[, names(bc_data) != "Id"]
bc_data$Class = ifelse(bc_data$Class == "malignant", 1, 0)
sum(is.na(bc_data))
## [1] 16
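If you want to see where those 16 missing values live, a per-column count shows they all sit in the Bare.nuclei column, the only attribute in this dataset with NAs:

# Count missing values per column; all 16 are in Bare.nuclei
colSums(is.na(bc_data))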
# Imputing missing values

set.seed(621)  

impute_preprocess = preProcess(bc_data, method = 'medianImpute')

bc_data = predict(impute_preprocess, bc_data)

sum(is.na(bc_data))
## [1] 16
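The count is still 16 because medianImpute only operates on numeric columns, and every predictor in BreastCancer is stored as a factor, so the imputation silently skipped them. A minimal sketch of an alternative path, assuming you are comfortable treating the ordinal 1–10 levels as numeric scores (the tutorial below instead keeps the factors and drops the leftover NA rows after prediction):

# Alternative (hypothetical) path: convert factor predictors to numeric
# scores so medianImpute actually has something to fill
predictor_cols = setdiff(names(bc_data), "Class")
bc_numeric = bc_data
bc_numeric[predictor_cols] = lapply(bc_numeric[predictor_cols],
                                    function(x) as.numeric(as.character(x)))

impute_numeric = preProcess(bc_numeric, method = "medianImpute")
bc_numeric = predict(impute_numeric, bc_numeric)
sum(is.na(bc_numeric))  # 0: the NAs are now imputed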
# Splitting the Data
split = sample.split(bc_data$Class, SplitRatio = 0.7)
train_data = subset(bc_data, split == TRUE)
test_data = subset(bc_data, split == FALSE)

# Building the model (the warnings below indicate quasi-complete separation:
# some predictor combinations separate the classes perfectly, so fitted
# probabilities reach 0 or 1)
model = glm(Class ~ ., data = train_data, family = "binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
test_data$predicted_prob = predict(model, newdata = test_data, type = "response")
test_data$predicted_class = ifelse(test_data$predicted_prob >= 0.5, 1, 0)
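This is where the leftover NAs resurface: predict() returns NA for any test row whose Bare.nuclei value is missing, which is worth checking before computing performance metrics:

# Rows with a missing Bare.nuclei value receive an NA prediction
sum(is.na(test_data$predicted_prob))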

Calculating Sensitivity and Specificity

To calculate the sensitivity and specificity, you can use the ROCR package. Note that I had to clean the data a bit, as test_data contained a few NAs that were throwing off the whole process. This traces back to the imputation step: medianImpute never touched the factor-valued Bare.nuclei column, so predict() returned NA for the affected test rows.

clean_test_data = test_data[!is.na(test_data$predicted_prob), ]

pred = prediction(predictions = clean_test_data$predicted_prob, labels = clean_test_data$Class)
perf = performance(pred, "sens", "spec")
sensitivity = perf@y.values[[1]]
specificity = perf@x.values[[1]]


cat("Sensitivity:", sensitivity, "\n")
## Sensitivity: 0 0.75 0.7638889 0.7777778 0.7916667 0.8055556 0.8194444 0.8333333 0.8472222 0.8611111 0.875 0.875 0.875 0.8888889 0.9027778 0.9166667 0.9166667 0.9305556 0.9305556 0.9305556 0.9305556 0.9305556 0.9305556 0.9305556 0.9444444 0.9444444 0.9444444 0.9444444 1
cat("Specificity:", specificity, "\n")
## Specificity: 1 0.9767442 0.9767442 0.9767442 0.9767442 0.9767442 0.9767442 0.9767442 0.9767442 0.9767442 0.9767442 0.9689922 0.9612403 0.9612403 0.9612403 0.9612403 0.9534884 0.9534884 0.9457364 0.9379845 0.9302326 0.9224806 0.9147287 0.9069767 0.9069767 0.8992248 0.8914729 0.8837209 0
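Since caret is already loaded, a complementary single-number check at the fixed 0.5 cutoff can be done with confusionMatrix(); a sketch, treating "1" (malignant) as the positive class:

# Sensitivity and specificity at the 0.5 cutoff via caret
cm = confusionMatrix(factor(clean_test_data$predicted_class, levels = c(0, 1)),
                     factor(clean_test_data$Class, levels = c(0, 1)),
                     positive = "1")
cm$byClass[c("Sensitivity", "Specificity")]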

Interpreting Sensitivity and Specificity

Sensitivity

Sensitivity (the true positive rate) indicates the model’s ability to correctly identify malignant tumors. The values printed above are not a single score but the sensitivity at each candidate threshold, starting at 0 for the strictest cutoff and reaching 1 for the most lenient. The incremental increase from 0.75 to 1 as the threshold is relaxed shows the model catching progressively more malignant cases. At the strictest cutoffs the model misses a significant number of malignant cases, which is critical in a medical setting, as failing to identify a malignant tumor could delay necessary treatment.

However, as the threshold is relaxed, particularly once sensitivity climbs past the 0.75 mark toward 1, the model correctly identifies nearly all malignant tumors. High sensitivity is crucial for early detection of breast cancer, ensuring patients receive timely and potentially life-saving treatment.

Specificity

Specificity (true negative rate) measures the model’s ability to identify benign cases correctly. The model starts with perfect specificity (1), indicating no benign tumors are mistakenly classified as malignant at the strictest threshold. As the threshold is relaxed, specificity slightly declines, indicating an increase in false positives, where benign tumors are incorrectly labeled as malignant.

This decline in specificity, from 0.977 down through 0.884 and finally to 0 at the most lenient cutoff (where every case is called malignant), reflects the trade-off between detecting more true positives and misclassifying some benign cases as malignant. This trade-off needs careful management, as unnecessary treatments or invasive procedures resulting from false positives can lead to increased patient anxiety, costs, and potential health risks from unneeded medical interventions.
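One standard way to manage this trade-off is Youden’s J statistic (sensitivity + specificity − 1), which selects the cutoff maximizing the combined gain. ROCR stores the cutoff behind each sensitivity/specificity pair in the performance object, so a sketch using the vectors computed earlier looks like this:

# Cutoffs corresponding to each sensitivity/specificity pair
cutoffs = perf@alpha.values[[1]]

# Youden's J rewards cutoffs that are strong on both axes
youden_j = sensitivity + specificity - 1
best_cutoff = cutoffs[which.max(youden_j)]
cat("Best cutoff by Youden's J:", best_cutoff, "\n")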

Conclusion

In the context of breast cancer, where early detection is important, a model with high sensitivity (even at the cost of some loss in specificity) might be preferable. However, the exact balance depends on clinical goals:

High Sensitivity: Essential for ensuring no malignant tumor goes undetected. Given the serious implications of missing a cancer diagnosis, clinical settings might prefer a model skewed towards higher sensitivity.

Moderate Specificity Loss: Acceptable within certain limits, provided the clinical setting can manage the consequences of false positives, such as additional testing (biopsies, further imaging studies) to confirm diagnoses.

In conclusion, the model demonstrates a robust ability to identify malignant tumors (high sensitivity), with an expected decrease in specificity. This performance profile suggests that while the model is effective in ensuring malignant tumors do not go undetected, it may also increase the rate of false positives, necessitating further diagnostic procedures. For clinical use, it would be advisable to select a threshold that maintains high sensitivity while keeping specificity at a level where the burden of additional confirmatory testing remains manageable. This balance ensures the model’s utility in improving early cancer detection without excessively burdening patients and healthcare systems with false positives.
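In practice, that balance can be made concrete: fix a sensitivity floor dictated by clinical goals, then take the cutoff that preserves the most specificity above it. A sketch reusing the cutoffs vector from above (the 0.95 floor is an illustrative choice, not a clinical recommendation):

# Among cutoffs achieving at least 95% sensitivity, keep the one that
# sacrifices the least specificity
target_sens = 0.95
meets_floor = sensitivity >= target_sens
chosen_cutoff = cutoffs[meets_floor][which.max(specificity[meets_floor])]
cat("Chosen cutoff:", chosen_cutoff,
    "with specificity", max(specificity[meets_floor]), "\n")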