The aim of this assignment is to evaluate the performance of a binary classification model using the provided dataset, which contains actual outcomes, predicted class outcomes, and class probabilities. The evaluation is organized so that each key aspect of the model’s performance on a binary classification problem is examined in turn.
First, a confusion matrix is created to show the relationship between the actual class outcomes and the model’s predictions. From this matrix, custom R functions compute accuracy, error rate, precision, sensitivity, specificity, and the F1 measure, each of which captures a different aspect of the model’s performance.
Next, the model is evaluated using the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC). The ROC curve plots the model’s sensitivity against its false positive rate across a range of classification thresholds.
Lastly, the performance measures calculated using the custom functions are compared with the performance measures calculated using the pre-built functions in the R environment, such as the caret and pROC libraries.
The primary objectives of this assignment are to build the confusion matrix, compute the performance measures with custom R functions, construct the ROC curve and compute the AUC, and compare the results with those from the caret and pROC packages.
The dataset used in this analysis contains the results of a binary classification model. Each instance has three key variables that describe the actual outcome and the model’s prediction.
class: The actual class label of the instance. The positive class is represented by the value 1 and the negative class by the value 0.
scored.class: The class label assigned to the instance by the model using a classification threshold.
scored.probability: The model’s predicted probability that the instance belongs to the positive class.
These three variables provide the foundation for the evaluation. The confusion matrix and the metrics derived from it compare class with scored.class, while the ROC curve and the AUC are computed from scored.probability. The dataset therefore contains everything needed to evaluate the model.
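As a rough illustration of how a predicted class can be derived from a predicted probability, the sketch below applies a cutoff to a few values. The 0.5 threshold and the last probability (0.62) are assumptions made for illustration only; the source does not state which threshold the model used.

```r
# First three probabilities appear in head(df); 0.62 is a made-up value.
scored_probability <- c(0.32845226, 0.27319044, 0.10966039, 0.62)

# Assumed cutoff; the actual threshold used by the model is not given.
threshold <- 0.5

# Label an instance as positive (1) when its probability reaches the cutoff.
scored_class <- as.integer(scored_probability >= threshold)
print(scored_class)
```

Under this assumed cutoff, only the made-up fourth value is labeled positive.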
library(caret)
library(pROC)
df <- read.csv("classification-output-data (2).csv")
head(df)
  pregnant glucose diastolic skinfold insulin  bmi pedigree age class
1        7     124        70       33     215 25.5    0.161  37     0
2        2     122        76       27     200 35.9    0.483  26     0
3        3     107        62       13      48 22.9    0.678  23     1
4        1      91        64       24       0 29.2    0.192  21     0
5        4      83        86       19       0 29.3    0.317  34     0
6        1     100        74       12      46 19.5    0.149  28     0
  scored.class scored.probability
1            0         0.32845226
2            0         0.27319044
3            0         0.10966039
4            0         0.05599835
5            0         0.10049072
6            0         0.05515460
str(df)
'data.frame': 181 obs. of 11 variables:
$ pregnant : int 7 2 3 1 4 1 9 8 1 2 ...
$ glucose : int 124 122 107 91 83 100 89 120 79 123 ...
$ diastolic : int 70 76 62 64 86 74 62 78 60 48 ...
$ skinfold : int 33 27 13 24 19 12 0 0 42 32 ...
$ insulin : int 215 200 48 0 0 46 0 0 48 165 ...
$ bmi : num 25.5 35.9 22.9 29.2 29.3 19.5 22.5 25 43.5 42.1 ...
$ pedigree : num 0.161 0.483 0.678 0.192 0.317 0.149 0.142 0.409 0.678 0.52 ...
$ age : int 37 26 23 21 34 28 33 64 23 26 ...
$ class : int 0 0 1 0 0 0 0 0 0 0 ...
$ scored.class : int 0 0 0 0 0 0 0 0 0 0 ...
$ scored.probability: num 0.328 0.273 0.11 0.056 0.1 ...
The confusion matrix is a summary of the classification model’s performance based on a comparison between the actual and predicted class labels. In the confusion matrix, the rows represent the actual values, and the columns represent the predicted values.
cm <- table(Actual = df$class, Predicted = df$scored.class)
cm
      Predicted
Actual   0   1
     0 119   5
     1  30  27
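The structure of the Confusion Matrix is as follows, with 1 as the positive class:
Actual 0, Predicted 0: true negatives (TN)
Actual 0, Predicted 1: false positives (FP)
Actual 1, Predicted 0: false negatives (FN)
Actual 1, Predicted 1: true positives (TP)
From the results obtained:
TN = 119
FP = 5
FN = 30
TP = 27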
The Confusion Matrix obtained from the results indicates that the model performs well in classifying negative cases, as shown by the large number of true negative cases. The model, however, seems to perform poorly in classifying positive cases, as shown by the number of false negative cases, which directly impacts the sensitivity of the model as shown in the following metrics.
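The four cell counts can also be read directly off a table by name. The sketch below is a minimal, self-contained illustration: the two vectors are synthetic stand-ins built to reproduce the counts in the confusion matrix above, not the original rows, and the name cm_demo is hypothetical.

```r
# Synthetic labels that reproduce the reported counts (not the original data).
actual    <- c(rep(0, 124), rep(1, 57))
predicted <- c(rep(0, 119), rep(1, 5), rep(0, 30), rep(1, 27))

cm_demo <- table(Actual = actual, Predicted = predicted)

# Index by dimnames so the result does not depend on row or column order.
TN <- cm_demo["0", "0"]
FP <- cm_demo["0", "1"]
FN <- cm_demo["1", "0"]
TP <- cm_demo["1", "1"]

print(c(TP = TP, TN = TN, FP = FP, FN = FN))
```

Indexing by label rather than by position avoids mix-ups when a package lays out the table with predictions in the rows instead of the columns.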
Accuracy and classification error rate are basic evaluation metrics used to assess the overall performance of a classification model.
Accuracy calculates the percentage of correct classifications made by a model: \[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
Classification Error Rate calculates the percentage of incorrect classifications made by a model: \[ \text{Error Rate} = \frac{FP + FN}{TP + TN + FP + FN} \]
# Count the four confusion-matrix cells, treating 1 as the positive class.
get_counts <- function(data) {
  actual <- data$class
  predicted <- data$scored.class
  TP <- sum(actual == 1 & predicted == 1)
  TN <- sum(actual == 0 & predicted == 0)
  FP <- sum(actual == 0 & predicted == 1)
  FN <- sum(actual == 1 & predicted == 0)
  list(TP = TP, TN = TN, FP = FP, FN = FN)
}
# Proportion of observations classified correctly.
accuracy_fn <- function(data) {
  cts <- get_counts(data)
  (cts$TP + cts$TN) / (cts$TP + cts$TN + cts$FP + cts$FN)
}
# Proportion of observations classified incorrectly.
error_rate_fn <- function(data) {
  cts <- get_counts(data)
  (cts$FP + cts$FN) / (cts$TP + cts$TN + cts$FP + cts$FN)
}
acc <- accuracy_fn(df)
err <- error_rate_fn(df)
sumaccerr <- acc + err
cat("Accuracy:", acc, "\n")
Accuracy: 0.8066298
cat("Error Rate:", err, "\n")
Error Rate: 0.1933702
cat("Sum of Rate:", sumaccerr, "\n")
Sum of Rate: 1
The model’s accuracy is about 0.8066, meaning that about 80.66% of the observations are classified correctly. The classification error rate is about 0.1934, meaning that about 19.34% of the observations are misclassified.
As expected, the accuracy and the error rate sum to 1.
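Substituting the counts from the confusion matrix (TP = 27, TN = 119, FP = 5, FN = 30, for a total of 181 observations) reproduces these values by hand:
\[ \text{Accuracy} = \frac{27 + 119}{181} = \frac{146}{181} \approx 0.8066, \qquad \text{Error Rate} = \frac{5 + 30}{181} = \frac{35}{181} \approx 0.1934 \]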
Apart from accuracy, there are various other metrics to get a better understanding of the performance of a classification algorithm, particularly when it comes to evaluating how well a model is performing on positive and negative classes.
Precision is defined as: \[ \text{Precision} = \frac{TP}{TP + FP} \]
Sensitivity or Recall is defined as: \[ \text{Sensitivity} = \frac{TP}{TP + FN} \]
Specificity is defined as: \[ \text{Specificity} = \frac{TN}{TN + FP} \]
F1 Score is defined as: \[ F1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \]
# Share of predicted positives that are actually positive.
precision_fn <- function(data) {
  cts <- get_counts(data)
  cts$TP / (cts$TP + cts$FP)
}
# Share of actual positives that the model detects (recall).
sensitivity_fn <- function(data) {
  cts <- get_counts(data)
  cts$TP / (cts$TP + cts$FN)
}
# Share of actual negatives that the model correctly rejects.
specificity_fn <- function(data) {
  cts <- get_counts(data)
  cts$TN / (cts$TN + cts$FP)
}
# Harmonic mean of precision and sensitivity.
f1_fn <- function(data) {
  precision <- precision_fn(data)
  sensitivity <- sensitivity_fn(data)
  2 * precision * sensitivity / (precision + sensitivity)
}
prec <- precision_fn(df)
sens <- sensitivity_fn(df)
spec <- specificity_fn(df)
f1 <- f1_fn(df)
cat("Precision:", prec, "\n")
Precision: 0.84375
cat("Sensitivity:", sens, "\n")
Sensitivity: 0.4736842
cat("Specificity:", spec, "\n")
Specificity: 0.9596774
cat("F1:", f1, "\n")
F1: 0.6067416
The precision of the model is about 0.8438, which means that when the model predicts the positive class, it is correct 84.38% of the time. The sensitivity is about 0.4737, which means that the model identifies only 47.37% of the actual positive cases.
The specificity is about 0.9597, which means that the model correctly identifies 95.97% of the actual negative cases. The F1 score is about 0.6067; as the harmonic mean of precision and sensitivity, it is pulled down by the low sensitivity.
Overall, the model is highly reliable on the negative class, as the high specificity shows, but it misses more than half of the positive cases. This low sensitivity could be critical in applications where failing to detect a positive case is costly.
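The same four metrics can be checked from the cell counts alone, without the data frame. The sketch below assumes a hypothetical helper, metrics_from_counts, which is not part of the original analysis; the counts passed to it are the ones reported in the confusion matrix.

```r
# Hypothetical helper: derive the four metrics from confusion-matrix counts.
metrics_from_counts <- function(TP, TN, FP, FN) {
  precision   <- TP / (TP + FP)
  sensitivity <- TP / (TP + FN)
  specificity <- TN / (TN + FP)
  f1 <- 2 * precision * sensitivity / (precision + sensitivity)
  c(precision = precision, sensitivity = sensitivity,
    specificity = specificity, f1 = f1)
}

# Counts taken from the confusion matrix reported in this analysis.
m <- metrics_from_counts(TP = 27, TN = 119, FP = 5, FN = 30)
print(round(m, 4))
```

Written this way, the F1 score simplifies to 2TP / (2TP + FP + FN) = 54/89, which makes it clear that the 30 false negatives are what hold it down.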
The Receiver Operating Characteristic (ROC) curve is a plot used to assess a model’s classification ability across a range of probability thresholds. It shows the true positive rate against the false positive rate.
True Positive Rate (TPR): \[ \text{TPR} = \frac{TP}{TP + FN} \]
False Positive Rate (FPR): \[ \text{FPR} = \frac{FP}{FP + TN} \]
The Area Under the Curve (AUC) summarizes the ROC curve in a single value. An AUC close to 1 indicates strong classification ability, while an AUC close to 0.5 indicates performance no better than random guessing.
roc_auc_fn <- function(data) {
  actual <- data$class
  probs <- data$scored.probability
  # Evaluate the classifier on a fixed grid of thresholds from 0 to 1.
  thresholds <- seq(0, 1, by = 0.01)
  tpr <- numeric(length(thresholds))
  fpr <- numeric(length(thresholds))
  for (i in seq_along(thresholds)) {
    th <- thresholds[i]
    pred <- ifelse(probs >= th, 1, 0)
    TP <- sum(actual == 1 & pred == 1)
    TN <- sum(actual == 0 & pred == 0)
    FP <- sum(actual == 0 & pred == 1)
    FN <- sum(actual == 1 & pred == 0)
    # Guard against division by zero when a class is absent.
    tpr[i] <- ifelse((TP + FN) == 0, 0, TP / (TP + FN))
    fpr[i] <- ifelse((FP + TN) == 0, 0, FP / (FP + TN))
  }
  roc_df <- data.frame(threshold = thresholds, TPR = tpr, FPR = fpr)
  # Sort the points from left to right before integrating.
  roc_df <- roc_df[order(roc_df$FPR, roc_df$TPR), ]
  # Trapezoidal rule: interval width times the average height.
  auc <- sum(diff(roc_df$FPR) *
    (head(roc_df$TPR, -1) + tail(roc_df$TPR, -1)) / 2)
  plot(roc_df$FPR, roc_df$TPR, type = "l", lwd = 2,
    xlab = "False Positive Rate",
    ylab = "True Positive Rate",
    main = "ROC Curve")
  abline(0, 1, lty = 2, col = "gray")
  return(list(auc = auc, roc_data = roc_df))
}
roc_results <- roc_auc_fn(df)
cat("Manual AUC:", round(roc_results$auc, 4), "\n")
Manual AUC: 0.8489
The manually calculated AUC is approximately 0.8489, which shows that the model distinguishes well between the positive and negative classes. The ROC curve lies well above the diagonal line, so the classifier performs much better than random guessing.
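One way to read an AUC value is as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below illustrates this on a small set of made-up scores; the data and variable names are hypothetical and are not taken from the dataset.

```r
# Made-up scores for three positive and four negative instances.
pos_scores <- c(0.9, 0.8, 0.4)
neg_scores <- c(0.7, 0.3, 0.2, 0.1)

# Compare every positive score with every negative score.
wins <- outer(pos_scores, neg_scores, ">")
ties <- outer(pos_scores, neg_scores, "==")

# Fraction of pairs ranked correctly, counting ties as half a win.
auc_rank <- mean(wins + 0.5 * ties)
print(auc_rank)
```

Here 11 of the 12 positive-negative pairs are ordered correctly, so the value is 11/12.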
The F1 score is given by the equation: \[ F1 = \frac{2PR}{P + R} \]
Precision and recall are proportions, so both lie between 0 and 1: \[ 0 \leq P \leq 1, \quad 0 \leq R \leq 1 \]
Because both values are nonnegative and at most 1, their product is less than or equal to each of them: \[ PR \leq P, \quad PR \leq R \]
Adding these two inequalities gives: \[ 2PR \leq P + R \]
Dividing both sides by (P + R), which is positive whenever the F1 score is defined, we get: \[ \frac{2PR}{P + R} \leq 1 \]
Since the numerator and the denominator are both nonnegative, the F1 score is also at least 0, which gives the range: \[ 0 \leq F1 \leq 1 \]
This shows that the F1 score always lies between 0 and 1.
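The bound can also be checked numerically. The sketch below evaluates the F1 formula over a grid of precision and recall values; the grid spacing and the variable names are arbitrary choices for illustration, and a numerical check of this kind supports the argument rather than replacing it.

```r
# Grid of precision/recall values in (0, 1]; zero is excluded so P + R > 0.
grid <- expand.grid(P = seq(0.01, 1, by = 0.01),
                    R = seq(0.01, 1, by = 0.01))

# F1 for every (P, R) combination on the grid.
f1_grid <- 2 * grid$P * grid$R / (grid$P + grid$R)

# Smallest and largest F1 values found on the grid.
print(range(f1_grid))
```

The largest value occurs at P = R = 1, the only point where the inequality 2PR ≤ P + R holds with equality on this grid.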
To verify the accuracy of the manually computed metrics, the results can be compared with the results obtained using the caret and pROC packages. The caret package is used to verify the accuracy, sensitivity, and specificity computed using the confusion matrix, while the pROC package is used to compute the ROC curve and the AUC.
df$class_factor <- factor(df$class, levels = c(0, 1))
df$pred_factor <- factor(df$scored.class, levels = c(0, 1))
cm_caret <- caret::confusionMatrix(
  data = df$pred_factor,
  reference = df$class_factor,
  positive = "1"
)
cm_caret
Confusion Matrix and Statistics
          Reference
Prediction   0   1
         0 119  30
         1   5  27
               Accuracy : 0.8066
                 95% CI : (0.7415, 0.8615)
    No Information Rate : 0.6851
    P-Value [Acc > NIR] : 0.0001712
                  Kappa : 0.4916
 Mcnemar's Test P-Value : 4.976e-05
            Sensitivity : 0.4737
            Specificity : 0.9597
         Pos Pred Value : 0.8438
         Neg Pred Value : 0.7987
             Prevalence : 0.3149
         Detection Rate : 0.1492
   Detection Prevalence : 0.1768
      Balanced Accuracy : 0.7167
       'Positive' Class : 1
sens_caret <- caret::sensitivity(
  data = df$pred_factor,
  reference = df$class_factor,
  positive = "1"
)
spec_caret <- caret::specificity(
  data = df$pred_factor,
  reference = df$class_factor,
  negative = "0"
)
cat("Sensitivity:", round(sens_caret, 4), "\n")
Sensitivity: 0.4737
cat("Specificity:", round(spec_caret, 4), "\n")
Specificity: 0.9597
roc_obj <- pROC::roc(df$class, df$scored.probability)
plot(roc_obj, main = "ROC Curve using pROC")
abline(a = 1, b = -1, lty = 2, col = "gray")
auc_val <- pROC::auc(roc_obj)
cat("AUC:", round(as.numeric(auc_val), 4), "\n")
AUC: 0.8503
The curve generated by the pROC package plots sensitivity against specificity at different classification thresholds. Its x-axis shows specificity in decreasing order, from 1 to 0, which is equivalent to the false positive rate (1 - specificity) used on the x-axis of the manual curve.
The ROC curve is still well above the diagonal line, showing good discriminative ability for the model. The AUC value is close to 0.85, showing that the model is effective in ranking positive instances higher than negative instances.
The AUC computed by pROC (0.8503) is close to the manually calculated value (0.8489), which supports the conclusion that the model has good discriminative ability. The small difference arises because pROC evaluates a threshold at every distinct predicted probability, whereas the manual method uses a fixed grid of thresholds from 0 to 1 in steps of 0.01.
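To illustrate what evaluating every distinct score as a threshold looks like, the sketch below builds a ROC curve from a small set of made-up labels and scores and integrates it with the trapezoidal rule. It is a simplified illustration under stated assumptions, not a reproduction of pROC’s internals; the data and names are hypothetical.

```r
# Made-up labels and scores (hypothetical, not from the dataset).
actual <- c(1, 1, 0, 1, 0, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1)

# One threshold per distinct score, plus Inf so the curve starts at (0, 0).
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))

# Rates at each threshold; lowering the threshold can only raise both rates.
tpr <- sapply(thresholds, function(th) mean(scores[actual == 1] >= th))
fpr <- sapply(thresholds, function(th) mean(scores[actual == 0] >= th))

# Trapezoidal rule over the ROC points, ordered by increasing FPR.
auc_exact <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc_exact)
```

With a coarser grid, two scores that fall between neighboring grid points share a single ROC point, which can shift the computed area slightly.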
In this analysis, the performance of a binary classification model was evaluated using metrics derived from the confusion matrix and a threshold-based evaluation using the ROC curve and the AUC.
The results show that the model performs well overall, with an accuracy of approximately 0.81 and a high specificity, which means the model is highly effective at correctly classifying negative cases. However, the low sensitivity means that the model fails to capture a significant number of positive cases, which may be a concern depending on the context in which the model is used.
The precision and the F1 score reinforce this picture: the model’s positive predictions are usually correct, but it detects fewer than half of the actual positive cases. The ROC curve and the AUC values, obtained both manually and with the pROC package, indicate good discriminative ability.
The caret and pROC packages validated the custom implementations: the accuracy, sensitivity, specificity, and precision matched the caret output, and the manual AUC (0.8489) differed from the pROC value (0.8503) by about 0.0014. Overall, the model performs well in some respects but could be improved, especially in terms of sensitivity.