In this homework assignment, you will work through various classification metrics. You will be asked to create functions in R to carry out the various calculations. You will also investigate some functions in packages that will let you obtain the equivalent results. Finally, you will create graphical output that also can be used to evaluate the output of classification models, such as binary logistic regression.
The classification-output-data.csv file was downloaded and read into the dt variable below. The file contains 181 observations and 3 variables. Three variables are considered in this report: class (the actual class for the observation), scored.class (the predicted class for the observation), and scored.probability (the predicted probability of success for the observation).
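The read-in step itself is not shown; a minimal sketch, assuming the file sits in the working directory and the columns are renamed to the shorter names (class, predicted, prob) that appear in the summary below:

# Hypothetical read-in step; file name from the text, column renaming assumed
dt <- read.csv("classification-output-data.csv")
dt <- dt[, c("class", "scored.class", "scored.probability")]
names(dt) <- c("class", "predicted", "prob")
summary(dt)
dim(dt)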
## class predicted prob
## Min. :0.0000 Min. :0.0000 Min. :0.02323
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.11702
## Median :0.0000 Median :0.0000 Median :0.23999
## Mean :0.3149 Mean :0.1768 Mean :0.30373
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.43093
## Max. :1.0000 Max. :1.0000 Max. :0.94633
## [1] 181 3
Here is the raw confusion matrix produced with table():
## predicted
## class 0 1
## 0 119 5
## 1 30 27
The rows of the table reflect the actual class values of 0 or 1, and the columns reflect the predicted class values of 0 or 1. Treating 0 as the negative class and 1 as the positive class, the table can be read as follows:
- 119 observations were correctly predicted as class 0 (true negatives);
- 5 observations of class 0 were incorrectly predicted as class 1 (false positives);
- 30 observations of class 1 were incorrectly predicted as class 0 (false negatives);
- 27 observations were correctly predicted as class 1 (true positives).
Before calculating accuracy, I create a helper function cm_val() that provides the key inputs for accuracy and the other metrics. It calculates TP, TN, FP, and FN from the provided data frame, given the columns containing the actual and predicted values and the values to be treated as positive and negative.
# filter() below comes from dplyr; ggplot2 is used later for the ROC plot
library(dplyr)
library(ggplot2)

cm_val <- function(tbl, actual, predicted, pos_value, neg_value){
  # Cross-tabulate actual vs. predicted values
  cm <- as.data.frame(table(tbl[, c(actual)], tbl[, c(predicted)]))
  FP <- filter(cm, Var1 == neg_value & Var2 == pos_value)$Freq
  FN <- filter(cm, Var1 == pos_value & Var2 == neg_value)$Freq
  TP <- filter(cm, Var1 == pos_value & Var2 == pos_value)$Freq
  TN <- filter(cm, Var1 == neg_value & Var2 == neg_value)$Freq
  # Reset to 0 if no category was found
  if (length(FN) == 0) FN <- 0
  if (length(FP) == 0) FP <- 0
  if (length(TP) == 0) TP <- 0
  if (length(TN) == 0) TN <- 0
  return(list(TP = TP, FP = FP, TN = TN, FN = FN))
}

# Define positive as 1 and negative as 0
pos_value <- 1
neg_value <- 0
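A quick check of the helper against the confusion matrix above:

# Expected from the table above: TP = 27, TN = 119, FP = 5, FN = 30
cm_val(dt, actual = 'class', predicted = 'predicted', pos_value, neg_value)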
accuracy <- function(tbl, actual, predicted, pos_value, neg_value){
  tf <- cm_val(tbl, actual, predicted, pos_value, neg_value)
  return((tf$TP + tf$TN) / (tf$TP + tf$TN + tf$FP + tf$FN))
}

(my_accuracy <- accuracy(dt, actual = 'class', predicted = 'predicted',
                         pos_value, neg_value))

## [1] 0.8066298
error.rate <- function(tbl, actual, predicted, pos_value, neg_value){
  tf <- cm_val(tbl, actual, predicted, pos_value, neg_value)
  return((tf$FP + tf$FN) / (tf$TP + tf$TN + tf$FP + tf$FN))
}

(my_error <- error.rate(dt, actual = 'class', predicted = 'predicted',
                        pos_value, neg_value))

## [1] 0.1933702
Accuracy and the classification error rate should sum to one. I can confirm that my accuracy (0.8066298) and my classification error rate (0.1933702) sum to 1.
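This follows directly from the definitions:
\[
\frac{TP+TN}{TP+TN+FP+FN} + \frac{FP+FN}{TP+TN+FP+FN} = \frac{TP+TN+FP+FN}{TP+TN+FP+FN} = 1
\]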
precision <- function(tbl, actual, predicted, pos_value, neg_value){
  tf <- cm_val(tbl, actual, predicted, pos_value, neg_value)
  return(tf$TP / (tf$TP + tf$FP))
}

(pre <- precision(dt, actual = 'class', predicted = 'predicted',
                  pos_value, neg_value))

## [1] 0.84375
sensitivity <- function(tbl, actual, predicted, pos_value, neg_value){
  tf <- cm_val(tbl, actual, predicted, pos_value, neg_value)
  return(tf$TP / (tf$TP + tf$FN))
}

(sens <- sensitivity(dt, actual = 'class', predicted = 'predicted',
                     pos_value, neg_value))

## [1] 0.4736842
specificity <- function(tbl, actual, predicted, pos_value, neg_value){
  tf <- cm_val(tbl, actual, predicted, pos_value, neg_value)
  return(tf$TN / (tf$TN + tf$FP))
}

(spec <- specificity(dt, actual = 'class', predicted = 'predicted',
                     pos_value, neg_value))

## [1] 0.9596774
f1.score <- function(tbl, actual, predicted, pos_value, neg_value){
  # Use the function arguments rather than hard-coded data and column names
  p <- precision(tbl, actual, predicted, pos_value, neg_value)
  s <- sensitivity(tbl, actual, predicted, pos_value, neg_value)
  return((2 * p * s) / (p + s))
}

(f1 <- f1.score(dt, actual = 'class', predicted = 'predicted',
                pos_value, neg_value))

## [1] 0.6067416
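As a check, plugging the precision and sensitivity computed above into the formula reproduces this value:
\[
F_1 = \frac{2 \times 0.84375 \times 0.4736842}{0.84375 + 0.4736842} \approx 0.6067
\]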
Both Precision and Sensitivity have a range from 0 to 1. Consider that if \(a>0\) and \(0<b<1\), then \(ab<a\) (a fraction of any positive number will be smaller than the original number).
Then \(Precision \times Sensitivity < Precision\) and \(Precision \times Sensitivity < Sensitivity\).
Then \(Precision \times Sensitivity+Precision \times Sensitivity < Precision + Sensitivity\), or
\(2\times Precision \times Sensitivity < Precision + Sensitivity\).
Dividing both sides by \(Precision + Sensitivity\) shows that the ratio is less than \(1\). Also, since both values are positive, the \(F1\ Score\) is positive. If \(Precision\) is zero, then \(TP = 0\), so \(Sensitivity\) is also zero and the \(F1\ Score\) is undefined (\(0/0\)). If \(Precision\) and \(Sensitivity\) are both one, then the \(F1\ Score\) is one.
So we have \(0 < F1\ Score \le 1\).
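As an illustrative numerical check (not part of the original calculations), we can evaluate the F1 formula over a grid of precision and sensitivity values in \((0, 1]\) and confirm the result stays within \((0, 1]\):

# Hypothetical sanity check of the F1 bounds over a grid of values
grid <- expand.grid(p = seq(0.01, 1, by = 0.01),
                    s = seq(0.01, 1, by = 0.01))
f1_vals <- with(grid, (2 * p * s) / (p + s))
range(f1_vals)  # minimum and maximum both lie inside (0, 1]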
The following function calculates sensitivity and specificity from the predicted probabilities at cutoffs ranging from 0 to 1 in increments of 0.01. It then approximates the area under the curve by summing the relevant rectangles. The plot and the AUC are returned.
manual.roc <- function(tbl, actual, prob, pos_value, neg_value) {
  tbl2 <- tbl[, c(actual, prob)]
  tbl2 <- cbind(tbl2, predicted = rep(0, nrow(tbl)))
  cutoff <- seq(0, 1, 0.01)
  sens <- rep(0, length(cutoff))
  spec <- rep(0, length(cutoff))
  # Classify at each cutoff and record sensitivity and specificity
  for (i in 1:length(cutoff)) {
    tbl3 <- tbl2
    tbl3$predicted[tbl3[[prob]] > cutoff[i]] <- 1
    sens[i] <- sensitivity(tbl3, actual = actual,
                           predicted = 'predicted', pos_value, neg_value)
    spec[i] <- specificity(tbl3, actual = actual,
                           predicted = 'predicted', pos_value, neg_value)
  }
  roc.values <- as.data.frame(cbind(cutoff, sens, spec))
  roc.values[, 'spec'] <- 1 - roc.values[, 'spec']
  # Prepare plot
  pl <- ggplot(arrange(roc.values, desc(cutoff)), aes(x = spec, y = sens)) +
    geom_step() +
    xlab("1-Specificity") + ylab("Sensitivity") +
    ggtitle("ROC Curve")
  # Find AUC by summing the rectangles under the step curve
  auc <- roc.values %>%
    arrange(desc(cutoff)) %>%
    mutate(auc = (spec - lag(spec)) * sens) %>%
    replace(is.na(.), 0) %>%
    summarize(sum(auc))
  return(list(plot = pl, AUC = auc))
}

Let us run the function and review the plot and the AUC value.
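The chunk that produced the output below is not shown; assuming the probability column is named prob (as in the summary above), the call would look roughly like this:

# Hypothetical call matching the output below
res <- manual.roc(dt, actual = 'class', prob = 'prob', pos_value, neg_value)
res$plot   # ROC curve
res$AUC    # area under the curve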
## sum(auc)
## 1 0.8539898
All of the metrics above were calculated by hand. As we will see below, using built-in functions makes life easier.
caret Package

The caret package provides functions to calculate all of the values above and many more. For example, we can calculate sensitivity and specificity.
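The calls that produced the values below are not shown; a minimal sketch of what they likely look like (caret expects factors, and the caret:: prefix avoids clashing with our own sensitivity() and specificity() functions):

library(caret)

# Assumed calls: convert the columns to factors, then name the positive/negative class
pred_f   <- factor(dt$predicted, levels = c(0, 1))
actual_f <- factor(dt$class, levels = c(0, 1))
caret::sensitivity(pred_f, actual_f, positive = "1")
caret::specificity(pred_f, actual_f, negative = "0")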
## [1] 0.4736842
## [1] 0.9596774
An even more encompassing function is confusionMatrix(). It not only returns the confusion matrix itself, but also other values including sensitivity and specificity.
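A sketch of the assumed call behind the output below, reusing the factors defined above and setting the positive class to 1 (matching the 'Positive' Class line in the output):

# Assumed call: predictions first, reference second
caret::confusionMatrix(data = pred_f, reference = actual_f, positive = "1")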
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 119 30
## 1 5 27
##
## Accuracy : 0.8066
## 95% CI : (0.7415, 0.8615)
## No Information Rate : 0.6851
## P-Value [Acc > NIR] : 0.0001712
##
## Kappa : 0.4916
##
## Mcnemar's Test P-Value : 4.976e-05
##
## Sensitivity : 0.4737
## Specificity : 0.9597
## Pos Pred Value : 0.8438
## Neg Pred Value : 0.7987
## Prevalence : 0.3149
## Detection Rate : 0.1492
## Detection Prevalence : 0.1768
## Balanced Accuracy : 0.7167
##
## 'Positive' Class : 1
##
The F1 score, precision, and other values can also be extracted with more decimal places than the summary output above shows. All of the values match our calculations.
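One way to pull these values at full precision (assuming a recent caret version, in which precision, recall, and F1 are part of the byClass element) might be:

# Hypothetical extraction of the per-class statistics at full precision
cm_full <- caret::confusionMatrix(data = pred_f, reference = actual_f,
                                  positive = "1", mode = "everything")
cm_full$byClass[c("Precision", "Recall", "F1")]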
pROC Package

Let us try the pROC package.
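The call is recorded in the roc output below; spelled out as code it reads:

library(pROC)

# Call reproduced from the output below
roc(response = dt$class, predictor = dt$prob, levels = c(0, 1),
    percent = TRUE, ci = TRUE, plot = TRUE)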
## Setting direction: controls < cases
##
## Call:
## roc.default(response = dt$class, predictor = dt$prob, levels = c(0, 1), percent = TRUE, ci = TRUE, plot = TRUE)
##
## Data: dt$prob in 124 controls (dt$class 0) < 57 cases (dt$class 1).
## Area under the curve: 85.03%
## 95% CI: 79.05%-91.01% (DeLong)
With one line of code we get the ROC curve and several metrics. The plot looks very similar to our manual plot (some of the difference is likely due to different aspect ratios). We calculated the AUC at 85.4%, while pROC calculated it at 85.03%. The small difference is likely due to a slight discrepancy in how the steps were computed (it is also possible that our code leaves out a sliver somewhere).