The classification models used in this project are: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), ridge regression, LASSO, and elastic net.
Additional packages (aside from the base/default R packages) used are: ISLR2, MASS, e1071, class, caret, glmnet, and formattable.
This work still uses the Auto data of the ISLR2 package, so the cleaning process and exploratory data analysis are skipped here; you can check the cleaning process of the first part here. All variable and datatype changes implemented in part 1 are also implemented here.
library(ISLR2)
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:ISLR2':
##
## Boston
library(e1071)
library(class)
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-3
library(formattable)
##
## Attaching package: 'formattable'
## The following object is masked from 'package:MASS':
##
## area
Auto Data Set
Description: Gas mileage, horsepower, and other information for 392 vehicles. A data frame with 392 observations on the following 9 variables:
mpg miles per gallon
cylinders Number of cylinders between 4 and 8
displacement Engine displacement (cu. inches)
horsepower Engine horsepower
weight Vehicle weight (lbs.)
acceleration Time to accelerate from 0 to 60 mph (sec.)
year Model year (modulo 100)
origin Origin of car (1. American, 2. European, 3. Japanese)
name Vehicle name
Source: This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University. The dataset was used in the 1983 American Statistical Association Exposition. The original dataset has 397 observations, of which 5 have missing values for the variable “horsepower”. These rows are removed here. The original dataset is available as a CSV file in the docs directory, as well as at https://www.statlearning.com.
References: James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R, Springer-Verlag, New York. https://www.statlearning.com
df <- data.frame(Auto)
df$cylinders <- as.factor(df$cylinders)  # treat cylinders as a factor, as in part 1
df$origin <- as.factor(df$origin)        # treat origin as a factor, as in part 1
df$mpg01 <- rep(0, length(df$mpg))       # initialize the binary target variable
df$mpg01[df$mpg > median(df$mpg)] <- 1   # 1 if mpg is above the median, 0 otherwise
df$mpg01 <- as.factor(df$mpg01)          # store the target as a factor
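As a quick check (a minimal sketch, assuming the data are unchanged from part 1), the median cutoff behind mpg01 can be verified directly:
median(df$mpg)  # 22.75; cars above this value are coded 1, the rest 0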
An 80-20 train-test split was used for this part, as in part 1.
set.seed(12345)
split <- sample(c(rep(0, 0.8 * nrow(df)), rep(1, 0.2 * nrow(df))))
df.train <- df[split == 0, ]
df.test <- df[split == 1, ]
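One caveat worth noting: 0.8 * nrow(df) is not a whole number here, so rep() truncates it and the split vector comes out one element short of nrow(df), which R silently recycles when subsetting. A minimal alternative sketch (with hypothetical names; it would produce a different random partition than the one used below) samples row indices directly:
train_idx <- sample(nrow(df), size = round(0.8 * nrow(df)))  # 80% of row indices
df.train2 <- df[train_idx, ]   # training rows
df.test2 <- df[-train_idx, ]   # remaining 20% as test rows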
Let’s have a look at the first rows of our train data:
head(df.train)
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 15 8 350 165 3693 11.5 70 1
## 3 18 8 318 150 3436 11.0 70 1
## 4 16 8 304 150 3433 12.0 70 1
## 5 17 8 302 140 3449 10.5 70 1
## 6 15 8 429 198 4341 10.0 70 1
## name mpg01
## 1 chevrolet chevelle malibu 0
## 2 buick skylark 320 0
## 3 plymouth satellite 0
## 4 amc rebel sst 0
## 5 ford torino 0
## 6 ford galaxie 500 0
Let’s have a look at the first rows of our test data:
head(df.test)
## mpg cylinders displacement horsepower weight acceleration year origin
## 7 14 8 454 220 4354 9.0 70 1
## 10 15 8 390 190 3850 8.5 70 1
## 14 14 8 455 225 3086 10.0 70 1
## 17 18 6 199 97 2774 15.5 70 1
## 24 26 4 121 113 2234 12.5 70 2
## 36 17 6 250 100 3329 15.5 71 1
## name mpg01
## 7 chevrolet impala 0
## 10 amc ambassador dpl 0
## 14 buick estate wagon (sw) 0
## 17 amc hornet 0
## 24 bmw 2002 1
## 36 chevrolet chevelle malibu 0
Now we create a vector for just the target variable in both the training and test data.
Actual.mpg01_train <- df.train$mpg01
Actual.mpg01_test <- df.test$mpg01
fit_lda <- lda(mpg01~cylinders + displacement + horsepower + weight + acceleration + origin, data = df.train)
fit_lda
## Call:
## lda(mpg01 ~ cylinders + displacement + horsepower + weight +
## acceleration + origin, data = df.train)
##
## Prior probabilities of groups:
## 0 1
## 0.4968153 0.5031847
##
## Group means:
## cylinders4 cylinders5 cylinders6 cylinders8 displacement horsepower weight
## 0 0.07692308 0.006410256 0.36538462 0.53846154 277.4359 130.42308 3657.397
## 1 0.91139241 0.006329114 0.05696203 0.01898734 115.1108 78.67089 2330.158
## acceleration origin2 origin3
## 0 14.56346 0.05128205 0.04487179
## 1 16.60633 0.27215190 0.37341772
##
## Coefficients of linear discriminants:
## LD1
## cylinders4 2.8808551614
## cylinders5 1.5842147859
## cylinders6 0.2129408102
## cylinders8 0.7614438593
## displacement -0.0038351514
## horsepower -0.0091280837
## weight -0.0001416608
## acceleration -0.0388246305
## origin2 0.0179465697
## origin3 0.3610059644
pred.lda <- predict(fit_lda, df.test)
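Besides the predicted classes, predict() on an lda fit also returns the posterior probabilities for each class, which can be inspected if a probability cutoff other than 0.5 is ever needed:
head(pred.lda$posterior)  # per-observation posterior probabilities for classes 0 and 1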
Now, making classifications on the test data with the LDA model, we have:
cm_lda <- confusionMatrix(pred.lda$class, Actual.mpg01_test, positive='1', mode='everything')
cm_lda
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 32 2
## 1 8 36
##
## Accuracy : 0.8718
## 95% CI : (0.7768, 0.9368)
## No Information Rate : 0.5128
## P-Value [Acc > NIR] : 2.112e-11
##
## Kappa : 0.7444
##
## Mcnemar's Test P-Value : 0.1138
##
## Sensitivity : 0.9474
## Specificity : 0.8000
## Pos Pred Value : 0.8182
## Neg Pred Value : 0.9412
## Precision : 0.8182
## Recall : 0.9474
## F1 : 0.8780
## Prevalence : 0.4872
## Detection Rate : 0.4615
## Detection Prevalence : 0.5641
## Balanced Accuracy : 0.8737
##
## 'Positive' Class : 1
##
paste("The test data misclassification rate is:", mean(pred.lda$class != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.128205128205128"
From the above, we have:
True Positive (TP) = 36, True Negative (TN) = 32, False Positive (FP) = 8, False Negative (FN) = 2.
We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.
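As a quick arithmetic check using the counts read off the confusion matrix above:
(8 + 2) / (36 + 32 + 8 + 2)  # (FP + FN) / total = 10/78 = 0.1282051, i.e. 1 - Accuracy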
fit_qda <- qda(mpg01~cylinders + displacement + horsepower + weight + acceleration + origin, data = df.train)
fit_qda
## Call:
## qda(mpg01 ~ cylinders + displacement + horsepower + weight +
## acceleration + origin, data = df.train)
##
## Prior probabilities of groups:
## 0 1
## 0.4968153 0.5031847
##
## Group means:
## cylinders4 cylinders5 cylinders6 cylinders8 displacement horsepower weight
## 0 0.07692308 0.006410256 0.36538462 0.53846154 277.4359 130.42308 3657.397
## 1 0.91139241 0.006329114 0.05696203 0.01898734 115.1108 78.67089 2330.158
## acceleration origin2 origin3
## 0 14.56346 0.05128205 0.04487179
## 1 16.60633 0.27215190 0.37341772
pred.qda <- predict(fit_qda, df.test)
Now, making classifications on the test data with the QDA model, we have:
cm_qda <- confusionMatrix(pred.qda$class, Actual.mpg01_test, positive='1', mode='everything')
cm_qda
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 31 3
## 1 9 35
##
## Accuracy : 0.8462
## 95% CI : (0.7467, 0.9179)
## No Information Rate : 0.5128
## P-Value [Acc > NIR] : 6.861e-10
##
## Kappa : 0.6933
##
## Mcnemar's Test P-Value : 0.1489
##
## Sensitivity : 0.9211
## Specificity : 0.7750
## Pos Pred Value : 0.7955
## Neg Pred Value : 0.9118
## Precision : 0.7955
## Recall : 0.9211
## F1 : 0.8537
## Prevalence : 0.4872
## Detection Rate : 0.4487
## Detection Prevalence : 0.5641
## Balanced Accuracy : 0.8480
##
## 'Positive' Class : 1
##
paste("The test data misclassification rate is:", mean(pred.qda$class != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.153846153846154"
From the above, we have:
True Positive (TP) = 35, True Negative (TN) = 31, False Positive (FP) = 9, False Negative (FN) = 3.
We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.
The next three models use the glmnet package; which model is fitted depends on the value of the argument alpha: if alpha = 0, a ridge regression model is fit, and if alpha = 1, a LASSO model is fit. For these first two models, cross-validation is used to select the optimal lambda.
Note: The glmnet package requires matrix inputs, and a matrix in R can only hold data of a single type. From the training data, I therefore create two matrices, one for the factor input variables and another for the continuous input variables, and do the same for the test data.
x_fac_train <- data.matrix(df.train[,c(2,8)])
x_con_train <- data.matrix(df.train[,c(3,4,5,6)])
x_fac_test <- data.matrix(df.test[,c(2,8)])
x_con_test <- data.matrix(df.test[,c(3,4,5,6)])
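Note that data.matrix() converts factors to their underlying integer codes rather than to dummy variables. An alternative sketch (with hypothetical names, not used for the results below) is model.matrix(), which would dummy-code the factors, as the formula interface of lda() and qda() does, allowing a single design matrix:
x_train_mm <- model.matrix(mpg01 ~ cylinders + displacement + horsepower + weight + acceleration + origin, data = df.train)[, -1]  # drop the intercept column
x_test_mm <- model.matrix(mpg01 ~ cylinders + displacement + horsepower + weight + acceleration + origin, data = df.test)[, -1]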
Now we fit the ridge regression models, but first we perform k-fold cross-validation to find the optimal lambda value that minimizes the cross-validated error (binomial deviance by default, since family = "binomial") for each of our two classes of predictor variables (factor and continuous variables).
cv_fac_train <- cv.glmnet(x_fac_train, Actual.mpg01_train, alpha=0, family="binomial")
cv_con_train <- cv.glmnet(x_con_train, Actual.mpg01_train, alpha=0, family="binomial")
The two optimal lambdas for the two classes of input variables are obtained using the code below:
optimal_lambda_fac_train <- cv_fac_train$lambda.min
optimal_lambda_con_train <- cv_con_train$lambda.min
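The cross-validation curves can also be inspected visually; plot() on a cv.glmnet object shows the cross-validated error across the lambda path, with lambda.min marked:
plot(cv_con_train)  # CV error vs. log(lambda) for the continuous-input model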
With the above optimal lambdas, we create two ridge regression models, one for each class of input variables, and compare them to find the better model.
fit_ridge1 <- glmnet(x_fac_train, Actual.mpg01_train, alpha=0, lambda=optimal_lambda_fac_train, family="binomial")
fit_ridge2 <- glmnet(x_con_train, Actual.mpg01_train, alpha=0, lambda=optimal_lambda_con_train, family="binomial")
pr.ridge1 <- predict(fit_ridge1, s=optimal_lambda_fac_train, newx=x_fac_test, family="binomial")
pred.ridge1 <- ifelse(pr.ridge1 > 0.5, "1", "0")
pr.ridge2 <- predict(fit_ridge2, s=optimal_lambda_con_train, newx=x_con_test, family="binomial")
pred.ridge2 <- ifelse(pr.ridge2 > 0.5, "1", "0")
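One caveat, flagged here although the original predictions are kept for the results below: predict() on a glmnet fit returns the linear predictor (log-odds) by default, so the 0.5 cutoff above is applied on the log-odds scale rather than to probabilities (the family="binomial" argument is not needed by predict()). To threshold posterior probabilities at 0.5 instead, one would request type = "response", e.g.:
pr.ridge1_prob <- predict(fit_ridge1, s=optimal_lambda_fac_train, newx=x_fac_test, type="response")  # fitted probabilities in [0, 1]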
Now, making classifications on the test data with the first ridge model (ridge1, built on the factor inputs), we have:
cm_ridge1 <- confusionMatrix(as.factor(pred.ridge1), Actual.mpg01_test, positive='1', mode='everything')
cm_ridge1
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 31 2
## 1 9 36
##
## Accuracy : 0.859
## 95% CI : (0.7617, 0.9274)
## No Information Rate : 0.5128
## P-Value [Acc > NIR] : 1.266e-10
##
## Kappa : 0.7191
##
## Mcnemar's Test P-Value : 0.07044
##
## Sensitivity : 0.9474
## Specificity : 0.7750
## Pos Pred Value : 0.8000
## Neg Pred Value : 0.9394
## Precision : 0.8000
## Recall : 0.9474
## F1 : 0.8675
## Prevalence : 0.4872
## Detection Rate : 0.4615
## Detection Prevalence : 0.5769
## Balanced Accuracy : 0.8612
##
## 'Positive' Class : 1
##
paste("The test data misclassification rate is:", mean(pred.ridge1 != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.141025641025641"
From the above, we have:
True Positive (TP) = 36, True Negative (TN) = 31, False Positive (FP) = 9, False Negative (FN) = 2.
We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.
Now, making classifications on the test data with the second ridge model (ridge2, built on the continuous inputs), we have:
cm_ridge2 <- confusionMatrix(as.factor(pred.ridge2), Actual.mpg01_test, positive='1', mode='everything')
cm_ridge2
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 35 3
## 1 5 35
##
## Accuracy : 0.8974
## 95% CI : (0.8079, 0.9547)
## No Information Rate : 0.5128
## P-Value [Acc > NIR] : 4.197e-13
##
## Kappa : 0.795
##
## Mcnemar's Test P-Value : 0.7237
##
## Sensitivity : 0.9211
## Specificity : 0.8750
## Pos Pred Value : 0.8750
## Neg Pred Value : 0.9211
## Precision : 0.8750
## Recall : 0.9211
## F1 : 0.8974
## Prevalence : 0.4872
## Detection Rate : 0.4487
## Detection Prevalence : 0.5128
## Balanced Accuracy : 0.8980
##
## 'Positive' Class : 1
##
paste("The test data misclassification rate is:", mean(pred.ridge2 != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.102564102564103"
From the above, we have:
True Positive (TP) = 35, True Negative (TN) = 35, False Positive (FP) = 5, False Negative (FN) = 3.
We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.
From the above, we see that the ridge regression model built from the continuous input variables has a higher classification accuracy than the one built from the factor input variables.
Now we fit the LASSO models, but first we perform k-fold cross-validation to find the optimal lambda value that minimizes the cross-validated error (binomial deviance by default) for each of our two classes of predictor variables (factor and continuous variables).
cv_fac_train_las <- cv.glmnet(x_fac_train, Actual.mpg01_train, alpha=1, family="binomial")
cv_con_train_las <- cv.glmnet(x_con_train, Actual.mpg01_train, alpha=1, family="binomial")
The two optimal lambdas for the two classes of input variables are obtained using the code below:
optimal_lambda_fac_train_las <- cv_fac_train_las$lambda.min
optimal_lambda_con_train_las <- cv_con_train_las$lambda.min
With the above optimal lambdas, we create two LASSO models, one for each class of input variables, and compare them to find the better model.
fit_las1 <- glmnet(x_fac_train, Actual.mpg01_train, alpha=1, lambda=optimal_lambda_fac_train_las, family="binomial")
fit_las2 <- glmnet(x_con_train, Actual.mpg01_train, alpha=1, lambda=optimal_lambda_con_train_las, family="binomial")
pr.las1 <- predict(fit_las1, s=optimal_lambda_fac_train_las, newx=x_fac_test, family="binomial")
pred.las1 <- ifelse(pr.las1 > 0.5, "1", "0")
pr.las2 <- predict(fit_las2, s=optimal_lambda_con_train_las, newx=x_con_test, family="binomial")
pred.las2 <- ifelse(pr.las2 > 0.5, "1", "0")
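The same caveat noted for the ridge predictions applies here: by default these are linear predictors, so type = "response" would be needed to threshold on probabilities instead.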
Now, making classifications on the test data with the first LASSO model (built on the factor inputs), we have:
cm_las1 <- confusionMatrix(as.factor(pred.las1), Actual.mpg01_test, positive='1', mode='everything')
cm_las1
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 31 3
## 1 9 35
##
## Accuracy : 0.8462
## 95% CI : (0.7467, 0.9179)
## No Information Rate : 0.5128
## P-Value [Acc > NIR] : 6.861e-10
##
## Kappa : 0.6933
##
## Mcnemar's Test P-Value : 0.1489
##
## Sensitivity : 0.9211
## Specificity : 0.7750
## Pos Pred Value : 0.7955
## Neg Pred Value : 0.9118
## Precision : 0.7955
## Recall : 0.9211
## F1 : 0.8537
## Prevalence : 0.4872
## Detection Rate : 0.4487
## Detection Prevalence : 0.5641
## Balanced Accuracy : 0.8480
##
## 'Positive' Class : 1
##
paste("The test data misclassification rate is:", mean(pred.las1 != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.153846153846154"
From the above, we have:
True Positive (TP) = 35, True Negative (TN) = 31, False Positive (FP) = 9, False Negative (FN) = 3.
We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.
Now, making classifications on the test data with the second LASSO model (built on the continuous inputs), we have:
cm_las2 <- confusionMatrix(as.factor(pred.las2), Actual.mpg01_test, positive='1', mode='everything')
cm_las2
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 34 2
## 1 6 36
##
## Accuracy : 0.8974
## 95% CI : (0.8079, 0.9547)
## No Information Rate : 0.5128
## P-Value [Acc > NIR] : 4.197e-13
##
## Kappa : 0.7953
##
## Mcnemar's Test P-Value : 0.2888
##
## Sensitivity : 0.9474
## Specificity : 0.8500
## Pos Pred Value : 0.8571
## Neg Pred Value : 0.9444
## Precision : 0.8571
## Recall : 0.9474
## F1 : 0.9000
## Prevalence : 0.4872
## Detection Rate : 0.4615
## Detection Prevalence : 0.5385
## Balanced Accuracy : 0.8987
##
## 'Positive' Class : 1
##
paste("The test data misclassification rate is:", mean(pred.las2 != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.102564102564103"
From the above, we have:
True Positive (TP) = 36, True Negative (TN) = 34, False Positive (FP) = 6, False Negative (FN) = 2.
We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.
From the above, we see that the LASSO model built from the continuous input variables has a higher classification accuracy than the one built from the factor input variables.
The elastic net is a combination of both ridge regression and LASSO, so alpha can take any value between 0 and 1, again with an optimal lambda. I use the caret library to build this model; a 10-fold cross-validation technique was used to select the tuning parameters (alpha and lambda).
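For reference, the penalty that glmnet applies is lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))), so alpha = 0 recovers the ridge penalty and alpha = 1 the LASSO penalty.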
set.seed(111)
cv_10 <- trainControl(method="cv", number=10)
fit_elnet <- train(mpg01~cylinders + displacement + horsepower + weight + acceleration + origin, data=df.train, method="glmnet", trControl=cv_10, tuneLength=10)
fit_elnet$bestTune
## alpha lambda
## 40 0.4 0.3612145
From the above, we see that the optimal elastic net model is at alpha = 0.4 and lambda = 0.3612145.
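The full grid that caret searched, along with the cross-validated accuracy of each (alpha, lambda) pair, is stored in the fitted object and can be inspected:
head(fit_elnet$results)  # one row per (alpha, lambda) combination tried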
Now, making classifications on the test data with the elastic net model, we have:
pred.elnet <- predict(fit_elnet, df.test)
cm_elnet <- confusionMatrix(pred.elnet, Actual.mpg01_test, positive='1', mode='everything')
cm_elnet
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 31 3
## 1 9 35
##
## Accuracy : 0.8462
## 95% CI : (0.7467, 0.9179)
## No Information Rate : 0.5128
## P-Value [Acc > NIR] : 6.861e-10
##
## Kappa : 0.6933
##
## Mcnemar's Test P-Value : 0.1489
##
## Sensitivity : 0.9211
## Specificity : 0.7750
## Pos Pred Value : 0.7955
## Neg Pred Value : 0.9118
## Precision : 0.7955
## Recall : 0.9211
## F1 : 0.8537
## Prevalence : 0.4872
## Detection Rate : 0.4487
## Detection Prevalence : 0.5641
## Balanced Accuracy : 0.8480
##
## 'Positive' Class : 1
##
paste("The test data misclassification rate is:", mean(pred.elnet != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.153846153846154"
From the above, we have:
True Positive (TP) = 35, True Negative (TN) = 31, False Positive (FP) = 9, False Negative (FN) = 3.
We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.
For model comparison, only two metrics, sensitivity (also called recall) and specificity, were considered when choosing the best model, as between them these two metrics account for every entry in the test set: sensitivity deals with true positives and false negatives (TP / (TP + FN)), while specificity deals with true negatives and false positives (TN / (TN + FP)).
For this data, since classifying a car's mpg as 1 (mpg > 22.75) and as 0 (mpg <= 22.75) are both important, the combination of sensitivity and specificity, taken together as a holistic pair of metrics, is considered. I also computed other common metrics used in model selection/comparison; the code for these and a summary are shown below.
Model_sens_or_recall <- rbind(cm_lda$byClass["Sensitivity"], cm_qda$byClass["Sensitivity"], cm_ridge1$byClass["Sensitivity"], cm_ridge2$byClass["Sensitivity"], cm_las1$byClass["Sensitivity"], cm_las2$byClass["Sensitivity"], cm_elnet$byClass["Sensitivity"])
rownames(Model_sens_or_recall) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_sens_or_recall) <- c("Sensitivity_or_Recall")
Model_sens_or_recall
## Sensitivity_or_Recall
## LDA 0.9473684
## QDA 0.9210526
## Ridge1 0.9473684
## Ridge2 0.9210526
## LASSO1 0.9210526
## LASSO2 0.9473684
## Elastic_Net 0.9210526
Model_spec <- rbind(cm_lda$byClass["Specificity"], cm_qda$byClass["Specificity"], cm_ridge1$byClass["Specificity"], cm_ridge2$byClass["Specificity"], cm_las1$byClass["Specificity"], cm_las2$byClass["Specificity"], cm_elnet$byClass["Specificity"])
rownames(Model_spec) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_spec) <- c("Specificity")
Model_spec
## Specificity
## LDA 0.800
## QDA 0.775
## Ridge1 0.775
## Ridge2 0.875
## LASSO1 0.775
## LASSO2 0.850
## Elastic_Net 0.775
Model_acc <- rbind(cm_lda$overall["Accuracy"], cm_qda$overall["Accuracy"], cm_ridge1$overall["Accuracy"], cm_ridge2$overall["Accuracy"], cm_las1$overall["Accuracy"], cm_las2$overall["Accuracy"], cm_elnet$overall["Accuracy"])
rownames(Model_acc) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_acc) <- c("Accuracy")
Model_acc
## Accuracy
## LDA 0.8717949
## QDA 0.8461538
## Ridge1 0.8589744
## Ridge2 0.8974359
## LASSO1 0.8461538
## LASSO2 0.8974359
## Elastic_Net 0.8461538
Model_misclassifation_rate <- rbind(1-cm_lda$overall["Accuracy"], 1-cm_qda$overall["Accuracy"], 1-cm_ridge1$overall["Accuracy"], 1-cm_ridge2$overall["Accuracy"], 1-cm_las1$overall["Accuracy"], 1-cm_las2$overall["Accuracy"], 1-cm_elnet$overall["Accuracy"])
rownames(Model_misclassifation_rate) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_misclassifation_rate) <- c("Misclassification_rate")
Model_misclassifation_rate
## Misclassification_rate
## LDA 0.1282051
## QDA 0.1538462
## Ridge1 0.1410256
## Ridge2 0.1025641
## LASSO1 0.1538462
## LASSO2 0.1025641
## Elastic_Net 0.1538462
Model_precision <- rbind(cm_lda$byClass["Precision"], cm_qda$byClass["Precision"], cm_ridge1$byClass["Precision"], cm_ridge2$byClass["Precision"], cm_las1$byClass["Precision"], cm_las2$byClass["Precision"], cm_elnet$byClass["Precision"])
rownames(Model_precision) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_precision) <- c("Precision")
Model_precision
## Precision
## LDA 0.8181818
## QDA 0.7954545
## Ridge1 0.8000000
## Ridge2 0.8750000
## LASSO1 0.7954545
## LASSO2 0.8571429
## Elastic_Net 0.7954545
Model_f1 <- rbind(cm_lda$byClass["F1"], cm_qda$byClass["F1"], cm_ridge1$byClass["F1"], cm_ridge2$byClass["F1"], cm_las1$byClass["F1"], cm_las2$byClass["F1"], cm_elnet$byClass["F1"])
rownames(Model_f1) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_f1) <- c("F1")
Model_f1
## F1
## LDA 0.8780488
## QDA 0.8536585
## Ridge1 0.8674699
## Ridge2 0.8974359
## LASSO1 0.8536585
## LASSO2 0.9000000
## Elastic_Net 0.8536585
Model_summary <- data.frame(Model_sens_or_recall, Model_spec, Model_acc, Model_misclassifation_rate, Model_precision, Model_f1)
Model_summary
## Sensitivity_or_Recall Specificity Accuracy Misclassification_rate
## LDA 0.9473684 0.800 0.8717949 0.1282051
## QDA 0.9210526 0.775 0.8461538 0.1538462
## Ridge1 0.9473684 0.775 0.8589744 0.1410256
## Ridge2 0.9210526 0.875 0.8974359 0.1025641
## LASSO1 0.9210526 0.775 0.8461538 0.1538462
## LASSO2 0.9473684 0.850 0.8974359 0.1025641
## Elastic_Net 0.9210526 0.775 0.8461538 0.1538462
## Precision F1
## LDA 0.8181818 0.8780488
## QDA 0.7954545 0.8536585
## Ridge1 0.8000000 0.8674699
## Ridge2 0.8750000 0.8974359
## LASSO1 0.7954545 0.8536585
## LASSO2 0.8571429 0.9000000
## Elastic_Net 0.7954545 0.8536585
For visual effect, I decided to use color shades on the Model_summary data frame.
For metrics where higher values indicate a better model, green shades are used (with the darkest green marking the best model), and for metrics where lower values are better, red shades are used (with the darkest red marking the worst model).
formattable(Model_summary, list(
Sensitivity_or_Recall = color_tile("white", "green"),
Specificity = color_tile("white", "green"),
Accuracy = color_tile("white", "green"),
Misclassification_rate = color_tile("white", "red"),
Precision = color_tile("white", "green"),
F1 = color_tile("white", "green")
))
| Model | Sensitivity_or_Recall | Specificity | Accuracy | Misclassification_rate | Precision | F1 |
|---|---|---|---|---|---|---|
| LDA | 0.9473684 | 0.800 | 0.8717949 | 0.1282051 | 0.8181818 | 0.8780488 |
| QDA | 0.9210526 | 0.775 | 0.8461538 | 0.1538462 | 0.7954545 | 0.8536585 |
| Ridge1 | 0.9473684 | 0.775 | 0.8589744 | 0.1410256 | 0.8000000 | 0.8674699 |
| Ridge2 | 0.9210526 | 0.875 | 0.8974359 | 0.1025641 | 0.8750000 | 0.8974359 |
| LASSO1 | 0.9210526 | 0.775 | 0.8461538 | 0.1538462 | 0.7954545 | 0.8536585 |
| LASSO2 | 0.9473684 | 0.850 | 0.8974359 | 0.1025641 | 0.8571429 | 0.9000000 |
| Elastic_Net | 0.9210526 | 0.775 | 0.8461538 | 0.1538462 | 0.7954545 | 0.8536585 |
From the above, we see that the LASSO2 model (the LASSO model built with the continuous input variables) offers the best combination of sensitivity and specificity: it is tied for the highest sensitivity (0.9474), and its balanced accuracy of 0.8987 edges out Ridge2's 0.8980. Since this combination is the criterion we chose for selecting the best model, the LASSO2 model is our optimal model for this second part of classifying the mpg of cars in the Auto data from the ISLR2 package.