The classification models used in this project are:

  1. Linear Discriminant Analysis (LDA)
  2. Quadratic Discriminant Analysis (QDA)
  3. Ridge Regression
  4. LASSO
  5. Elastic Net

Additional packages (aside from the base/default R packages) used are:

  1. ISLR2
  2. MASS
  3. e1071
  4. class
  5. caret
  6. glmnet
  7. formattable

This work still uses the Auto data from the ISLR2 package, so the cleaning process and exploratory data analysis are skipped; the cleaning process is covered in the first part here. All variable and datatype changes implemented in part 1 are also implemented here.

Loading required packages

library(ISLR2)
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:ISLR2':
## 
##     Boston
library(e1071)
library(class)
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-3
library(formattable)
## 
## Attaching package: 'formattable'
## The following object is masked from 'package:MASS':
## 
##     area

Data Description

Auto Data Set

Description: Gas mileage, horsepower, and other information for 392 vehicles. A data frame with 392 observations on the following 9 variables.

mpg miles per gallon

cylinders Number of cylinders between 4 and 8

displacement Engine displacement (cu. inches)

horsepower Engine horsepower

weight Vehicle weight (lbs.)

acceleration Time to accelerate from 0 to 60 mph (sec.)

year Model year (modulo 100)

origin Origin of car (1. American, 2. European, 3. Japanese)

name Vehicle name

Source: This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University. The dataset was used in the 1983 American Statistical Association Exposition. The original dataset has 397 observations, of which 5 have missing values for the variable “horsepower”. These rows are removed here. The original dataset is available as a CSV file in the docs directory, as well as at https://www.statlearning.com.

References: James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R, Springer-Verlag, New York. https://www.statlearning.com

Data loading, variable creation and necessary datatype adjustment

df <- data.frame(Auto)
df$cylinders <- as.factor(df$cylinders)
df$origin <- as.factor(df$origin)

# mpg01 = 1 if mpg is above the median (22.75 for this data), else 0
df$mpg01 <- rep(0, length(df$mpg))
df$mpg01[df$mpg > median(df$mpg)] <- 1
df$mpg01 <- as.factor(df$mpg01)
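
As a quick sanity check, splitting at the median should give (near-)balanced classes, here 196 of each:

table(df$mpg01)  # sanity check: expect a roughly even 196/196 split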

Classification models

Train-test split

An 80-20 train-test split was used for this project, as in part 1.

set.seed(12345)
# 0 marks a training row, 1 a test row; this split yields 314 training and 78 test observations
split <- sample(c(rep(0, 0.8 * nrow(df)), rep(1, 0.2 * nrow(df))))
df.train <- df[split == 0, ]
df.test <- df[split == 1, ]

Let’s have a look at the first rows of our train data:

head(df.train)
##   mpg cylinders displacement horsepower weight acceleration year origin
## 1  18         8          307        130   3504         12.0   70      1
## 2  15         8          350        165   3693         11.5   70      1
## 3  18         8          318        150   3436         11.0   70      1
## 4  16         8          304        150   3433         12.0   70      1
## 5  17         8          302        140   3449         10.5   70      1
## 6  15         8          429        198   4341         10.0   70      1
##                        name mpg01
## 1 chevrolet chevelle malibu     0
## 2         buick skylark 320     0
## 3        plymouth satellite     0
## 4             amc rebel sst     0
## 5               ford torino     0
## 6          ford galaxie 500     0

Let’s have a look at the first rows of our test data:

head(df.test)
##    mpg cylinders displacement horsepower weight acceleration year origin
## 7   14         8          454        220   4354          9.0   70      1
## 10  15         8          390        190   3850          8.5   70      1
## 14  14         8          455        225   3086         10.0   70      1
## 17  18         6          199         97   2774         15.5   70      1
## 24  26         4          121        113   2234         12.5   70      2
## 36  17         6          250        100   3329         15.5   71      1
##                         name mpg01
## 7           chevrolet impala     0
## 10        amc ambassador dpl     0
## 14   buick estate wagon (sw)     0
## 17                amc hornet     0
## 24                  bmw 2002     1
## 36 chevrolet chevelle malibu     0

Now we create vectors holding just the target variable from the training and test data.

Actual.mpg01_train <- df.train$mpg01
Actual.mpg01_test <- df.test$mpg01

1. Linear Discriminant Analysis (LDA)

fit_lda <- lda(mpg01~cylinders + displacement + horsepower + weight + acceleration + origin, data = df.train)
fit_lda
## Call:
## lda(mpg01 ~ cylinders + displacement + horsepower + weight + 
##     acceleration + origin, data = df.train)
## 
## Prior probabilities of groups:
##         0         1 
## 0.4968153 0.5031847 
## 
## Group means:
##   cylinders4  cylinders5 cylinders6 cylinders8 displacement horsepower   weight
## 0 0.07692308 0.006410256 0.36538462 0.53846154     277.4359  130.42308 3657.397
## 1 0.91139241 0.006329114 0.05696203 0.01898734     115.1108   78.67089 2330.158
##   acceleration    origin2    origin3
## 0     14.56346 0.05128205 0.04487179
## 1     16.60633 0.27215190 0.37341772
## 
## Coefficients of linear discriminants:
##                        LD1
## cylinders4    2.8808551614
## cylinders5    1.5842147859
## cylinders6    0.2129408102
## cylinders8    0.7614438593
## displacement -0.0038351514
## horsepower   -0.0091280837
## weight       -0.0001416608
## acceleration -0.0388246305
## origin2       0.0179465697
## origin3       0.3610059644
pred.lda <- predict(fit_lda, df.test)

Now, making classifications with the LDA model, we have:

cm_lda <- confusionMatrix(pred.lda$class, Actual.mpg01_test, positive='1', mode='everything')
cm_lda
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 32  2
##          1  8 36
##                                           
##                Accuracy : 0.8718          
##                  95% CI : (0.7768, 0.9368)
##     No Information Rate : 0.5128          
##     P-Value [Acc > NIR] : 2.112e-11       
##                                           
##                   Kappa : 0.7444          
##                                           
##  Mcnemar's Test P-Value : 0.1138          
##                                           
##             Sensitivity : 0.9474          
##             Specificity : 0.8000          
##          Pos Pred Value : 0.8182          
##          Neg Pred Value : 0.9412          
##               Precision : 0.8182          
##                  Recall : 0.9474          
##                      F1 : 0.8780          
##              Prevalence : 0.4872          
##          Detection Rate : 0.4615          
##    Detection Prevalence : 0.5641          
##       Balanced Accuracy : 0.8737          
##                                           
##        'Positive' Class : 1               
## 
paste("The test data misclassification rate is:", mean(pred.lda$class != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.128205128205128"

From the above, we have:

True Positive (TP) = 36, True Negative (TN) = 32, False Positive (FP) = 8, False Negative (FN) = 2

We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.
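
As a quick check, the same rate can be recovered directly from the confusion-matrix counts (a minimal sketch using the cm_lda object from above):

tab <- cm_lda$table                 # rows = predicted class, columns = reference class
TP <- tab["1", "1"]; TN <- tab["0", "0"]
FP <- tab["1", "0"]; FN <- tab["0", "1"]
(FP + FN) / (TP + TN + FP + FN)     # 10 / 78 = 0.1282051
1 - cm_lda$overall["Accuracy"]      # the same value via 1 - Accuracy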

2. Quadratic Discriminant Analysis (QDA)

fit_qda <- qda(mpg01~cylinders + displacement + horsepower + weight + acceleration + origin, data = df.train)
fit_qda
## Call:
## qda(mpg01 ~ cylinders + displacement + horsepower + weight + 
##     acceleration + origin, data = df.train)
## 
## Prior probabilities of groups:
##         0         1 
## 0.4968153 0.5031847 
## 
## Group means:
##   cylinders4  cylinders5 cylinders6 cylinders8 displacement horsepower   weight
## 0 0.07692308 0.006410256 0.36538462 0.53846154     277.4359  130.42308 3657.397
## 1 0.91139241 0.006329114 0.05696203 0.01898734     115.1108   78.67089 2330.158
##   acceleration    origin2    origin3
## 0     14.56346 0.05128205 0.04487179
## 1     16.60633 0.27215190 0.37341772
pred.qda <- predict(fit_qda, df.test)

Now, making classifications with the QDA model, we have:

cm_qda <- confusionMatrix(pred.qda$class, Actual.mpg01_test, positive='1', mode='everything')
cm_qda
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 31  3
##          1  9 35
##                                           
##                Accuracy : 0.8462          
##                  95% CI : (0.7467, 0.9179)
##     No Information Rate : 0.5128          
##     P-Value [Acc > NIR] : 6.861e-10       
##                                           
##                   Kappa : 0.6933          
##                                           
##  Mcnemar's Test P-Value : 0.1489          
##                                           
##             Sensitivity : 0.9211          
##             Specificity : 0.7750          
##          Pos Pred Value : 0.7955          
##          Neg Pred Value : 0.9118          
##               Precision : 0.7955          
##                  Recall : 0.9211          
##                      F1 : 0.8537          
##              Prevalence : 0.4872          
##          Detection Rate : 0.4487          
##    Detection Prevalence : 0.5641          
##       Balanced Accuracy : 0.8480          
##                                           
##        'Positive' Class : 1               
## 
paste("The test data misclassification rate is:", mean(pred.qda$class != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.153846153846154"

From the above, we have:

True Positive (TP) = 35, True Negative (TN) = 31, False Positive (FP) = 9, False Negative (FN) = 3

We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.

The next three models use the glmnet package; the model fit is determined by the value of the argument alpha: if alpha = 0, a ridge regression model is fit, and if alpha = 1, a LASSO model is fit. For both models, cross-validation is used to select the optimal lambda.
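
Concretely, the penalty glmnet documents is lambda * [ (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1 ], added to the negative log-likelihood, so alpha interpolates between the ridge (alpha = 0) and LASSO (alpha = 1) penalties.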

Note: The glmnet package requires a numeric matrix as input, and a matrix in R can only hold data of a single type. From the training data, I therefore create two matrices, one for the factor input variables and another for the continuous input variables, and do the same for the test data.

# factor inputs: cylinders (col 2) and origin (col 8)
x_fac_train <- data.matrix(df.train[,c(2,8)])
# continuous inputs: displacement, horsepower, weight, acceleration (cols 3-6)
x_con_train <- data.matrix(df.train[,c(3,4,5,6)])

x_fac_test <- data.matrix(df.test[,c(2,8)])
x_con_test <- data.matrix(df.test[,c(3,4,5,6)])
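
As an aside (not used below), base R's model.matrix() could instead build a single numeric design matrix with explicit dummy coding of the factors; a minimal sketch:

x_alt_train <- model.matrix(mpg01 ~ cylinders + displacement + horsepower +
                              weight + acceleration + origin, data = df.train)[, -1]  # drop the intercept column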

3. Ridge Regression

Now we fit the ridge regression, but first we perform k-fold cross-validation to find the optimal lambda value that minimizes the cross-validation error (binomial deviance by default) for our two classes of predictor variables (factor and continuous variables).

cv_fac_train <- cv.glmnet(x_fac_train, Actual.mpg01_train, alpha=0, family="binomial")
cv_con_train <- cv.glmnet(x_con_train, Actual.mpg01_train, alpha=0, family="binomial")

The two optimal lambdas for the two classes of input variables are obtained using the code below.

optimal_lambda_fac_train <- cv_fac_train$lambda.min
optimal_lambda_con_train <- cv_con_train$lambda.min
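
For reference, cv.glmnet also reports lambda.1se, a more conservative choice: the largest lambda whose cross-validation error is within one standard error of the minimum.

cv_con_train$lambda.1se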

With the above optimal lambdas, we create two ridge regression models for the two classes of input variables and compare them to get the best model.

fit_ridge1 <- glmnet(x_fac_train, Actual.mpg01_train, alpha=0, lambda=optimal_lambda_fac_train, family="binomial")
fit_ridge2 <- glmnet(x_con_train, Actual.mpg01_train, alpha=0, lambda=optimal_lambda_con_train, family="binomial")
# predict.glmnet() returns the linear predictor (log-odds) by default, so the
# 0.5 cutoff below acts on the log-odds scale; type = "response" would give
# fitted probabilities instead
pr.ridge1 <- predict(fit_ridge1, s=optimal_lambda_fac_train, newx=x_fac_test)
pred.ridge1 <- ifelse(pr.ridge1 > 0.5, "1", "0")

pr.ridge2 <- predict(fit_ridge2, s=optimal_lambda_con_train, newx=x_con_test)
pred.ridge2 <- ifelse(pr.ridge2 > 0.5, "1", "0")

Now, making classifications with the ridge1 model, we have:

cm_ridge1 <- confusionMatrix(as.factor(pred.ridge1), Actual.mpg01_test, positive='1', mode='everything')
cm_ridge1
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 31  2
##          1  9 36
##                                           
##                Accuracy : 0.859           
##                  95% CI : (0.7617, 0.9274)
##     No Information Rate : 0.5128          
##     P-Value [Acc > NIR] : 1.266e-10       
##                                           
##                   Kappa : 0.7191          
##                                           
##  Mcnemar's Test P-Value : 0.07044         
##                                           
##             Sensitivity : 0.9474          
##             Specificity : 0.7750          
##          Pos Pred Value : 0.8000          
##          Neg Pred Value : 0.9394          
##               Precision : 0.8000          
##                  Recall : 0.9474          
##                      F1 : 0.8675          
##              Prevalence : 0.4872          
##          Detection Rate : 0.4615          
##    Detection Prevalence : 0.5769          
##       Balanced Accuracy : 0.8612          
##                                           
##        'Positive' Class : 1               
## 
paste("The test data misclassification rate is:", mean(pred.ridge1 != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.141025641025641"

From the above, we have:

True Positive (TP) = 36, True Negative (TN) = 31, False Positive (FP) = 9, False Negative (FN) = 2

We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.

Now, making classifications with the ridge2 model, we have:

cm_ridge2 <- confusionMatrix(as.factor(pred.ridge2), Actual.mpg01_test, positive='1', mode='everything')
cm_ridge2
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 35  3
##          1  5 35
##                                           
##                Accuracy : 0.8974          
##                  95% CI : (0.8079, 0.9547)
##     No Information Rate : 0.5128          
##     P-Value [Acc > NIR] : 4.197e-13       
##                                           
##                   Kappa : 0.795           
##                                           
##  Mcnemar's Test P-Value : 0.7237          
##                                           
##             Sensitivity : 0.9211          
##             Specificity : 0.8750          
##          Pos Pred Value : 0.8750          
##          Neg Pred Value : 0.9211          
##               Precision : 0.8750          
##                  Recall : 0.9211          
##                      F1 : 0.8974          
##              Prevalence : 0.4872          
##          Detection Rate : 0.4487          
##    Detection Prevalence : 0.5128          
##       Balanced Accuracy : 0.8980          
##                                           
##        'Positive' Class : 1               
## 
paste("The test data misclassification rate is:", mean(pred.ridge2 != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.102564102564103"

From the above, we have:

True Positive (TP) = 35, True Negative (TN) = 35, False Positive (FP) = 5, False Negative (FN) = 3

We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.

From the above, we see that the ridge regression model built from the continuous input variables has a higher classification accuracy than the one built from the factor input variables.

4. LASSO

Now we fit the LASSO model, but first we perform k-fold cross-validation to find the optimal lambda value that minimizes the cross-validation error (binomial deviance by default) for our two classes of predictor variables (factor and continuous variables).

cv_fac_train_las <- cv.glmnet(x_fac_train, Actual.mpg01_train, alpha=1, family="binomial")
cv_con_train_las <- cv.glmnet(x_con_train, Actual.mpg01_train, alpha=1, family="binomial")

The two optimal lambdas for the two classes of input variables are obtained using the code below.

optimal_lambda_fac_train_las <- cv_fac_train_las$lambda.min
optimal_lambda_con_train_las <- cv_con_train_las$lambda.min

With the above optimal lambdas, we create two LASSO models for the two classes of input variables and compare them to get the best model.

fit_las1 <- glmnet(x_fac_train, Actual.mpg01_train, alpha=1, lambda=optimal_lambda_fac_train_las, family="binomial")
fit_las2 <- glmnet(x_con_train, Actual.mpg01_train, alpha=1, lambda=optimal_lambda_con_train_las, family="binomial")
# as with ridge above, the 0.5 cutoff acts on the default log-odds scale
pr.las1 <- predict(fit_las1, s=optimal_lambda_fac_train_las, newx=x_fac_test)
pred.las1 <- ifelse(pr.las1 > 0.5, "1", "0")

pr.las2 <- predict(fit_las2, s=optimal_lambda_con_train_las, newx=x_con_test)
pred.las2 <- ifelse(pr.las2 > 0.5, "1", "0")

Now, making classifications with the first LASSO model, we have:

cm_las1 <- confusionMatrix(as.factor(pred.las1), Actual.mpg01_test, positive='1', mode='everything')
cm_las1
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 31  3
##          1  9 35
##                                           
##                Accuracy : 0.8462          
##                  95% CI : (0.7467, 0.9179)
##     No Information Rate : 0.5128          
##     P-Value [Acc > NIR] : 6.861e-10       
##                                           
##                   Kappa : 0.6933          
##                                           
##  Mcnemar's Test P-Value : 0.1489          
##                                           
##             Sensitivity : 0.9211          
##             Specificity : 0.7750          
##          Pos Pred Value : 0.7955          
##          Neg Pred Value : 0.9118          
##               Precision : 0.7955          
##                  Recall : 0.9211          
##                      F1 : 0.8537          
##              Prevalence : 0.4872          
##          Detection Rate : 0.4487          
##    Detection Prevalence : 0.5641          
##       Balanced Accuracy : 0.8480          
##                                           
##        'Positive' Class : 1               
## 
paste("The test data misclassification rate is:", mean(pred.las1 != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.153846153846154"

From the above, we have:

True Positive (TP) = 35, True Negative (TN) = 31, False Positive (FP) = 9, False Negative (FN) = 3

We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.

Now, making classifications with the second LASSO model, we have:

cm_las2 <- confusionMatrix(as.factor(pred.las2), Actual.mpg01_test, positive='1', mode='everything')
cm_las2
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 34  2
##          1  6 36
##                                           
##                Accuracy : 0.8974          
##                  95% CI : (0.8079, 0.9547)
##     No Information Rate : 0.5128          
##     P-Value [Acc > NIR] : 4.197e-13       
##                                           
##                   Kappa : 0.7953          
##                                           
##  Mcnemar's Test P-Value : 0.2888          
##                                           
##             Sensitivity : 0.9474          
##             Specificity : 0.8500          
##          Pos Pred Value : 0.8571          
##          Neg Pred Value : 0.9444          
##               Precision : 0.8571          
##                  Recall : 0.9474          
##                      F1 : 0.9000          
##              Prevalence : 0.4872          
##          Detection Rate : 0.4615          
##    Detection Prevalence : 0.5385          
##       Balanced Accuracy : 0.8987          
##                                           
##        'Positive' Class : 1               
## 
paste("The test data misclassification rate is:", mean(pred.las2 != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.102564102564103"

From the above, we have:

True Positive (TP) = 36, True Negative (TN) = 34, False Positive (FP) = 6, False Negative (FN) = 2

We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.

From the above, we see that the LASSO model built from the continuous input variables has a higher classification accuracy than the one built from the factor input variables.

5. Elastic Net

The elastic net is a combination of ridge regression and LASSO, so alpha can take any value between 0 and 1, again paired with an optimal lambda. I use the caret library to build this model; 10-fold cross-validation was used to select the tuning parameters (alpha and lambda).

set.seed(111)
cv_10 = trainControl(method="cv", number=10)
# tuneLength = 10 asks caret to search a grid of 10 alpha values, each paired with 10 lambda values
fit_elnet <- train(mpg01~cylinders + displacement + horsepower + weight + acceleration + origin, data=df.train, method="glmnet", trControl=cv_10, tuneLength=10)

fit_elnet$bestTune
##    alpha    lambda
## 40   0.4 0.3612145

From the above, we see that the optimal elastic net model is at alpha = 0.4 and lambda = 0.3612145.
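
To see which coefficients the model keeps at these tuning values, one could pull them from the underlying glmnet fit that caret stores (a sketch):

coef(fit_elnet$finalModel, s = fit_elnet$bestTune$lambda)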

Now, making classifications with the elastic net model, we have:

pred.elnet <- predict(fit_elnet, df.test)
cm_elnet <- confusionMatrix(pred.elnet, Actual.mpg01_test, positive='1', mode='everything')
cm_elnet
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 31  3
##          1  9 35
##                                           
##                Accuracy : 0.8462          
##                  95% CI : (0.7467, 0.9179)
##     No Information Rate : 0.5128          
##     P-Value [Acc > NIR] : 6.861e-10       
##                                           
##                   Kappa : 0.6933          
##                                           
##  Mcnemar's Test P-Value : 0.1489          
##                                           
##             Sensitivity : 0.9211          
##             Specificity : 0.7750          
##          Pos Pred Value : 0.7955          
##          Neg Pred Value : 0.9118          
##               Precision : 0.7955          
##                  Recall : 0.9211          
##                      F1 : 0.8537          
##              Prevalence : 0.4872          
##          Detection Rate : 0.4487          
##    Detection Prevalence : 0.5641          
##       Balanced Accuracy : 0.8480          
##                                           
##        'Positive' Class : 1               
## 
paste("The test data misclassification rate is:", mean(pred.elnet != Actual.mpg01_test))
## [1] "The test data misclassification rate is: 0.153846153846154"

From the above, we have:

True Positive (TP) = 35, True Negative (TN) = 31, False Positive (FP) = 9, False Negative (FN) = 3

We could also use the formula (FP + FN) / (TP + TN + FP + FN), or equivalently (1 - Accuracy), and we would arrive at the same result as above.

Model Comparison

For model comparison, only two metrics were considered when choosing the best model: sensitivity (or recall) and specificity, as together these two metrics cover every entry of the confusion matrix. Sensitivity deals with true positives and false negatives, while specificity deals with false positives and true negatives.

For this data, since classifying a car's mpg as 1 (mpg > 22.75) and as 0 (mpg <= 22.75) are both important, the combination of sensitivity and specificity, which together form a holistic pair of metrics, is considered. I also computed other common metrics used in model selection/comparison; the code for these and a summary are shown below.
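
For reference, in terms of the confusion-matrix counts: Sensitivity (Recall) = TP / (TP + FN) and Specificity = TN / (TN + FP).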

Model_sens_or_recall <- rbind(cm_lda$byClass["Sensitivity"], cm_qda$byClass["Sensitivity"], cm_ridge1$byClass["Sensitivity"], cm_ridge2$byClass["Sensitivity"], cm_las1$byClass["Sensitivity"], cm_las2$byClass["Sensitivity"], cm_elnet$byClass["Sensitivity"])

rownames(Model_sens_or_recall) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_sens_or_recall) <- c("Sensitivity_or_Recall")

Model_sens_or_recall
##             Sensitivity_or_Recall
## LDA                     0.9473684
## QDA                     0.9210526
## Ridge1                  0.9473684
## Ridge2                  0.9210526
## LASSO1                  0.9210526
## LASSO2                  0.9473684
## Elastic_Net             0.9210526
Model_spec <- rbind(cm_lda$byClass["Specificity"], cm_qda$byClass["Specificity"], cm_ridge1$byClass["Specificity"], cm_ridge2$byClass["Specificity"], cm_las1$byClass["Specificity"], cm_las2$byClass["Specificity"], cm_elnet$byClass["Specificity"])

rownames(Model_spec) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_spec) <- c("Specificity")

Model_spec
##             Specificity
## LDA               0.800
## QDA               0.775
## Ridge1            0.775
## Ridge2            0.875
## LASSO1            0.775
## LASSO2            0.850
## Elastic_Net       0.775
Model_acc <- rbind(cm_lda$overall["Accuracy"], cm_qda$overall["Accuracy"], cm_ridge1$overall["Accuracy"], cm_ridge2$overall["Accuracy"], cm_las1$overall["Accuracy"], cm_las2$overall["Accuracy"], cm_elnet$overall["Accuracy"])

rownames(Model_acc) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_acc) <- c("Accuracy")

Model_acc
##              Accuracy
## LDA         0.8717949
## QDA         0.8461538
## Ridge1      0.8589744
## Ridge2      0.8974359
## LASSO1      0.8461538
## LASSO2      0.8974359
## Elastic_Net 0.8461538
Model_misclassification_rate <- rbind(1-cm_lda$overall["Accuracy"], 1-cm_qda$overall["Accuracy"], 1-cm_ridge1$overall["Accuracy"], 1-cm_ridge2$overall["Accuracy"], 1-cm_las1$overall["Accuracy"], 1-cm_las2$overall["Accuracy"], 1-cm_elnet$overall["Accuracy"])

rownames(Model_misclassification_rate) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_misclassification_rate) <- c("Misclassification_rate")

Model_misclassification_rate
##             Misclassification_rate
## LDA                      0.1282051
## QDA                      0.1538462
## Ridge1                   0.1410256
## Ridge2                   0.1025641
## LASSO1                   0.1538462
## LASSO2                   0.1025641
## Elastic_Net              0.1538462
Model_precision <- rbind(cm_lda$byClass["Precision"], cm_qda$byClass["Precision"], cm_ridge1$byClass["Precision"], cm_ridge2$byClass["Precision"], cm_las1$byClass["Precision"], cm_las2$byClass["Precision"], cm_elnet$byClass["Precision"])

rownames(Model_precision) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_precision) <- c("Precision")

Model_precision
##             Precision
## LDA         0.8181818
## QDA         0.7954545
## Ridge1      0.8000000
## Ridge2      0.8750000
## LASSO1      0.7954545
## LASSO2      0.8571429
## Elastic_Net 0.7954545
Model_f1 <- rbind(cm_lda$byClass["F1"], cm_qda$byClass["F1"], cm_ridge1$byClass["F1"], cm_ridge2$byClass["F1"], cm_las1$byClass["F1"], cm_las2$byClass["F1"], cm_elnet$byClass["F1"])

rownames(Model_f1) <- c("LDA", "QDA", "Ridge1", "Ridge2", "LASSO1", "LASSO2", "Elastic_Net")
colnames(Model_f1) <- c("F1")

Model_f1
##                    F1
## LDA         0.8780488
## QDA         0.8536585
## Ridge1      0.8674699
## Ridge2      0.8974359
## LASSO1      0.8536585
## LASSO2      0.9000000
## Elastic_Net 0.8536585
Model_summary <- data.frame(Model_sens_or_recall, Model_spec, Model_acc, Model_misclassification_rate, Model_precision, Model_f1)
Model_summary
##             Sensitivity_or_Recall Specificity  Accuracy Misclassification_rate
## LDA                     0.9473684       0.800 0.8717949              0.1282051
## QDA                     0.9210526       0.775 0.8461538              0.1538462
## Ridge1                  0.9473684       0.775 0.8589744              0.1410256
## Ridge2                  0.9210526       0.875 0.8974359              0.1025641
## LASSO1                  0.9210526       0.775 0.8461538              0.1538462
## LASSO2                  0.9473684       0.850 0.8974359              0.1025641
## Elastic_Net             0.9210526       0.775 0.8461538              0.1538462
##             Precision        F1
## LDA         0.8181818 0.8780488
## QDA         0.7954545 0.8536585
## Ridge1      0.8000000 0.8674699
## Ridge2      0.8750000 0.8974359
## LASSO1      0.7954545 0.8536585
## LASSO2      0.8571429 0.9000000
## Elastic_Net 0.7954545 0.8536585
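
For reference, the same summary could be assembled more compactly from a named list of the confusionMatrix objects above (a sketch that should reproduce the same data frame):

cms <- list(LDA = cm_lda, QDA = cm_qda, Ridge1 = cm_ridge1, Ridge2 = cm_ridge2,
            LASSO1 = cm_las1, LASSO2 = cm_las2, Elastic_Net = cm_elnet)
by_class <- function(what) sapply(cms, function(m) m$byClass[[what]])
Model_summary_alt <- data.frame(
  Sensitivity_or_Recall  = by_class("Sensitivity"),
  Specificity            = by_class("Specificity"),
  Accuracy               = sapply(cms, function(m) m$overall[["Accuracy"]]),
  Misclassification_rate = sapply(cms, function(m) 1 - m$overall[["Accuracy"]]),
  Precision              = by_class("Precision"),
  F1                     = by_class("F1")
)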

For visual effect, I decided to use color shades on the Model_summary data frame.

For metrics where higher values mean a better model, green shades are used (with the darkest green marking the best model), and for metrics where lower values are better, red shades are used (with the darkest red marking the worst model).

formattable(Model_summary, list(
  Sensitivity_or_Recall = color_tile("white", "green"), 
  Specificity = color_tile("white", "green"), 
  Accuracy = color_tile("white", "green"), 
  Misclassification_rate = color_tile("white", "red"), 
  Precision = color_tile("white", "green"), 
  F1 = color_tile("white", "green")
  ))
| Model       | Sensitivity_or_Recall | Specificity | Accuracy  | Misclassification_rate | Precision | F1        |
|-------------|-----------------------|-------------|-----------|------------------------|-----------|-----------|
| LDA         | 0.9473684             | 0.800       | 0.8717949 | 0.1282051              | 0.8181818 | 0.8780488 |
| QDA         | 0.9210526             | 0.775       | 0.8461538 | 0.1538462              | 0.7954545 | 0.8536585 |
| Ridge1      | 0.9473684             | 0.775       | 0.8589744 | 0.1410256              | 0.8000000 | 0.8674699 |
| Ridge2      | 0.9210526             | 0.875       | 0.8974359 | 0.1025641              | 0.8750000 | 0.8974359 |
| LASSO1      | 0.9210526             | 0.775       | 0.8461538 | 0.1538462              | 0.7954545 | 0.8536585 |
| LASSO2      | 0.9473684             | 0.850       | 0.8974359 | 0.1025641              | 0.8571429 | 0.9000000 |
| Elastic_Net | 0.9210526             | 0.775       | 0.8461538 | 0.1538462              | 0.7954545 | 0.8536585 |

From the above, we see that the LASSO2 model (the LASSO model built with the continuous input variables) ties for the highest sensitivity and, among the models tied there, has the highest specificity; these are the two metrics we decided to use for selecting the best model. Hence, the LASSO2 model is our optimal model for this second part of classifying the mpg of cars in the Auto data from the ISLR2 package.