Question: Suppose you have a dataset called “PimaIndiansDiabetes2”, which contains information about diabetes diagnoses. After loading the dataset and removing rows with missing values, you split it into training and test sets using the caret package. You then fit a stepwise (both-direction) logistic regression with the stepAIC function from the MASS package, ran forward selection and backward elimination with the same function, and compared the forward-selection model with the both-direction model. Your task now is to calculate and compare the accuracy, precision, recall, and F1-score of the both-direction model on the test data.
Can you write R code to perform these calculations and interpret the confusion matrix?
## 'data.frame': 392 obs. of 9 variables:
## $ pregnant: num 1 0 3 2 1 5 0 1 1 3 ...
## $ glucose : num 89 137 78 197 189 166 118 103 115 126 ...
## $ pressure: num 66 40 50 70 60 72 84 30 70 88 ...
## $ triceps : num 23 35 32 45 23 19 47 38 30 41 ...
## $ insulin : num 94 168 88 543 846 175 230 83 96 235 ...
## $ mass : num 28.1 43.1 31 30.5 30.1 25.8 45.8 43.3 34.6 39.3 ...
## $ pedigree: num 0.167 2.288 0.248 0.158 0.398 ...
## $ age : num 21 33 26 53 59 51 31 33 32 27 ...
## $ diabetes: Factor w/ 2 levels "neg","pos": 1 2 2 2 2 2 2 1 2 1 ...
## - attr(*, "na.action")= 'omit' Named int [1:376] 1 2 3 6 8 10 11 12 13 16 ...
## ..- attr(*, "names")= chr [1:376] "1" "2" "3" "6" ...
## Loading required package: ggplot2
## Loading required package: lattice
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## [1] 314 9
## [1] 78 9
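
The two dimension lines above (314 and 78 rows out of 392) are consistent with an 80/20 split. A minimal sketch of how the loading and splitting steps might look is given here. It assumes the data come from the mlbench package and that caret's `createDataPartition()` was used with `p = 0.8`; the seed and the name `train.idx` are hypothetical, since the original code is not shown.

```r
# Assumes the mlbench and caret packages are installed.
library(mlbench)  # provides the PimaIndiansDiabetes2 data set
library(caret)    # provides createDataPartition()

data("PimaIndiansDiabetes2", package = "mlbench")
pima <- na.omit(PimaIndiansDiabetes2)  # drop rows with any missing value

set.seed(123)  # hypothetical seed; the seed used for the output above is unknown
# Stratified split on the outcome: about 80% of each class goes to training
train.idx  <- createDataPartition(pima$diabetes, p = 0.8, list = FALSE)
train.data <- pima[train.idx, ]
test.data  <- pima[-train.idx, ]

dim(train.data)
dim(test.data)
```

Because the split is stratified on `diabetes`, the class balance in `test.data` should be close to that of the full data set, whichever seed is used.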
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## Call:
## glm(formula = diabetes ~ 1, family = binomial, data = train.data)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.7027     0.1199  -5.861 4.61e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 398.8 on 313 degrees of freedom
## Residual deviance: 398.8 on 313 degrees of freedom
## AIC: 400.8
##
## Number of Fisher Scoring iterations: 4
# Perform forward selection and backward elimination
## Analysis of Deviance Table
##
## Model 1: diabetes ~ 1
## Model 2: diabetes ~ 1
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1       313      398.8
## 2       313      398.8  0        0
##                df      AIC
## base.model      1 400.8003
## forward.model   1 400.8003
## backward.model  6 279.7859
## step.model      1 400.8003
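
In the AIC table above, `forward.model` and `step.model` have the same df and AIC as `base.model`, so both searches appear to have stopped at the intercept-only model. One likely cause is that `stepAIC()` was started from `diabetes ~ 1` without a `scope` argument: when `scope` is missing, the initial model is used as the upper model, so no terms can be added. The sketch below shows one way to supply a scope. It is an illustration under assumptions, not the original code: the unstratified base-R split and its seed are stand-ins, and `search.scope` is a hypothetical name.

```r
# Assumes the mlbench package is installed; MASS ships with R.
library(mlbench)
library(MASS)

data("PimaIndiansDiabetes2", package = "mlbench")
pima <- na.omit(PimaIndiansDiabetes2)

set.seed(123)  # hypothetical seed
# Stand-in 314-row training set (unstratified), for illustration only
train.data <- pima[sample(nrow(pima), 314), ]

# Intercept-only starting model, as in the summary output above
base.model <- glm(diabetes ~ 1, family = binomial, data = train.data)

# Tell stepAIC() which terms it may add: from the null model up to all predictors
search.scope <- list(
  lower = ~ 1,
  upper = ~ pregnant + glucose + pressure + triceps + insulin + mass + pedigree + age
)

forward.model <- stepAIC(base.model, scope = search.scope,
                         direction = "forward", trace = FALSE)
step.model    <- stepAIC(base.model, scope = search.scope,
                         direction = "both", trace = FALSE)

AIC(base.model, forward.model, step.model)
```

With a scope supplied, both searches can move away from the null model, so their AIC values should fall below that of `base.model`. Which predictors are selected depends on the split.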
## Confusion Matrix and Statistics
##
##           Reference
## Prediction neg pos
##        neg  52  26
##        pos   0   0
##
## Accuracy : 0.6667
## 95% CI : (0.5508, 0.7694)
## No Information Rate : 0.6667
## P-Value [Acc > NIR] : 0.553
##
## Kappa : 0
##
## Mcnemar's Test P-Value : 9.443e-07
##
## Sensitivity : 1.0000
## Specificity : 0.0000
## Pos Pred Value : 0.6667
## Neg Pred Value : NaN
## Prevalence : 0.6667
## Detection Rate : 0.6667
## Detection Prevalence : 1.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : neg
##
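
The requested metrics can be computed directly from the confusion matrix above. The sketch below uses base R only and copies the four cell counts from that output; `class_metrics()` is a hypothetical helper, not a caret function. Note that caret reports `neg` as the 'Positive' class here, so the sensitivity of 1.0 describes how well non-diabetic cases are recovered. The metrics are therefore shown for both classes.

```r
# Confusion matrix copied from the output above
# (rows = predicted class, columns = reference class).
cm <- matrix(c(52, 0, 26, 0), nrow = 2,
             dimnames = list(Prediction = c("neg", "pos"),
                             Reference  = c("neg", "pos")))

# Hypothetical helper: precision, recall and F1 with `positive` as the class of interest
class_metrics <- function(cm, positive) {
  tp <- cm[positive, positive]
  fp <- sum(cm[positive, ]) - tp   # predicted `positive`, but reference differs
  fn <- sum(cm[, positive]) - tp   # reference `positive`, but predicted otherwise
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1        <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f1 = f1)
}

accuracy    <- sum(diag(cm)) / sum(cm)
metrics.neg <- class_metrics(cm, "neg")
metrics.pos <- class_metrics(cm, "pos")

accuracy
metrics.neg
metrics.pos
```

With `neg` as the class of interest, this gives an accuracy of 52/78 (about 0.667), a precision of about 0.667, a recall of 1 and an F1-score of 0.8. With `pos` as the class of interest, recall is 0 and precision and F1 are `NaN`, because the model never predicts `pos`. The accuracy equals the No Information Rate and Kappa is 0, so this model does no better than always predicting the majority class. That is what one would expect from an intercept-only fit.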