
Q1: Classification Models for Crime Rate Prediction

Create a binary response variable indicating whether the crime rate is above or below the median
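
The code chunks are not echoed in this report, so the following is a minimal sketch of how the response might be constructed; it assumes the Boston data from the ISLR2 package and the variable name CrimeAboveMedian that appears in the model calls below.

```r
library(ISLR2)  # Boston data set (13 columns: crim, zn, ..., medv)

# 1 if the census tract's crime rate exceeds the median, 0 otherwise
Boston$CrimeAboveMedian <- factor(ifelse(Boston$crim > median(Boston$crim), 1, 0))
table(Boston$CrimeAboveMedian)  # balanced 253 / 253 split
```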

Logistic Regression
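
A sketch of the fit that would produce the summary below; the formula matches the Call line in the output. The warnings indicate (quasi-)complete separation, which is not surprising given that crim itself is among the predictors.

```r
# Logistic regression on all remaining predictors (including crim itself)
glm_fit <- glm(CrimeAboveMedian ~ ., family = binomial, data = Boston)
summary(glm_fit)
```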

## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 
## Call:
## glm(formula = CrimeAboveMedian ~ ., family = binomial, data = Boston)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.270e+02  2.030e+05  -0.002    0.999
## crim         1.056e+03  2.021e+04   0.052    0.958
## zn           2.251e+00  6.284e+01   0.036    0.971
## indus       -3.859e+00  1.542e+03  -0.003    0.998
## chas        -5.407e+00  1.089e+04   0.000    1.000
## nox          1.467e+02  2.190e+05   0.001    0.999
## rm          -4.152e+01  1.990e+03  -0.021    0.983
## age          4.756e-01  8.017e+01   0.006    0.995
## dis         -1.335e+01  2.827e+03  -0.005    0.996
## rad         -4.353e+00  3.454e+03  -0.001    0.999
## tax         -1.346e-01  1.581e+02  -0.001    0.999
## ptratio      1.464e+01  6.733e+03   0.002    0.998
## lstat       -9.119e-01  5.204e+02  -0.002    0.999
## medv         3.491e+00  7.710e+02   0.005    0.996
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7.0146e+02  on 505  degrees of freedom
## Residual deviance: 2.8134e-05  on 492  degrees of freedom
## AIC: 28
## 
## Number of Fisher Scoring iterations: 25

LDA
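
A corresponding sketch for the LDA fit; lda() is provided by the MASS package.

```r
library(MASS)  # lda()

lda_fit <- lda(CrimeAboveMedian ~ ., data = Boston)
lda_fit
```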

## Call:
## lda(CrimeAboveMedian ~ ., data = Boston)
## 
## Prior probabilities of groups:
##   0   1 
## 0.5 0.5 
## 
## Group means:
##        crim        zn     indus       chas       nox       rm      age      dis
## 0 0.0955715 21.525692  7.002292 0.05138340 0.4709711 6.394395 51.31028 5.091596
## 1 7.1314756  1.201581 15.271265 0.08695652 0.6384190 6.174874 85.83953 2.498489
##         rad      tax  ptratio     lstat     medv
## 0  4.158103 305.7431 17.90711  9.419486 24.94941
## 1 14.940711 510.7312 19.00395 15.886640 20.11621
## 
## Coefficients of linear discriminants:
##                   LD1
## crim     0.0057477432
## zn      -0.0055783361
## indus    0.0133950314
## chas    -0.0683284866
## nox      8.2352660572
## rm       0.1127191607
## age      0.0109751104
## dis      0.0431741184
## rad      0.0723695021
## tax     -0.0008391622
## ptratio  0.0473594598
## lstat    0.0158822769
## medv     0.0361430310

Naive Bayes
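
The Call line in the output (naive_bayes.formula) points to the naivebayes package; a sketch of the fit:

```r
library(naivebayes)  # naive_bayes()

nb_fit <- naive_bayes(CrimeAboveMedian ~ ., data = Boston)
nb_fit  # printing the fit gives the per-predictor Gaussian tables shown below
```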

## 
## ================================== Naive Bayes ================================== 
##  
##  Call: 
## naive_bayes.formula(formula = CrimeAboveMedian ~ ., data = Boston)
## 
## --------------------------------------------------------------------------------- 
##  
## Laplace smoothing: 0
## 
## --------------------------------------------------------------------------------- 
##  
##  A priori probabilities: 
## 
##   0   1 
## 0.5 0.5 
## 
## --------------------------------------------------------------------------------- 
##  
##  Tables: 
## 
## --------------------------------------------------------------------------------- 
##  ::: crim (Gaussian) 
## --------------------------------------------------------------------------------- 
##       
## crim             0           1
##   mean  0.09557150  7.13147561
##   sd    0.06281773 11.10912294
## 
## --------------------------------------------------------------------------------- 
##  ::: zn (Gaussian) 
## --------------------------------------------------------------------------------- 
##       
## zn             0         1
##   mean 21.525692  1.201581
##   sd   29.319808  4.798611
## 
## --------------------------------------------------------------------------------- 
##  ::: indus (Gaussian) 
## --------------------------------------------------------------------------------- 
##       
## indus          0         1
##   mean  7.002292 15.271265
##   sd    5.514454  5.439010
## 
## --------------------------------------------------------------------------------- 
##  ::: chas (Gaussian) 
## --------------------------------------------------------------------------------- 
##       
## chas            0          1
##   mean 0.05138340 0.08695652
##   sd   0.22121612 0.28232985
## 
## --------------------------------------------------------------------------------- 
##  ::: nox (Gaussian) 
## --------------------------------------------------------------------------------- 
##       
## nox             0          1
##   mean 0.47097115 0.63841897
##   sd   0.05559789 0.09870365
## 
## ---------------------------------------------------------------------------------
## 
## # ... and 8 more tables
## 
## ---------------------------------------------------------------------------------

KNN
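
A sketch of a KNN fit using class::knn(); the value k = 5, the standardization of the predictors, and predicting on the full data set are illustrative assumptions, not necessarily the settings used to produce the predictions below.

```r
library(class)  # knn()

set.seed(1)  # knn() breaks ties between neighbours at random
predictors <- setdiff(names(Boston), "CrimeAboveMedian")
X <- scale(Boston[, predictors])  # standardize so no predictor dominates the distance

knn_pred <- knn(train = X, test = X, cl = Boston$CrimeAboveMedian, k = 5)
knn_pred
```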

##   [1] 0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
## [223] 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
## [260] 1 1 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [408] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [445] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [482] 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Levels: 0 1

Q2: Model Selection Approaches

2a: Which of the three models with k predictors has the smallest training RSS?

When performing best subset selection, the model with k predictors is the model with the smallest RSS among all $\binom{p}{k}$ models with k predictors.

When performing forward stepwise selection, the model with k predictors is the model with the smallest RSS among the $p - k + 1$ models obtained by adding one additional predictor to the predictors in $M_{k-1}$.

When performing backward stepwise selection, the model with k predictors is the model with the smallest RSS among the $k + 1$ models obtained by removing a single predictor from $M_{k+1}$.

Because best subset selection searches over every possible model with k predictors, while the stepwise methods examine only a restricted sequence of them, the model with k predictors that has the smallest training RSS is the one obtained by best subset selection.
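
As a rough comparison of search effort, with p predictors best subset selection fits every possible model, whereas forward or backward stepwise selection fits far fewer:

$$
\underbrace{\sum_{k=0}^{p} \binom{p}{k} = 2^{p}}_{\text{best subset}}
\qquad \text{versus} \qquad
\underbrace{1 + \sum_{k=0}^{p-1} (p - k) = 1 + \frac{p(p+1)}{2}}_{\text{forward or backward stepwise}}
$$

For example, with p = 13 predictors that is 8,192 models versus 92.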

2b: Which of the three models with k predictors has the smallest test RSS?

Best subset selection may have the smallest test RSS because it considers more models than the other methods.

However, because the stepwise methods search over fewer models, they are less susceptible to overfitting and may therefore select a model that fits the test data better.

In practice, the outcome depends more on the particular test set and validation procedure used than on the selection method itself.

2c: True or False:

i. True: Forward stepwise selection only ever adds predictors; the (k + 1)-variable model is constructed by adding a single predictor to the k-variable model. Therefore, the predictors in the k-variable model identified by forward stepwise are always a subset of the predictors in the (k + 1)-variable model identified by forward stepwise.

ii. True: In backward stepwise selection, the k-variable model is obtained by removing a single predictor (the one whose removal harms the fit least) from the (k + 1)-variable model. Therefore, the predictors in the k-variable model identified by backward stepwise are always a subset of the predictors in the (k + 1)-variable model identified by backward stepwise.

iii. False: Unlike ii, this statement compares two different procedures (backward and forward stepwise). Because the two methods can select entirely different predictors, there is no guaranteed nesting relationship between the models they identify.

iv. False: Forward stepwise selection and backward stepwise selection are different algorithms. The predictors included in the k-variable model identified by forward stepwise may or may not be a subset of the predictors in the (k + 1)-variable model identified by backward stepwise selection. The two methods can yield different sets of predictors.

v. False: Best subset selection considers all possible combinations of predictors and identifies the best-fitting model for each subset size independently. Because the search is carried out separately for each size, the predictors in the k-variable model identified by best subset selection need not be a subset of the predictors in the (k + 1)-variable model. For example, the best two-variable model could contain X1 and X2 while the best three-variable model contains X3, X4, and X5.

Q3: Predicting Number of Applications Received

Load required packages
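
The chunks are not echoed, so the following is a sketch of the likely setup; the package choices (ISLR2 for the College data, glmnet for ridge and lasso) are assumptions.

```r
library(ISLR2)   # College data set
library(glmnet)  # ridge regression and the lasso
```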

Load College dataset
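
The College data ship with the package, so attaching them explicitly is optional:

```r
data(College)
dim(College)  # 777 colleges, 18 variables; the response is Apps
```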

Set seed for reproducibility
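
A sketch of the split; the seed value and the 50/50 train/test proportion are illustrative assumptions, not necessarily those used to produce the errors reported below.

```r
set.seed(1)
train_idx <- sample(seq_len(nrow(College)), nrow(College) %/% 2)
College_train <- College[train_idx, ]
College_test  <- College[-train_idx, ]
```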

Calculate test errors (mean squared error) for the linear, ridge, and lasso models
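
A sketch of the three fits and their test MSEs; choosing lambda with cv.glmnet() is a common approach but an assumption here.

```r
# Linear model
lm_fit  <- lm(Apps ~ ., data = College_train)
lm_pred <- predict(lm_fit, College_test)
mean((College_test$Apps - lm_pred)^2)

# Design matrices for glmnet (drop the intercept column)
x_train <- model.matrix(Apps ~ ., College_train)[, -1]
x_test  <- model.matrix(Apps ~ ., College_test)[, -1]
y_train <- College_train$Apps

# Ridge regression (alpha = 0), lambda chosen by cross-validation
ridge_cv   <- cv.glmnet(x_train, y_train, alpha = 0)
ridge_pred <- predict(ridge_cv, s = "lambda.min", newx = x_test)
mean((College_test$Apps - ridge_pred)^2)

# Lasso (alpha = 1)
lasso_cv   <- cv.glmnet(x_train, y_train, alpha = 1)
lasso_pred <- predict(lasso_cv, s = "lambda.min", newx = x_test)
mean((College_test$Apps - lasso_pred)^2)

# Number of non-zero lasso coefficients (excluding the intercept)
lasso_coef <- predict(lasso_cv, s = "lambda.min", type = "coefficients")
sum(lasso_coef[-1] != 0)
```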

## [1] "Linear Model Test Error: 1882073.83239865"
## [1] "Ridge Regression Test Error: 1893106.5864917"
## [1] "Lasso Model Test Error: 2006127.86876507"
## [1] "Number of Non-Zero Coefficients in Lasso Model: 0"