Submission info

Homework #3

Ben Peloquin

SUNetID: bpeloqui

Exercises submitted:

Conceptual:

Regularization section 6.8 #2 (a, b)

Applied:

Regularization section 6.8 #9 (a-d)

CONCEPTUAL

2a Lasso vs Least Squares:

  1. F

  2. F

  3. T

  4. F

The lasso yields the least squares coefficient estimates when lambda is 0. Qualitatively, the lasso behaves similarly to ridge regression in that its advantage comes from improvements in variance at (at first) minimal cost in bias: as lambda increases the lasso becomes less flexible, decreasing variance and increasing bias. Since the qualitative behavior of the lasso and ridge regression is similar, I say more about this in 2b. One nice advantage of the lasso over ridge regression is that it produces more interpretable models, because it involves only a subset of the predictors, whereas ridge regression always keeps the full set. Taking this a step further, since the lasso effectively assumes that a number of the coefficients are 0 (feature selection), ridge regression will outperform the lasso when all the predictors are in fact related to the response, while the lasso will outperform ridge regression when only a small subset of the predictors is actually related to the response. Since we may not know this a priori, it can be worth running both.
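For reference, the lasso coefficient estimates minimize the penalized criterion from ISLR Section 6.2.2,

  RSS + \lambda \sum_{j=1}^{p} |\beta_j|,

and it is this absolute-value (l1) penalty that can force coefficients exactly to zero once lambda is large enough, which is where the feature selection described above comes from.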

2b Ridge Regression vs Least Squares:

  1. F

  2. F

  3. T

  4. F

A lambda of 0 corresponds to the least squares coefficient estimates. As lambda increases, the flexibility of the ridge regression fit decreases, which corresponds to decreased variance and increased bias. The key advantage of ridge regression is that the shrinkage of the coefficient estimates produces a large reduction in variance at only a slight increase in bias. This is especially helpful when fitting a linear model in which n and p are fairly close, or when p > n: ridge regression tames the variability (or outright non-uniqueness) that a least squares fit would suffer in those settings.
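For comparison, ridge regression (ISLR Section 6.2.1) minimizes

  RSS + \lambda \sum_{j=1}^{p} \beta_j^2,

so lambda = 0 recovers least squares exactly, while increasing lambda shrinks every coefficient toward, but never exactly to, zero.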

APPLIED

library(ISLR)
library(glmnet)
## Warning: package 'glmnet' was built under R version 3.1.3
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-2
#Note: Lab 6 of ISLR was very helpful for this

#A: Training set and test set ---------->
set.seed(1)
names(College)
##  [1] "Private"     "Apps"        "Accept"      "Enroll"      "Top10perc"  
##  [6] "Top25perc"   "F.Undergrad" "P.Undergrad" "Outstate"    "Room.Board" 
## [11] "Books"       "Personal"    "PhD"         "Terminal"    "S.F.Ratio"  
## [16] "perc.alumni" "Expend"      "Grad.Rate"
grid = 10^seq(10, -2, length=100)
x = model.matrix(Apps~., College)[,-1]
y = College$Apps
train = sample(1:nrow(x), nrow(x)/2)
test = (-train)
y.test = y[test]
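
#Quick sanity check (sketch): College has 777 rows; model.matrix drops the
#intercept column, leaving 17 predictor columns.
dim(x)
length(train)  #half the rows go to training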

#B: linear model using least squares ----------------------->
lm.fit = lm(y~x, subset=train)  #avoid naming the fit 'lm' (masks the lm() function)
summary(lm.fit)
## 
## Call:
## lm(formula = y ~ x, subset = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5276.1  -473.2   -63.9   351.9  6574.0 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    78.15204  600.84427   0.130 0.896581    
## xPrivateYes  -757.22843  205.47577  -3.685 0.000263 ***
## xAccept         1.67981    0.05196  32.329  < 2e-16 ***
## xEnroll        -0.62380    0.27629  -2.258 0.024544 *  
## xTop10perc     67.45654    8.45231   7.981 1.84e-14 ***
## xTop25perc    -22.37500    6.57093  -3.405 0.000734 ***
## xF.Undergrad   -0.06126    0.05468  -1.120 0.263258    
## xP.Undergrad    0.04745    0.06248   0.760 0.448024    
## xOutstate      -0.09227    0.02889  -3.194 0.001524 ** 
## xRoom.Board     0.24513    0.07300   3.358 0.000867 ***
## xBooks          0.09086    0.36826   0.247 0.805254    
## xPersonal       0.05886    0.09260   0.636 0.525455    
## xPhD           -8.89027    7.20890  -1.233 0.218271    
## xTerminal      -1.71947    8.22589  -0.209 0.834539    
## xS.F.Ratio     -5.75201   21.32871  -0.270 0.787554    
## xperc.alumni   -1.46681    6.28702  -0.233 0.815652    
## xExpend         0.03487    0.01928   1.808 0.071361 .  
## xGrad.Rate      7.57567    4.69602   1.613 0.107551    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1087 on 370 degrees of freedom
## Multiple R-squared:  0.9397, Adjusted R-squared:  0.9369 
## F-statistic: 339.3 on 17 and 370 DF,  p-value: < 2.2e-16
#Least squares regression is just Ridge regression with lambda=0
ridge.mod=glmnet(x[train,], y[train], alpha=0, lambda=grid, thresh=1e-12)
lm.pred = predict(ridge.mod, s=0, newx=x[test,], exact=T)
mean((lm.pred - y.test)^2)
## [1] 1108528
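#Sanity check (a sketch): computing the test MSE directly from the least
#squares fit should land very close to the s=0 ridge number above; small
#differences come from glmnet's convergence threshold.
#(Note: newer glmnet versions require re-supplying x= and y= when exact=T.)
lm.pred2 = cbind(1, x[test,]) %*% coef(lm.fit)
mean((lm.pred2 - y.test)^2)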
#C: Ridge Regression with CV for lambda --------------------->
set.seed(1)
cv.ridge = cv.glmnet(x[train, ], y[train], alpha=0)  #don't overwrite ridge.mod from B
plot(cv.ridge, main="CV Ridge")

bestlam = cv.ridge$lambda.min
bestlam
## [1] 450.7435
ridge.pred = predict(cv.ridge, s=bestlam, newx=x[test,])
mean((ridge.pred - y.test) ^2)
## [1] 1037616
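#Side note (sketch): cv.glmnet also stores lambda.1se, the largest lambda
#whose CV error is within one standard error of the minimum -- a more
#conservative (more regularized) choice than lambda.min.
cv.ridge$lambda.1se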
#Refit on full data and check coefs
ridge.mod2 = glmnet(x, y, alpha=0)
predict(ridge.mod2, type="coefficients", s=bestlam)[1:18,]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -1.575123e+03 -5.312141e+02  9.443508e-01  5.084882e-01  2.395085e+01 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
##  1.676068e+00  8.195201e-02  2.519290e-02 -1.804998e-02  2.008313e-01 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##  1.443091e-01 -9.669190e-03 -3.428226e+00 -4.551334e+00  1.260404e+01 
##   perc.alumni        Expend     Grad.Rate 
## -9.173762e+00  7.444322e-02  1.149378e+01
#no zero coefs, as expected for a ridge model

#D: Lasso with CV for lambda -------------------------------->
lasso.mod = glmnet(x[train,], y[train], alpha=1,
                   lambda=grid)
plot(lasso.mod) #some coefs will be zero depending on lambda

set.seed(1)
cv.lasso = cv.glmnet(x[train,], y[train], alpha=1)
plot(cv.lasso, main= "CV Lasso")

#tuning param lambda
bestlam=cv.lasso$lambda.min
lasso.pred=predict(lasso.mod, s=bestlam, newx=x[test,])
mean((lasso.pred - y.test)^2)
## [1] 1032128
out=glmnet(x, y, alpha=1, lambda=grid)
lasso.coef=predict(out, type="coefficients", s=bestlam)[1:18,]
lasso.coef
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -6.321166e+02 -4.088980e+02  1.437087e+00 -1.418240e-01  3.146071e+01 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
## -8.818529e-01  0.000000e+00  1.488050e-02 -5.348474e-02  1.206366e-01 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##  0.000000e+00  6.054932e-05 -5.127428e+00 -3.370371e+00  2.739664e+00 
##   perc.alumni        Expend     Grad.Rate 
## -1.038499e+00  6.839807e-02  4.706478e+00
#Both F.Undergrad and Books are now exactly 0; see the check below.
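#Programmatic check (sketch) rather than eyeballing the printout:
sum(lasso.coef == 0)                 #number of coefficients set exactly to 0
names(lasso.coef)[lasso.coef == 0]   #which ones
#For comparison, the test MSEs above were ~1108528 (least squares),
#~1037616 (ridge), and ~1032128 (lasso), so both penalized fits edge out
#least squares on this split, with the lasso also zeroing out predictors.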