Submission info:
Homework #3
Ben Peloquin
SUNetID: bpeloqui
Exercises submitted:
Conceptual:
Regularization section 6.8 #2 (a, b)
Applied:
Regularization section 6.8 #9 (a-d)
CONCEPTUAL
2a Lasso vs Least Squares:
i. F
ii. F
iii. T
iv. F
The lasso gives the least squares coefficient estimates when lambda is 0. Qualitatively, the lasso behaves similarly to ridge regression in that its advantage comes from improvements in variance with minimal (at first) cost to bias. As lambda increases the lasso becomes less flexible, decreasing variance and increasing bias. Since the qualitative behavior of the lasso and ridge regression is similar, I say more about this in 2b. One nice advantage of the lasso over ridge regression is that it produces more interpretable models, because it involves only a subset of the predictors, whereas ridge regression always includes the full set of predictors. Taking this a step further, since the lasso effectively assumes that a number of the coefficients are 0 (feature selection), ridge regression will outperform the lasso when all the predictors are in fact related to the response, while the lasso will outperform ridge regression if only a small subset of the predictors are actually related to the response. Since we may not know this a priori, it can be worthwhile to run both.
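To make the selection contrast concrete, here is a minimal sketch on simulated data (not part of the assignment; the names, seed, and sizes are arbitrary): only the first three of twenty predictors are truly related to the response, and the lasso sets most of the irrelevant coefficients exactly to zero while ridge keeps every coefficient non-zero.
library(glmnet)
set.seed(1)
n = 100; p = 20
X = matrix(rnorm(n * p), n, p)
beta = c(3, -2, 1.5, rep(0, p - 3))              # only 3 predictors are related to y
y = drop(X %*% beta + rnorm(n))
lasso.cv = cv.glmnet(X, y, alpha = 1)            # alpha = 1 -> lasso penalty
ridge.cv = cv.glmnet(X, y, alpha = 0)            # alpha = 0 -> ridge penalty
sum(coef(lasso.cv, s = "lambda.min")[-1, ] == 0) # many exact zeros (feature selection)
sum(coef(ridge.cv, s = "lambda.min")[-1, ] == 0) # expect 0: ridge keeps all predictors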
2b Ridge Regression vs Least Squares:
i. F
ii. F
iii. T
iv. F
A lambda of 0 corresponds to the least squares coefficient estimates. As lambda increases, the flexibility of the ridge regression fit decreases, which corresponds to decreased variance and increased bias. The key advantage of ridge regression is that the shrinkage of the ridge coefficient estimates can produce a large reduction in variance at the cost of only a slight increase in bias. This is especially helpful when we are fitting a linear model in which n and p are fairly close, or when p > n; ridge regression addresses the high variability that would result from the least squares fit in those settings.
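As a quick illustration of this shrinkage behavior, here is another small simulated sketch (again not part of the assignment): the ridge coefficient paths shrink smoothly toward zero as lambda increases, but no coefficient is ever set exactly to zero.
library(glmnet)
set.seed(2)
X = matrix(rnorm(50 * 10), 50, 10)
y = drop(X %*% rnorm(10) + rnorm(50))
ridge.path = glmnet(X, y, alpha = 0)             # ridge fit over a grid of lambdas
plot(ridge.path, xvar = "lambda", label = TRUE)  # all paths shrink toward 0 as log(lambda) grows
sum(coef(ridge.path, s = 1)[-1, ] == 0)          # expect 0: shrunken, but never exactly zero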
APPLIED
library(ISLR)
library(glmnet)
## Warning: package 'glmnet' was built under R version 3.1.3
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-2
#Note: Lab 6 of ISLR was very helpful for this section
#A: Training set and test set ---------->
set.seed(1)
names(College)
## [1] "Private" "Apps" "Accept" "Enroll" "Top10perc"
## [6] "Top25perc" "F.Undergrad" "P.Undergrad" "Outstate" "Room.Board"
## [11] "Books" "Personal" "PhD" "Terminal" "S.F.Ratio"
## [16] "perc.alumni" "Expend" "Grad.Rate"
grid = 10^seq(10, -2, length=100)
x = model.matrix(Apps~., College)[,-1]
y = College$Apps
train = sample(1:nrow(x), nrow(x)/2)
test = (-train)
y.test = y[test]
#B: linear model using least squares ----------------------->
lm = lm(y~x, subset=train)
summary(lm)
##
## Call:
## lm(formula = y ~ x, subset = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5276.1 -473.2 -63.9 351.9 6574.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 78.15204 600.84427 0.130 0.896581
## xPrivateYes -757.22843 205.47577 -3.685 0.000263 ***
## xAccept 1.67981 0.05196 32.329 < 2e-16 ***
## xEnroll -0.62380 0.27629 -2.258 0.024544 *
## xTop10perc 67.45654 8.45231 7.981 1.84e-14 ***
## xTop25perc -22.37500 6.57093 -3.405 0.000734 ***
## xF.Undergrad -0.06126 0.05468 -1.120 0.263258
## xP.Undergrad 0.04745 0.06248 0.760 0.448024
## xOutstate -0.09227 0.02889 -3.194 0.001524 **
## xRoom.Board 0.24513 0.07300 3.358 0.000867 ***
## xBooks 0.09086 0.36826 0.247 0.805254
## xPersonal 0.05886 0.09260 0.636 0.525455
## xPhD -8.89027 7.20890 -1.233 0.218271
## xTerminal -1.71947 8.22589 -0.209 0.834539
## xS.F.Ratio -5.75201 21.32871 -0.270 0.787554
## xperc.alumni -1.46681 6.28702 -0.233 0.815652
## xExpend 0.03487 0.01928 1.808 0.071361 .
## xGrad.Rate 7.57567 4.69602 1.613 0.107551
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1087 on 370 degrees of freedom
## Multiple R-squared: 0.9397, Adjusted R-squared: 0.9369
## F-statistic: 339.3 on 17 and 370 DF, p-value: < 2.2e-16
#Least squares regression is just Ridge regression with lambda=0
ridge.mod=glmnet(x[train,], y[train], alpha=0, lambda=grid, thresh=1e-12)
lm.pred = predict(ridge.mod, s=0, newx=x[test,], exact=T)
mean((lm.pred - y.test)^2)
## [1] 1108528
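# Sanity-check sketch (not run in the original output): fit least squares
# directly on the College data frame with the same train/test split; the test
# MSE should agree with the s=0 ridge trick above up to small numerical error.
ls.fit = lm(Apps ~ ., data = College, subset = train)
ls.pred = predict(ls.fit, newdata = College[test, ])
mean((ls.pred - y.test)^2)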
#C: Ridge Regression with CV for lambda --------------------->
set.seed(1)
ridge.mod = cv.glmnet(x[train, ], y[train], alpha=0)
plot(ridge.mod, main="CV Ridge")
bestlam = ridge.mod$lambda.min
bestlam
## [1] 450.7435
ridge.pred = predict(ridge.mod, s=bestlam, newx=x[test,])
mean((ridge.pred - y.test) ^2)
## [1] 1037616
#Refit on full data and check coefs
ridge.mod2 = glmnet(x, y, alpha=0)
predict(ridge.mod2, type="coefficients", s=bestlam)[1:18,]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -1.575123e+03 -5.312141e+02 9.443508e-01 5.084882e-01 2.395085e+01
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## 1.676068e+00 8.195201e-02 2.519290e-02 -1.804998e-02 2.008313e-01
## Books Personal PhD Terminal S.F.Ratio
## 1.443091e-01 -9.669190e-03 -3.428226e+00 -4.551334e+00 1.260404e+01
## perc.alumni Expend Grad.Rate
## -9.173762e+00 7.444322e-02 1.149378e+01
#No coefficients are exactly zero, as expected for the ridge model
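# Quick verification sketch (not run in the original output): count the ridge
# coefficients that are exactly zero, excluding the intercept -- expect 0.
ridge.coef = predict(ridge.mod2, type = "coefficients", s = bestlam)[1:18, ]
sum(ridge.coef[-1] == 0)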
#D: Lasso with CV for lambda -------------------------------->
lasso.mod = glmnet(x[train,], y[train], alpha=1,
lambda=grid)
plot(lasso.mod) #some coefs will be zero depending on lambda
set.seed(1)
cv.lasso = cv.glmnet(x[train,], y[train], alpha=1)
plot(cv.lasso, main= "CV Lasso")
#tuning param lambda
bestlam=cv.lasso$lambda.min
lasso.pred=predict(lasso.mod, s=bestlam, newx=x[test,])
mean((lasso.pred - y.test)^2)
## [1] 1032128
out=glmnet(x, y, alpha=1, lambda=grid)
lasso.coef=predict(out, type="coefficients", s=bestlam)[1:18,]
lasso.coef
## (Intercept) PrivateYes Accept Enroll Top10perc
## -6.321166e+02 -4.088980e+02 1.437087e+00 -1.418240e-01 3.146071e+01
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## -8.818529e-01 0.000000e+00 1.488050e-02 -5.348474e-02 1.206366e-01
## Books Personal PhD Terminal S.F.Ratio
## 0.000000e+00 6.054932e-05 -5.127428e+00 -3.370371e+00 2.739664e+00
## perc.alumni Expend Grad.Rate
## -1.038499e+00 6.839807e-02 4.706478e+00
#Both F.Undergrad and Books are now exactly 0 (the lasso drops them from the model)
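# Wrap-up sketch (not run in the original output): number of non-zero lasso
# coefficient estimates at the CV-chosen lambda (excluding the intercept), and
# the three test MSEs side by side, using objects computed above.
sum(lasso.coef[-1] != 0)
c(least.squares = mean((lm.pred - y.test)^2),
  ridge = mean((ridge.pred - y.test)^2),
  lasso = mean((lasso.pred - y.test)^2))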