We will use the model.matrix function to create the argument x for glmnet later. One reason to use model.matrix (instead of, say, as.matrix) is that it automatically expands factors into a set of dummy variables. Run ?model.matrix to learn more about this function.
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary
y.test <- y[test]
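Hitters contains several factor variables (League, Division, and NewLeague in the standard ISLR data). As a quick illustrative check (not part of the original code), we can confirm that they have been expanded into dummy columns:

dim(x)      # 19 predictor columns once the factors are expanded
colnames(x) # the factors appear as dummy variables such as LeagueN, DivisionW, and NewLeagueN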
We will use the glmnet package to perform ridge regression and the lasso. Recall that these two methods use an \(\ell^2\) penalty and an \(\ell^1\) penalty, respectively, on the coefficient estimates.
library(glmnet)
We will use the glmnet function, which can fit both ridge regression and lasso models.
The function glmnet has slightly different syntax from the lm/glm functions that we have encountered:
- We must pass in an x matrix as well as a y vector.
- We do not use the y ~ x syntax.
The syntax of the glmnet function is as follows:
glmnet(x, y,
  alpha = alpha,   # alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty
  lambda = lambda)
Setting alpha = 1 gives the lasso penalty, and alpha = 0 gives the ridge penalty.
For different values of \(\lambda\), the estimated coefficients differ. Specifically, as \(\lambda\) increases, the coefficient estimates should become smaller (in terms of their \(\ell^2\) norm). Below we check whether that is true for \(\lambda = 1\) and \(\lambda = 10000\):
coef1 <- coef(glmnet(x, y, alpha = 0, lambda = 1))
coef2 <- coef(glmnet(x, y, alpha = 0, lambda = 10000))
c(sum(coef1^2), sum(coef2^2))
[1] 45021.75 154289.97
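Interestingly, the sum is larger for \(\lambda = 10000\). Keep in mind that coef() returns the intercept as well as the slopes, and glmnet does not penalize the intercept: as the slope estimates shrink toward zero, the intercept moves toward the mean of y, so the total sum of squares can grow even though the penalized coefficients themselves shrink. Comparing sum(coef1[-1]^2) with sum(coef2[-1]^2) (dropping the intercept) would isolate the shrinkage.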
We use cross-validation to determine the optimal \(\lambda\). We can do this using the cross-validation function cv.glmnet. By default, the function performs ten-fold cross-validation.
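The cv.out object used below does not appear in the code above; a minimal sketch of how it would typically be created is shown here (the seed is an assumed choice, not taken from the original):

set.seed(1)                          # assumed seed for reproducibility
cv.out <- cv.glmnet(x, y, alpha = 1) # ten-fold cross-validation for the lasso by default
plot(cv.out)                         # CV error as a function of log(lambda)
cv.out$lambda.min                    # the lambda with the smallest CV error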
lasso.mod <- glmnet(x, y, alpha = 1, lambda = cv.out$lambda.min)
lasso.pred <- predict(lasso.mod, s = cv.out$lambda.min, newx = x[test, ])
mse_lasso <- mean((lasso.pred - y.test)^2)
c(mse_lm, mse_ridge, mse_lasso)
[1] 168593.3 119150.1 112193.9
The prediction accuracy of the lasso is similar to that of ridge regression. However, the lasso has a substantial advantage over ridge regression in that the resulting coefficient estimates are sparse.
Here we see that 10 of the 19 coefficient estimates are exactly zero. So the lasso model with \(\lambda\) chosen by cross-validation contains only nine variables:
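The coefficient printout this refers to is not included above; it would typically be produced along the following lines (a sketch assuming the lasso.mod and cv.out objects defined earlier):

lasso.coef <- predict(lasso.mod, type = "coefficients", s = cv.out$lambda.min)[1:20, ]
lasso.coef                   # full set of estimates, with many exact zeros
lasso.coef[lasso.coef != 0]  # the intercept plus the nine variables retained by the lasso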