### 11.3 & 11.4 Ridge Regression & Lasso

The ridge regression estimate chooses the \(\beta\) that minimizes:

\[ (y-X\beta)^T(y-X\beta) + \lambda \sum_j \beta_j^2 \] or minimizes \[ (y-X\beta)^T(y-X\beta) ~~\text{subject to}~~ \sum_{j=1}^p \beta_j^2 \leq t\]
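As a point of reference (a minimal sketch of my own, not from the text), the first form has the closed-form solution \(\hat\beta = (X^TX + \lambda I)^{-1}X^Ty\), assuming the predictors and response have been centred so the intercept can be dropped:

ridge_beta <- function(X, y, lambda) {
  # (X'X + lambda I)^{-1} X'y, with X and y already centred (and usually scaled)
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}

Setting lambda = 0 recovers least squares; the lm.ridge() call used below does its own centring and scaling internally.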

The lasso regression estimate chooses the \(\beta\) that minimizes:

\[ (y-X\beta)^T(y-X\beta) + \lambda \sum_j |\beta_j| \] or minimizes \[ (y-X\beta)^T(y-X\beta) ~~\text{subject to}~~ \sum_{j=1}^p |\beta_j| \leq t\]

#### Ridge Regression

Performance is measured by the root mean squared error:

\[ \mathrm{rmse} = \sqrt{\frac{1}{N}\sum_i (y_i-\hat{y}_i)^2} \]

data(meatspec, package="faraway")
trainmeat <- meatspec[1:172,]
testmeat <- meatspec[173:215,]

library(MASS)
rgmod <- lm.ridge(fat ~ ., trainmeat, lambda = seq(0, 5e-8, len=21))
which.min(rgmod$GCV)
## 1.75e-08 
##        8
rmse <- function(x,y) sqrt(mean((x-y)^2))

ypred <- cbind(1,as.matrix(trainmeat[,-101])) %*% coef(rgmod)[8,]
rmse(ypred, trainmeat$fat)
## [1] 0.8024395
ypred <- cbind(1,as.matrix(testmeat[,-101])) %*% coef(rgmod)[8,]
rmse(ypred, testmeat$fat)
## [1] 4.101066
cbind(ypred, testmeat$fat)
##          [,1] [,2]
## 173 47.914442 47.7
## 174 23.010584 25.5
## 175  8.827423 10.6
## 176  3.330303  2.0
## 177  5.997786  4.7
## 178  6.618782  6.8
## 179  9.984128  8.6
## 180 11.645575 11.2
## 181 13.412583 13.8
## 182 14.794074 17.7
## 183 28.337981 23.3
## 184 24.632051 29.0
## 185 11.164467 34.8
## 186 46.033985 47.8
## 187  9.811605 11.1
## 188  4.648616  2.9
## 189  9.025680  4.8
## 190  5.768760  5.6
## 191  6.324280  6.2
## 192  6.658360  6.4
## 193  5.864412  6.8
## 194  6.683017  7.1
## 195  7.690662  7.3
## 196  7.876105  7.9
## 197 10.194099  9.2
## 198  9.237660 10.1
## 199 11.767096 10.7
## 200  7.488081 11.2
## 201 11.369493 12.5
## 202 14.492691 14.3
## 203 10.977360 16.0
## 204 16.550612 17.0
## 205 16.559421 18.1
## 206 20.322161 19.4
## 207 23.231651 24.8
## 208 26.838312 27.2
## 209 29.234147 28.4
## 210 29.084097 29.2
## 211 28.904400 31.3
## 212 31.557391 33.8
## 213 37.568211 35.5
## 214 41.300225 42.5
## 215 46.033985 47.8
c(ypred[13],testmeat$fat[13])
## [1] 11.16447 34.80000
rmse(ypred[-13], testmeat$fat[-13])
## [1] 1.979462

One poor prediction can ruin your reputation.
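As a quick check (my own convenience code, not shown in the text), the GCV-optimal row of coefficients can be selected programmatically rather than hard-coding 8, and the badly predicted case can be located directly:

best <- which.min(rgmod$GCV)                     # index of the GCV-minimizing lambda (8 here)
ypred <- cbind(1, as.matrix(testmeat[,-101])) %*% coef(rgmod)[best,]
which.max(abs(ypred - testmeat$fat))             # observation 13 of the test set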

#### Lasso

#install.packages("lars")
library(lars)
## Warning: package 'lars' was built under R version 4.3.1
## Loaded lars 1.3
data(state)
statedata <- data.frame(state.x77,row.names=state.abb)
head(statedata)
##    Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
## AL       3615   3624        2.1    69.05   15.1    41.3    20  50708
## AK        365   6315        1.5    69.31   11.3    66.7   152 566432
## AZ       2212   4530        1.8    70.55    7.8    58.1    15 113417
## AR       2110   3378        1.9    70.66   10.1    39.9    65  51945
## CA      21198   5114        1.1    71.71   10.3    62.6    20 156361
## CO       2541   4884        0.7    72.06    6.8    63.9   166 103766
lmod <- lars(as.matrix(statedata[,-4]), statedata$Life.Exp)
plot(lmod)

The choice of \(t\) can be made using cross-validation.

set.seed(123)
cvlmod <- cv.lars(as.matrix(statedata[,-4]), statedata$Life.Exp)

The horizontal axis shows \(t\) scaled by its maximum possible value, the \(\sum_{j=1}^p |\hat\beta_j|\) of the least squares solution, so it runs from 0 to 1.
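To make that concrete (a small check of my own, not from the text), the full least squares fit sits at the right-hand end of the axis (fraction = 1), and its slopes can be computed directly:

lsbeta <- coef(lm(Life.Exp ~ ., statedata))[-1]   # full least squares slopes (drop intercept)
sum(abs(lsbeta))                                  # sum of |beta_j| for the least squares solution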

cvlmod$index[which.min(cvlmod$cv)]
## [1] 0.6464646
predict(lmod,s=0.6464646,type="coef",mode="fraction")$coef
##    Population        Income    Illiteracy        Murder       HS.Grad 
##  2.259631e-05  0.000000e+00  0.000000e+00 -2.379447e-01  3.492634e-02 
##         Frost          Area 
## -1.558616e-03  0.000000e+00
coef(lm(Life.Exp ~ Population+Murder+HS.Grad+Frost, statedata))
##   (Intercept)    Population        Murder       HS.Grad         Frost 
##  7.102713e+01  5.013998e-05 -3.001488e-01  4.658225e-02 -5.943290e-03
For the state data, the lasso selects the same variables as the AIC-based selection, but the coefficients are shrunk somewhat from the corresponding least squares solution.

The lasso can also be applied to the meat spectroscopy data:

trainy <- trainmeat$fat
trainx <- as.matrix(trainmeat[,-101])
lassomod <- lars(trainx,trainy)
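A natural next step, paralleling the ridge analysis above (a sketch under my own choices, not output reproduced from the text), is to pick the fraction by cross-validation and measure the test RMSE:

set.seed(123)
cvout <- cv.lars(trainx, trainy)                  # 10-fold CV over a grid of fractions
frac <- cvout$index[which.min(cvout$cv)]          # fraction minimizing the CV error
testx <- as.matrix(testmeat[,-101])
predlasso <- predict(lassomod, testx, s=frac, mode="fraction")$fit
rmse(predlasso, testmeat$fat)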