The ridge regression estimate chooses the \(\beta\) that minimizes:
\[ (y-X\beta)^T(y-X\beta) + \lambda \sum_j \beta_j^2 \] or minimizes \[ (y-X\beta)^T(y-X\beta) ~~\text{subject to}~~ \sum_{j=1}^p \beta_j^2 \leq t\]
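Unlike the lasso below, the ridge criterion has a closed-form minimizer: for a fixed \(\lambda\),
\[ \hat{\beta}^{\text{ridge}} = (X^TX + \lambda I)^{-1} X^T y , \]
which shrinks the coefficients toward zero but does not set any of them exactly to zero. (In practice the predictors are centered and scaled before the penalty is applied, which functions such as `lm.ridge` used below handle internally.)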
The lasso regression estimate chooses the \(\beta\) that minimizes:
\[ (y-X\beta)^T(y-X\beta) + \lambda \sum_j |\beta_j| \] or minimizes \[ (y-X\beta)^T(y-X\beta) ~~\text{subject to}~~ \sum_{j=1}^p |\beta_j| \leq t\]
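There is no closed-form solution with the \(\ell_1\) penalty, but it does produce exact zeros. A standard worked case, an orthonormal design with \(X^TX = I\), shows why: each lasso coefficient is a soft-thresholded version of the least squares estimate,
\[ \hat{\beta}_j^{\text{lasso}} = \operatorname{sign}(\hat{\beta}_j^{\text{OLS}})\,\bigl(|\hat{\beta}_j^{\text{OLS}}| - \lambda/2\bigr)_+ , \]
so any coefficient whose least squares value falls below the threshold is set exactly to zero. This is what makes the lasso a variable selection method as well as a shrinkage method.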
#### Ridge Regression

Predictive accuracy is judged by the root mean squared error:
\[ \mathrm{rmse} = \sqrt{\frac{1}{N}\sum_{i=1}^N (y_i-\hat{y}_i)^2} \]
data(meatspec, package="faraway")  # 215 samples: 100 absorbance channels (V1-V100) plus fat content
trainmeat <- meatspec[1:172,]      # training set
testmeat <- meatspec[173:215,]     # test set
library(MASS)
rgmod <- lm.ridge(fat ~ ., trainmeat, lambda = seq(0, 5e-8, len=21))  # ridge fits over a grid of lambda values
which.min(rgmod$GCV)
## 1.75e-08 
##        8
Generalized cross-validation selects the eighth value in the grid, \(\lambda = 1.75\times 10^{-8}\) (the name of the returned element is the \(\lambda\) value; the value is its position in the grid).
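Although not shown in the original output, plotting GCV against \(\lambda\) is a quick check that the minimum lies inside the grid rather than at an endpoint (a sketch using the `rgmod` object above):
plot(rgmod$lambda, rgmod$GCV, type = "l",
     xlab = expression(lambda), ylab = "GCV")            # GCV criterion across the lambda grid
abline(v = rgmod$lambda[which.min(rgmod$GCV)], lty = 2)  # mark the selected lambda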
rmse <- function(x,y) sqrt(mean((x-y)^2))
ypred <- cbind(1,as.matrix(trainmeat[,-101])) %*% coef(rgmod)[8,]  # training fits at the selected (8th) lambda
rmse(ypred, trainmeat$fat)
## [1] 0.8024395
ypred <- cbind(1,as.matrix(testmeat[,-101])) %*% coef(rgmod)[8,]  # test-set predictions at the same lambda
rmse(ypred, testmeat$fat)
## [1] 4.101066
cbind(ypred, testmeat$fat)  # predicted (left) and observed (right) fat for the test cases
## [,1] [,2]
## 173 47.914442 47.7
## 174 23.010584 25.5
## 175 8.827423 10.6
## 176 3.330303 2.0
## 177 5.997786 4.7
## 178 6.618782 6.8
## 179 9.984128 8.6
## 180 11.645575 11.2
## 181 13.412583 13.8
## 182 14.794074 17.7
## 183 28.337981 23.3
## 184 24.632051 29.0
## 185 11.164467 34.8
## 186 46.033985 47.8
## 187 9.811605 11.1
## 188 4.648616 2.9
## 189 9.025680 4.8
## 190 5.768760 5.6
## 191 6.324280 6.2
## 192 6.658360 6.4
## 193 5.864412 6.8
## 194 6.683017 7.1
## 195 7.690662 7.3
## 196 7.876105 7.9
## 197 10.194099 9.2
## 198 9.237660 10.1
## 199 11.767096 10.7
## 200 7.488081 11.2
## 201 11.369493 12.5
## 202 14.492691 14.3
## 203 10.977360 16.0
## 204 16.550612 17.0
## 205 16.559421 18.1
## 206 20.322161 19.4
## 207 23.231651 24.8
## 208 26.838312 27.2
## 209 29.234147 28.4
## 210 29.084097 29.2
## 211 28.904400 31.3
## 212 31.557391 33.8
## 213 37.568211 35.5
## 214 41.300225 42.5
## 215 46.033985 47.8
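The predictions track the observed values well apart from one case. A programmatic way to locate the worst prediction (a quick check on the objects above):
which.max(abs(ypred - testmeat$fat))  # returns 13, i.e. row 185, given the predictions above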
c(ypred[13],testmeat$fat[13])
## [1] 11.16447 34.80000
rmse(ypred[-13], testmeat$fat[-13])
## [1] 1.979462
Case 185 (the 13th test observation) alone accounts for most of the test error: dropping it reduces the test RMSE from 4.10 to 1.98. One poor prediction can ruin your reputation.

#### Lasso Regression
#install.packages("lars")
library(lars)
## Warning: package 'lars' was built under R version 4.3.1
## Loaded lars 1.3
data(state)
statedata <- data.frame(state.x77,row.names=state.abb)
head(statedata)
## Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
## AL 3615 3624 2.1 69.05 15.1 41.3 20 50708
## AK 365 6315 1.5 69.31 11.3 66.7 152 566432
## AZ 2212 4530 1.8 70.55 7.8 58.1 15 113417
## AR 2110 3378 1.9 70.66 10.1 39.9 65 51945
## CA 21198 5114 1.1 71.71 10.3 62.6 20 156361
## CO 2541 4884 0.7 72.06 6.8 63.9 166 103766
lmod <- lars(as.matrix(statedata[,-4]), statedata$Life.Exp)
plot(lmod)
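The plot (not reproduced here) shows how each coefficient enters and grows as the \(\ell_1\) bound is relaxed. Although not part of the original output, `summary(lmod)` would list the corresponding sequence of steps with their RSS and Mallows' \(C_p\):
summary(lmod)  # one row per lasso step: Df, RSS, Cp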
The choice of \(t\) can be made using cross-validation.
set.seed(123)
cvlmod <- cv.lars(as.matrix(statedata[,-4]), statedata$Life.Exp)
The horizontal axis is \(t\) rescaled by its maximum possible value, \(\sum_{j=1}^p |\hat{\beta}_j|\) from the least squares solution, so it runs from 0 to 1.
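In other words, `cv.lars` (and `predict.lars` with `mode="fraction"`) indexes the solutions by the fraction
\[ s = \frac{\sum_{j=1}^p |\hat{\beta}_j|}{\sum_{j=1}^p |\hat{\beta}_j^{\text{OLS}}|} \in [0,1], \]
so \(s = 0\) is the intercept-only fit and \(s = 1\) is the full least squares fit.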
cvlmod$index[which.min(cvlmod$cv)]
## [1] 0.6464646
predict(lmod,s=0.6464646,type="coef",mode="fraction")$coef
## Population Income Illiteracy Murder HS.Grad
## 2.259631e-05 0.000000e+00 0.000000e+00 -2.379447e-01 3.492634e-02
## Frost Area
## -1.558616e-03 0.000000e+00
coef(lm(Life.Exp ~ Population+Murder+HS.Grad+Frost, statedata))
## (Intercept) Population Murder HS.Grad Frost
## 7.102713e+01 5.013998e-05 -3.001488e-01 4.658225e-02 -5.943290e-03
The lasso selects the same variables (Population, Murder, HS.Grad and Frost) as the AIC-based selection criteria, but the coefficients are shrunk somewhat relative to the corresponding least squares solution.

The lasso can also be applied to the meat spectroscopy data:
trainy <- trainmeat$fat
trainx <- as.matrix(trainmeat[,-101])
lassomod <- lars(trainx, trainy)
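A natural next step, sketched here rather than taken from the original, is to pick the fraction by cross-validation and compute the test RMSE for comparison with the ridge fit; `testmeat`, `rmse` and `lassomod` are the objects defined above, while `cvout`, `sfrac`, `testx` and `lpred` are names introduced for illustration:
set.seed(123)
cvout <- cv.lars(trainx, trainy)                        # 10-fold CV over the fraction grid
sfrac <- cvout$index[which.min(cvout$cv)]               # fraction minimizing the CV error
testx <- as.matrix(testmeat[,-101])
lpred <- predict(lassomod, testx, s = sfrac, mode = "fraction")$fit
rmse(lpred, testmeat$fat)                               # lasso test RMSE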