Antonio Montalvo, Chris Huszar, Cyrus Safaie
March 9, 2016
Variable selection is intended to select the “best” subset of predictors.
Wide data introduce a level of complexity that most variable-selection algorithms cannot handle, due to computational cost or space/dimension limitations.
Process to Achieve Optimum Model Complexity
The R package glmnet provides extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, and the Cox model.
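For reference (this formula comes from the glmnet documentation, not the original post), glmnet minimizes a penalized negative log-likelihood of the form

$$\min_{\beta_0,\beta}\; -\tfrac{1}{N}\,\ell(y;\beta_0,\beta) \;+\; \lambda\Big[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\Big],$$

so α = 1 gives the lasso penalty, α = 0 gives the ridge penalty, and intermediate values give the elastic net.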
Let’s first simulate the data: four covariates and a count outcome generated from a Poisson distribution.
set.seed(60134)
N <- 100 # number of exposures
n <- 10 # size of each exposure
n <- matrix(rep(n,N),ncol=1) # size of each exposure (vector)
k <- 4 # number of covariates
b <- c(0.5,0.10,0,-0.2) # value of covariate parameter vector
# simulated covariates
X <- matrix(c(rep(1,N),rnorm(N,0,1),rnorm(N,0,1),rnorm(N,0,1))
,nrow=N,ncol=k)
colnames(X)<-c("True_X1","True_X2","True_X3","True_X4")
theta <- exp(X %*% b)
mu <- n * theta
y <- matrix(-99,nrow=N)
for (i in 1:N){y[i] <- rpois(1,mu[i])}
Now we add 10 random noise covariates.
k2=10
XX <- matrix(c(rnorm(N,0,1),rnorm(N,0,1),rnorm(N,0,1),rnorm(N,0,1),
rnorm(N,0,1),rnorm(N,0,1),rnorm(N,0,1),rnorm(N,0,1),
rnorm(N,0,1),rnorm(N,0,1)),nrow=N,ncol=k2)
colnames(XX)<-c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10")
We also add three features that are correlated and multicollinear with the true covariates.
x_cor2<-X[,2]*1.5
x_cor3<-X[,3]/2
x_cor_2and4<-X[,2]*X[,4]
x_cor<-cbind(x_cor2,x_cor3,x_cor_2and4)
XXX<-cbind(X,XX,x_cor)
colnames(XXX)=c("True_X1","True_X2","True_X3","True_X4","X1","X2","X3","X4","X5","X6","X7","X8","X9","X10","x_cor2","x_cor3","x_cor_2and4")
Here is the final simulated data set:
yX <- data.frame(n,y,XXX)
names(yX)
## [1] "n" "y" "True_X1" "True_X2" "True_X3"
## [6] "True_X4" "X1" "X2" "X3" "X4"
## [11] "X5" "X6" "X7" "X8" "X9"
## [16] "X10" "x_cor2" "x_cor3" "x_cor_2and4"
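As a quick sanity check (ours, not part of the original output): the engineered features are linear functions of the true covariates by construction (e.g. x_cor2 = 1.5 * True_X2), so their pairwise correlations are exactly 1.

# Check the built-in collinearity of the engineered features
# (x_cor_2and4 is an interaction of True_X2 and True_X4)
cor(XXX[, "True_X2"], XXX[, "x_cor2"])
cor(XXX[, "True_X3"], XXX[, "x_cor3"])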
Setting alpha = 1 in glmnet gives the lasso. Lasso coefficients can shrink exactly to zero, so the lasso can be used for variable selection.
library(glmnet)  # load glmnet for the fits below
model.lasso<-glmnet(XXX, y, family="poisson", offset=log(n), alpha=1, nlambda = 100)
plot(model.lasso,label=T)
Setting alpha = 0 in glmnet gives ridge regression. As expected, the ridge coefficients are shrunk but remain non-zero.
model.ridge<-glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=0, nlambda = 100)
plot(model.ridge,label=T)
glmnet fits 100 models along a decreasing sequence of lambda values. Cross-validation can be used to find the optimum lambda.
set.seed(60054)
cv.model.lasso<-cv.glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=1, nfolds=7)
plot(cv.model.lasso)
The fitted cv.glmnet object stores the cross-validation results:
attributes(cv.model.lasso)
## $names
## [1] "lambda" "cvm" "cvsd" "cvup" "cvlo"
## [6] "nzero" "name" "glmnet.fit" "lambda.min" "lambda.1se"
##
## $class
## [1] "cv.glmnet"
opt.lam = c(cv.model.lasso$lambda.min, cv.model.lasso$lambda.1se)
opt.lam
## [1] 0.2934462 1.1846473
# b <- c(0.5,0.10,0,-0.2) = True coefficients
coef(cv.model.lasso, s=opt.lam)
## 17 x 2 sparse Matrix of class "dgCMatrix"
## 1 2
## (Intercept) 0.5375460324 5.474378e-01
## True_X2 0.0637800000 5.026494e-03
## True_X3 . .
## True_X4 -0.2037099191 -1.612753e-01
## X1 . .
## X2 -0.0187948758 .
## X3 -0.0242132135 .
## X4 . .
## X5 0.0072053663 .
## X6 . .
## X7 . .
## X8 -0.0098765465 .
## X9 0.0033926210 .
## X10 -0.0304074434 .
## x_cor2 0.0006425355 1.572106e-07
## x_cor3 . .
## x_cor_2and4 -0.0126961053 .
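For comparison, a standard (unpenalized) Poisson GLM is refit using only the variables the lasso selected. The call below is reconstructed from the printed Call in the output that follows.

# Refit a plain Poisson GLM on the lasso-selected variables
# (glm drops x_cor2 with an NA coefficient because it is perfectly
#  collinear with True_X2)
glm(y ~ offset(log(n)) + True_X2 + True_X4 + x_cor2,
    family = poisson(link = "log"), data = yX)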
##
## Call: glm(formula = y ~ offset(log(n)) + True_X2 + True_X4 + x_cor2,
## family = poisson(link = "log"), data = yX)
##
## Coefficients:
## (Intercept) True_X2 True_X4 x_cor2
## 0.53091 0.08018 -0.22706 NA
##
## Degrees of Freedom: 99 Total (i.e. Null); 97 Residual
## Null Deviance: 209.7
## Residual Deviance: 110.2 AIC: 579
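Not shown in the original post, but as a usage sketch (the object name fit.lasso is ours): fitted mean counts from the cross-validated lasso at lambda.min can be obtained with predict(); because the model was fit with an offset, newoffset must be supplied.

# Fitted mean counts from the cross-validated lasso at lambda.min
fit.lasso <- predict(cv.model.lasso, newx = XXX[,-1], newoffset = log(n),
                     s = "lambda.min", type = "response")
head(data.frame(observed = as.numeric(y),
                fitted   = round(as.numeric(fit.lasso), 2)))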
We repeat the cross-validation for ridge regression (alpha = 0) to find its optimum lambda.
set.seed(60054)
cv.model.ridge<-cv.glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=0, nfolds=7)
opt.lam1 = c(cv.model.ridge$lambda.min, cv.model.ridge$lambda.1se)
plot(cv.model.ridge)
coef(cv.model.ridge, s=opt.lam1)
## 17 x 2 sparse Matrix of class "dgCMatrix"
## 1 2
## (Intercept) 0.5383173632 0.5438883866
## True_X2 0.0405253270 0.0287478116
## True_X3 -0.0019828581 -0.0007945813
## True_X4 -0.1802579873 -0.1071036523
## X1 -0.0059800394 -0.0037653027
## X2 -0.0327161959 -0.0250856432
## X3 -0.0337986020 -0.0198842119
## X4 -0.0076619084 -0.0046624991
## X5 0.0155441789 0.0032589406
## X6 -0.0026323367 0.0035591116
## X7 0.0009740535 0.0036545662
## X8 -0.0257743786 -0.0078117871
## X9 0.0166185167 0.0090307820
## X10 -0.0426544236 -0.0261060553
## x_cor2 0.0269529154 0.0191656430
## x_cor3 -0.0039940405 -0.0015890217
## x_cor_2and4 -0.0338305946 -0.0307988999
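As a small illustrative check (ours, not in the original), counting the non-zero coefficients at lambda.min makes the contrast concrete: the lasso sets some coefficients exactly to zero, while ridge keeps all of them non-zero.

# Number of non-zero coefficients at lambda.min for each penalty
sapply(list(lasso = cv.model.lasso, ridge = cv.model.ridge),
       function(m) m$nzero[m$lambda == m$lambda.min])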
α could also be tuned by k-fold cross-validation. The R package caret can build models with glmnet and can be set up to tune over both parameters, α and λ. Below we first try a middle value, α = 0.5; a small sketch of tuning α directly with cv.glmnet appears at the end of this post.
set.seed(60054)
cv.model.alpha05<-cv.glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=.5, nfolds=7)
opt.lam2 = c(cv.model.alpha05$lambda.min, cv.model.alpha05$lambda.1se)
opt.lam2
## [1] 0.5347545 2.1588130
coef(cv.model.alpha05, s=opt.lam2)
## 17 x 2 sparse Matrix of class "dgCMatrix"
## 1 2
## (Intercept) 0.537656862 0.546786444
## True_X2 0.035583782 0.006182489
## True_X3 . .
## True_X4 -0.201515129 -0.157592867
## X1 . .
## X2 -0.020212992 .
## X3 -0.025215212 .
## X4 . .
## X5 0.008129189 .
## X6 . .
## X7 . .
## X8 -0.011556780 .
## X9 0.004730270 .
## X10 -0.031839364 .
## x_cor2 0.020623849 0.003518974
## x_cor3 . .
## x_cor_2and4 -0.015055044 .
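As promised above, here is a minimal sketch (ours, not from the original analysis) of tuning α without caret: loop cv.glmnet over a grid of α values, holding the fold assignment fixed via foldid so every α is evaluated on the same folds. The printed values will depend on the seed.

# Tune alpha by looping cv.glmnet over a grid, with the folds held fixed
set.seed(60054)
foldid <- sample(rep(1:7, length.out = N))   # same 7 folds for every alpha
alphas <- seq(0, 1, by = 0.25)
cv.list <- lapply(alphas, function(a)
  cv.glmnet(XXX[,-1], y, family = "poisson", offset = log(n),
            alpha = a, foldid = foldid))
cv.err <- sapply(cv.list, function(m) min(m$cvm))  # best CV deviance per alpha
alphas[which.min(cv.err)]                   # alpha with the lowest CV error
cv.list[[which.min(cv.err)]]$lambda.min     # its corresponding lambda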