Variable Selection in GLM

Antonio Montalvo, Chris Huszar, Cyrus Safaie

March 9, 2016

Why variable selection?

Variable selection is intended to select the “best” subset of predictors

Models for Wide Data

Wide data (many predictors relative to observations) introduce a level of complexity that most classical variable-selection algorithms cannot handle, either because of computational cost or because of dimensionality limits.

Algorithms and models for selection

Regularization/Shrinkage

Regularization

Process to achieve optimum model complexity

Ridge Regression

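For reference, ridge regression penalizes the residual sum of squares with the squared L2 norm of the coefficients:

$$\hat\beta^{\mathrm{ridge}} = \arg\min_{\beta}\;\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$

The penalty shrinks the coefficients toward zero but does not set any of them exactly to zero.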

Lasso Regression

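For reference, the lasso replaces the squared L2 penalty with an L1 penalty:

$$\hat\beta^{\mathrm{lasso}} = \arg\min_{\beta}\;\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\right)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert$$

The L1 penalty sets some coefficients exactly to zero, which is what makes the lasso useful for variable selection.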

Ridge vs. Lasso visualization

The limitations of Lasso

Glmnet Package

The glmnet package provides extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, and the Cox model.

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani, and Noah Simon

Elastic Net
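For reference, the elastic-net penalty as implemented in glmnet blends the ridge and lasso penalties through the mixing parameter α (α = 1 gives the lasso, α = 0 gives ridge):

$$\min_{\beta_0,\beta}\;\frac{1}{N}\sum_{i=1}^{N} l\!\left(y_i,\;\beta_0 + x_i^{T}\beta\right) \;+\; \lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\right]$$

where l is the log-likelihood contribution of observation i.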

Regularization Path Trace

Elastic Net Advantages:

Beyond Theory: Example

Let's first simulate the data: four covariates and a count response generated from a Poisson distribution.

library(glmnet)  # loads glmnet() and cv.glmnet() used throughout the example

set.seed(60134)
N <- 100 # number of exposures
n <- 10 # size of each exposure
n <- matrix(rep(n,N),ncol=1) # size of each exposure (vector)
k <- 4 # number of covariates
b <- c(0.5,0.10,0,-0.2) # value of covariate parameter vector
# simulated covariates
X <- matrix(c(rep(1,N),rnorm(N,0,1),rnorm(N,0,1),rnorm(N,0,1))
            ,nrow=N,ncol=k)
colnames(X)<-c("True_X1","True_X2","True_X3","True_X4")
theta <- exp(X %*% b)
mu <- n * theta
y <- matrix(-99,nrow=N)
for (i in 1:N){y[i] <- rpois(1,mu[i])}

Example (continued)

Now we add 10 random noise covariates.

k2 <- 10  # number of noise covariates
XX <- matrix(rnorm(N*k2,0,1),nrow=N,ncol=k2)

colnames(XX)<-c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10")

And three features that are correlated with, or exact functions of, the true covariates.

x_cor2<-X[,2]*1.5          # exact multiple of True_X2 (perfectly collinear)
x_cor3<-X[,3]/2            # exact multiple of True_X3 (perfectly collinear)
x_cor_2and4<-X[,2]*X[,4]   # interaction of True_X2 and True_X4
x_cor<-cbind(x_cor2,x_cor3,x_cor_2and4)
XXX<-cbind(X,XX,x_cor)
colnames(XXX)=c("True_X1","True_X2","True_X3","True_X4","X1","X2","X3","X4","X5","X6","X7","X8","X9","X10","x_cor2","x_cor3","x_cor_2and4")
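As a quick sanity check (not part of the original slides), the sample correlations confirm the built-in collinearity; x_cor2 and x_cor3 are exact multiples of True_X2 and True_X3:

round(cor(XXX[, c("True_X2", "x_cor2", "True_X3", "x_cor3")]), 2)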

Example (continued)

Here is the final simulated data set:

yX <- data.frame(n,y,XXX)
names(yX)
##  [1] "n"           "y"           "True_X1"     "True_X2"     "True_X3"    
##  [6] "True_X4"     "X1"          "X2"          "X3"          "X4"         
## [11] "X5"          "X6"          "X7"          "X8"          "X9"         
## [16] "X10"         "x_cor2"      "x_cor3"      "x_cor_2and4"

Example: Lasso

Setting alpha=1 in glmnet's elastic net gives the lasso. Lasso coefficients shrink exactly to zero as the penalty grows, so the lasso can be used for variable selection.

# drop the constant column True_X1: glmnet fits its own intercept
model.lasso<-glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=1, nlambda = 100)
plot(model.lasso,label=T)

Example: Ridge

Setting alpha=0 in glmnet's elastic net gives ridge regression. As expected, the coefficients are shrunk toward zero but remain non-zero, so ridge does not perform variable selection by itself.

model.ridge<-glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=0, nlambda = 100)
plot(model.ridge,label=T)

Example: Lasso with cross-validation

glmnet fits up to 100 models along a decreasing grid of lambda values. Cross-validation can be used to find the optimal lambda.

set.seed(60054)
cv.model.lasso<-cv.glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=1, nfolds=7) 
plot(cv.model.lasso)

Example: Lasso with cross-validation (continued)

The fitted cv.glmnet object stores the cross-validation results; lambda.min and lambda.1se are the two standard choices of the optimal lambda.

attributes(cv.model.lasso)
## $names
##  [1] "lambda"     "cvm"        "cvsd"       "cvup"       "cvlo"      
##  [6] "nzero"      "name"       "glmnet.fit" "lambda.min" "lambda.1se"
## 
## $class
## [1] "cv.glmnet"
opt.lam = c(cv.model.lasso$lambda.min, cv.model.lasso$lambda.1se)
opt.lam
## [1] 0.2934462 1.1846473

Example: Lasso with cross-validation (continued)

# b <- c(0.5,0.10,0,-0.2) = True coefficients
coef(cv.model.lasso, s=opt.lam)
## 17 x 2 sparse Matrix of class "dgCMatrix"
##                         1             2
## (Intercept)  0.5375460324  5.474378e-01
## True_X2      0.0637800000  5.026494e-03
## True_X3      .             .           
## True_X4     -0.2037099191 -1.612753e-01
## X1           .             .           
## X2          -0.0187948758  .           
## X3          -0.0242132135  .           
## X4           .             .           
## X5           0.0072053663  .           
## X6           .             .           
## X7           .             .           
## X8          -0.0098765465  .           
## X9           0.0033926210  .           
## X10         -0.0304074434  .           
## x_cor2       0.0006425355  1.572106e-07
## x_cor3       .             .           
## x_cor_2and4 -0.0126961053  .

glm on the selected features from lasso

Refitting an ordinary Poisson glm on the features the lasso selected (True_X2, True_X4 and x_cor2) recovers estimates close to the true values. Because x_cor2 is an exact multiple of True_X2, glm cannot estimate both coefficients and reports NA for x_cor2.
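The slide shows only the printed model; the call that produced it can be reconstructed from the Call: line in the output (the object name model.glm is illustrative):

model.glm <- glm(y ~ offset(log(n)) + True_X2 + True_X4 + x_cor2,
                 family = poisson(link = "log"), data = yX)
model.glm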

## 
## Call:  glm(formula = y ~ offset(log(n)) + True_X2 + True_X4 + x_cor2, 
##     family = poisson(link = "log"), data = yX)
## 
## Coefficients:
## (Intercept)      True_X2      True_X4       x_cor2  
##     0.53091      0.08018     -0.22706           NA  
## 
## Degrees of Freedom: 99 Total (i.e. Null);  97 Residual
## Null Deviance:       209.7 
## Residual Deviance: 110.2     AIC: 579

Example: Ridge with cross-validation

The same cross-validation procedure can be used to select the optimal lambda for the ridge fit (alpha=0).

set.seed(60054)
cv.model.ridge<-cv.glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=0, nfolds=7) 
opt.lam1 = c(cv.model.ridge$lambda.min, cv.model.ridge$lambda.1se)
plot(cv.model.ridge)

Example: Ridge with cross-validation (continued)

coef(cv.model.ridge, s=opt.lam1)
## 17 x 2 sparse Matrix of class "dgCMatrix"
##                         1             2
## (Intercept)  0.5383173632  0.5438883866
## True_X2      0.0405253270  0.0287478116
## True_X3     -0.0019828581 -0.0007945813
## True_X4     -0.1802579873 -0.1071036523
## X1          -0.0059800394 -0.0037653027
## X2          -0.0327161959 -0.0250856432
## X3          -0.0337986020 -0.0198842119
## X4          -0.0076619084 -0.0046624991
## X5           0.0155441789  0.0032589406
## X6          -0.0026323367  0.0035591116
## X7           0.0009740535  0.0036545662
## X8          -0.0257743786 -0.0078117871
## X9           0.0166185167  0.0090307820
## X10         -0.0426544236 -0.0261060553
## x_cor2       0.0269529154  0.0191656430
## x_cor3      -0.0039940405 -0.0015890217
## x_cor_2and4 -0.0338305946 -0.0307988999

Example: Elastic-net with cross-validation

α can also be tuned by k-fold cross-validation; cv.glmnet only cross-validates λ for a fixed α. The R package caret can wrap glmnet and tune over both α and λ, or one can simply loop over a grid of α values with cv.glmnet, as sketched after the code below. Here we fix α = 0.5.

set.seed(60054)
cv.model.alpha05<-cv.glmnet(XXX[,-1], y, family="poisson", offset=log(n), alpha=.5, nfolds=7) 
opt.lam2 = c(cv.model.alpha05$lambda.min, cv.model.alpha05$lambda.1se)
opt.lam2
## [1] 0.5347545 2.1588130
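As noted above, α itself can be tuned by cross-validation. A minimal sketch (not from the original slides; foldid, alphas, and cv.err are illustrative names) compares cv.glmnet fits over a grid of α values, holding the fold assignment fixed so the cross-validated deviances are comparable:

set.seed(60054)
foldid <- sample(rep(1:7, length.out = N))   # one fixed fold assignment for all alphas
alphas <- seq(0, 1, by = 0.1)                # candidate mixing parameters
cv.err <- sapply(alphas, function(a) {
  fit <- cv.glmnet(XXX[,-1], y, family = "poisson", offset = log(n),
                   alpha = a, foldid = foldid)
  min(fit$cvm)                               # CV deviance at lambda.min for this alpha
})
alphas[which.min(cv.err)]                    # alpha with the smallest CV deviance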

Example: Elastic-net with cross-validation (continued)

coef(cv.model.alpha05, s=opt.lam2)
## 17 x 2 sparse Matrix of class "dgCMatrix"
##                        1            2
## (Intercept)  0.537656862  0.546786444
## True_X2      0.035583782  0.006182489
## True_X3      .            .          
## True_X4     -0.201515129 -0.157592867
## X1           .            .          
## X2          -0.020212992  .          
## X3          -0.025215212  .          
## X4           .            .          
## X5           0.008129189  .          
## X6           .            .          
## X7           .            .          
## X8          -0.011556780  .          
## X9           0.004730270  .          
## X10         -0.031839364  .          
## x_cor2       0.020623849  0.003518974
## x_cor3       .            .          
## x_cor_2and4 -0.015055044  .

References:
