Simple Example

In statistics and machine learning, lasso (least absolute shrinkage and selection operator; also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces.

Glmnet is a package that fits a generalized linear model via penalized maximum likelihood. Let us create a simple linear regression model with the mtcars dataset to demonstrate the use of glmnet. The authors of glmnet are Jerome Friedman, Trevor Hastie, Rob Tibshirani and Noah Simon; the R package is maintained by Trevor Hastie, and the MATLAB version is maintained by Junyang Qian.

It is known that the ridge penalty shrinks the coefficients of correlated predictors towards each other, while the lasso tends to pick one of them and discard the others. The elastic-net penalty mixes these two; if predictors are correlated in groups, alpha = 0.5 tends to select the groups in or out together. This is a higher-level parameter: users might pick a value upfront, or experiment with a few different values. One use of alpha is for numerical stability; for example, the elastic net with alpha = 1 - epsilon for some small epsilon > 0 performs much like the lasso, but removes any degeneracies and wild behavior caused by extreme correlations.
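For example, the alpha argument of glmnet controls this mixing. A minimal sketch, using the same mtcars predictors and response that we set up later in this example:

library(glmnet)
x <- as.matrix(mtcars[, -1])   # predictors
y <- mtcars$mpg                # response
fit_lasso <- glmnet(x, y, alpha = 1)    # alpha = 1: lasso (the default)
fit_enet  <- glmnet(x, y, alpha = 0.5)  # alpha = 0.5: elastic net
fit_ridge <- glmnet(x, y, alpha = 0)    # alpha = 0: ridge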

The glmnet algorithms use cyclical coordinate descent, which successively optimizes the objective function over each parameter with others fixed, and cycles repeatedly until convergence.
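As a rough illustration only (glmnet's actual implementation is compiled code and far more refined), one cycle of coordinate descent for the lasso updates each coefficient by soft-thresholding its univariate solution on the partial residuals:

# Illustrative sketch of cyclical coordinate descent for the lasso objective
#   (1/(2n)) * ||y - X beta||^2 + lambda * ||beta||_1
# This is NOT glmnet's implementation, just the idea behind it.
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

lasso_cd <- function(x, y, lambda, n_iter = 100) {
  x <- scale(x)                  # standardize predictors
  y <- y - mean(y)               # center the response
  p <- ncol(x)
  beta <- rep(0, p)
  for (iter in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      r_j <- y - x[, -j, drop = FALSE] %*% beta[-j]   # partial residual excluding predictor j
      z_j <- mean(x[, j] * r_j)                       # (1/n) * x_j' * r_j
      beta[j] <- soft_threshold(z_j, lambda) / mean(x[, j]^2)
    }
  }
  beta
}

# e.g. lasso_cd(as.matrix(mtcars[, -1]), mtcars$mpg, lambda = 0.5)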

library(glmnet)
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-16
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Input matrix x and a response “vector” y

y <- as.vector(mtcars[,1])
x <- as.matrix(mtcars[,-1])

Fit function

“fit” is an object of class glmnet that contains all the relevant information of the fitted model for further use.

fit = glmnet(x, y)

Plot function

Various methods are provided for the object, such as plot, print, coef and predict, that enable us to execute those tasks.

plot(fit,label = TRUE)

L1-norm least absolute errors

Each curve corresponds to a variable. It shows the path of its coefficient against the L1-norm of the whole coefficient vector as lambda varies. The axis above indicates the number of nonzero coefficients at the current lambda, which is the effective degrees of freedom (df) for the lasso.

The L1-norm criterion is also known as least absolute deviations (LAD) or least absolute errors (LAE). It minimizes the sum of the absolute differences (S) between the target values (Yi) and the estimated values (f(xi)).
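As a quick numeric illustration (hypothetical values, not taken from the fit above):

# Sum of absolute differences between observed and estimated values
y_obs <- c(21.0, 22.8, 18.7)    # hypothetical targets Yi
y_hat <- c(20.5, 23.1, 19.2)    # hypothetical estimates f(xi)
sum(abs(y_obs - y_hat))         # the L1 / least-absolute-error criterion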

print(fit)
## 
## Call:  glmnet(x = x, y = y) 
## 
##       Df   %Dev   Lambda
##  [1,]  0 0.0000 5.147000
##  [2,]  2 0.1290 4.690000
##  [3,]  2 0.2481 4.273000
##  [4,]  2 0.3469 3.894000
##  [5,]  2 0.4290 3.548000
##  [6,]  2 0.4971 3.232000
##  [7,]  2 0.5537 2.945000
##  [8,]  2 0.6006 2.684000
##  [9,]  2 0.6396 2.445000
## [10,]  3 0.6726 2.228000
## [11,]  3 0.7015 2.030000
## [12,]  3 0.7256 1.850000
## [13,]  3 0.7455 1.685000
## [14,]  3 0.7621 1.536000
## [15,]  3 0.7759 1.399000
## [16,]  3 0.7873 1.275000
## [17,]  3 0.7968 1.162000
## [18,]  3 0.8046 1.058000
## [19,]  3 0.8112 0.964500
## [20,]  3 0.8166 0.878800
## [21,]  3 0.8211 0.800700
## [22,]  3 0.8249 0.729600
## [23,]  4 0.8281 0.664800
## [24,]  5 0.8320 0.605700
## [25,]  5 0.8360 0.551900
## [26,]  6 0.8396 0.502900
## [27,]  6 0.8426 0.458200
## [28,]  6 0.8451 0.417500
## [29,]  6 0.8472 0.380400
## [30,]  8 0.8489 0.346600
## [31,]  8 0.8514 0.315800
## [32,]  8 0.8535 0.287800
## [33,]  8 0.8553 0.262200
## [34,]  8 0.8568 0.238900
## [35,]  8 0.8580 0.217700
## [36,]  8 0.8590 0.198300
## [37,]  8 0.8598 0.180700
## [38,]  9 0.8606 0.164700
## [39,]  9 0.8615 0.150000
## [40,]  9 0.8622 0.136700
## [41,]  9 0.8627 0.124600
## [42,]  9 0.8632 0.113500
## [43,]  9 0.8636 0.103400
## [44,]  9 0.8639 0.094230
## [45,]  9 0.8642 0.085860
## [46,]  9 0.8644 0.078230
## [47,]  9 0.8646 0.071280
## [48,]  9 0.8648 0.064950
## [49,]  9 0.8649 0.059180
## [50,]  9 0.8650 0.053920
## [51,]  9 0.8651 0.049130
## [52,]  9 0.8652 0.044770
## [53,]  9 0.8652 0.040790
## [54,] 10 0.8654 0.037170
## [55,] 10 0.8660 0.033860
## [56,] 10 0.8665 0.030860
## [57,] 10 0.8669 0.028110
## [58,] 10 0.8673 0.025620
## [59,] 10 0.8676 0.023340
## [60,] 10 0.8678 0.021270
## [61,] 10 0.8680 0.019380
## [62,] 10 0.8682 0.017660
## [63,] 10 0.8683 0.016090
## [64,] 10 0.8684 0.014660
## [65,] 10 0.8685 0.013360
## [66,] 10 0.8686 0.012170
## [67,] 10 0.8687 0.011090
## [68,] 10 0.8687 0.010100
## [69,] 10 0.8688 0.009206
## [70,] 10 0.8688 0.008388
## [71,] 10 0.8688 0.007643
## [72,] 10 0.8689 0.006964
## [73,] 10 0.8689 0.006345
## [74,] 10 0.8689 0.005782
## [75,] 10 0.8689 0.005268
## [76,] 10 0.8689 0.004800
## [77,] 10 0.8689 0.004374
## [78,] 10 0.8690 0.003985
## [79,] 10 0.8690 0.003631

Non-zero coefficients

The output shows, from left to right, the number of nonzero coefficients (Df), the percent of null deviance explained (%Dev) and the value of lambda (Lambda).

You may specify a value of lambda and extract the corresponding coefficients:

coef(fit,s=0.1)
## 11 x 1 sparse Matrix of class "dgCMatrix"
##                       1
## (Intercept) 20.12070307
## cyl         -0.21987003
## disp         .         
## hp          -0.01300595
## drat         0.77162507
## wt          -2.63787681
## qsec         0.46074875
## vs           0.11747113
## am           2.11349978
## gear         0.30437026
## carb        -0.46452172
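The predict method can be used in the same way on new data; for example (a sketch, output not shown here), fitted mpg values for the first five cars at the same lambda:

predict(fit, newx = x[1:5, ], s = 0.1)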

Multiresponse Gaussian Example

Obviously, as the name suggests, y is not a vector but a matrix of quantitative responses in this section. As a result, the coefficients at each value of lambda are also a matrix. The multiresponse Gaussian family is obtained using the family = "mgaussian" option in glmnet. It is very similar to the single-response case.

This is useful when there are a number of (correlated) responses - the so-called "multi-task learning" problem. Here the sharing involves which variables are selected, since when a variable is selected, a coefficient is fit for each response. Most of the options are the same, so we focus here on the differences with the single-response model.

Please download the example dataset from https://github.com/cran/glmnet/blob/master/data/MultiGaussianExample.RData into your current working directory before loading it.

load("MultiGaussianExample.RData")
boxplot(y)

boxplot(x)

mfit = glmnet(x, y, family = "mgaussian")
plot(mfit, xvar = "lambda", label = TRUE, type.coef = "2norm")

predict(mfit, newx = x[1:5,], s = c(0.1, 0.01))
## , , 1
## 
##              y1         y2         y3       y4
## [1,] -4.7106263 -1.1634574  0.6027634 3.740989
## [2,]  4.1301735 -3.0507968 -1.2122630 4.970141
## [3,]  3.1595229 -0.5759621  0.2607981 2.053976
## [4,]  0.6459242  2.1205605 -0.2252050 3.146286
## [5,] -1.1791890  0.1056262 -7.3352965 3.248370
## 
## , , 2
## 
##              y1         y2         y3       y4
## [1,] -4.6415158 -1.2290282  0.6118289 3.779521
## [2,]  4.4712843 -3.2529658 -1.2572583 5.266039
## [3,]  3.4735228 -0.6929231  0.4684037 2.055574
## [4,]  0.7353311  2.2965083 -0.2190297 2.989371
## [5,] -1.2759930  0.2892536 -7.8259206 3.205211
cvmfit = cv.glmnet(x, y, family = "mgaussian")
cvmfit$lambda.min
## [1] 0.03261454
cvmfit$lambda.1se
## [1] 0.1316655
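As in the single-response case, the cross-validation curve can be plotted and the coefficients extracted at the selected lambda (output not shown here):

plot(cvmfit)                       # CV error curve across the lambda sequence
coef(cvmfit, s = "lambda.min")     # coefficients at the lambda minimizing CV error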

Breast Cancer Wisconsin Data

Here we use a real dataset and perform the same multiresponse (multivariate) regression.

df<- read.csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"))

str(df)
## 'data.frame':    698 obs. of  11 variables:
##  $ X1000025: int  1002945 1015425 1016277 1017023 1017122 1018099 1018561 1033078 1033078 1035283 ...
##  $ X5      : int  5 3 6 4 8 1 2 2 4 1 ...
##  $ X1      : int  4 1 8 1 10 1 1 1 2 1 ...
##  $ X1.1    : int  4 1 8 1 10 1 2 1 1 1 ...
##  $ X1.2    : int  5 1 1 3 8 1 1 1 1 1 ...
##  $ X2      : int  7 2 3 2 7 2 2 2 2 1 ...
##  $ X1.3    : Factor w/ 11 levels "?","1","10","2",..: 3 4 6 2 3 3 2 2 2 2 ...
##  $ X3      : int  3 3 3 3 9 3 3 1 2 3 ...
##  $ X1.4    : int  2 1 7 1 7 1 1 1 1 1 ...
##  $ X1.5    : int  1 1 1 1 1 1 1 5 1 1 ...
##  $ X2.1    : int  2 2 2 2 4 2 2 2 2 2 ...
df <- df[complete.cases(df),]
df <- apply(df,2,as.numeric) 
## Warning in apply(df, 2, as.numeric): NAs introduced by coercion
df <- na.omit(df)

Input matrix x and a response “matrix” y for mgaussian

Key: remove the NAs and convert to numeric before applying glmnet.

y1 <- as.matrix(df[,2:3])
x1 <- as.matrix(df[,c(4:5,7:11)])
mfit = glmnet(x1, y1, family = "mgaussian")
plot(mfit, xvar = "lambda", label = TRUE, type.coef = "2norm")

predict(mfit, newx = x1[1:5,], s = c(0.1, 0.01))
## , , 1
## 
##            X5       X1
## [1,] 3.546993 3.221813
## [2,] 2.949266 1.143918
## [3,] 4.579158 5.552785
## [4,] 2.903467 1.259485
## [5,] 7.778365 8.988923
## 
## , , 2
## 
##            X5       X1
## [1,] 3.336877 3.069706
## [2,] 2.909305 1.096829
## [3,] 4.396941 5.742645
## [4,] 2.729214 1.311402
## [5,] 7.605602 9.224665
cvmfit = cv.glmnet(x1, y1, family = "mgaussian")
cvmfit$lambda.min
## [1] 0.03184341
cvmfit$lambda.1se
## [1] 0.2705904