Abstract

In this short note we present and briefly discuss the R package islasso, which fits regression models with a large number of covariates. Estimation is carried out by penalizing the coefficients via a quasi-lasso penalty, wherein the nonsmooth lasso penalty is replaced by a smooth counterpart determined iteratively from the data according to the induced smoothing idea. The package includes functions to estimate the model and to test linear hypotheses on linear combinations of relevant coefficients. We illustrate the R code through a worked example, intentionally avoiding detailed derivations and an extended bibliography.

Introduction

Let \(\mathbf{y} = \mathbf{X}\mathbf{\beta} + \mathbf{\epsilon}\) be the linear model of interest, with the usual zero-mean and homoscedastic errors. As usual, \(\mathbf{y} = (y_1,\ldots,y_n)^T\) is the response vector and \(\mathbf{X}\) is the \(n \times p\) design matrix (with \(p\) possibly quite large) with regression coefficients \(\mathbf{\beta}\). When interest lies in selecting the non-noise covariates and estimating the relevant effects, one assumes the lasso penalized objective function (Tibshirani, 1996), \[\frac{1}{2}||\mathbf{y}-\mathbf{X}\mathbf{\beta}||_2^2+\lambda||\mathbf{\beta}||_1.\]

The R functions

The main function of the package is islasso(), where the user supplies the model formula as in the usual lm() or glm() functions.
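A minimal sketch of a call (the data frame mydata and its variables are purely illustrative):

```r
library(islasso)

## 'mydata' is a hypothetical data frame with response y and covariates x1, x2, x3
fit <- islasso(y ~ x1 + x2 + x3, family = gaussian, lambda = 2, data = mydata)
```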

family accepts the specification of the family and link function as in Table 1, lambda is the tuning parameter, and unpenalized allows the user to indicate covariates whose coefficients are left unpenalized.

Table 1. Families and link functions allowed in islasso

family     link
gaussian   identity
binomial   logit, probit
poisson    log
gamma      identity, log, inverse

The fitter function is islasso.fit(), which actually implements the estimating algorithm as described in the paper. The lambda argument in islasso.fit() and islasso() specifies the positive tuning parameter in the penalized objective. Any non-negative value can be provided, but if it is missing, it is computed via \(K\)-fold cross validation by the function cv.glmnet() from the package glmnet. The number of folds can be specified via the argument nfolds of the auxiliary function is.control().
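For instance, a sketch with \(\lambda\) left unspecified and five folds requested, continuing the illustrative mydata example above (we assume here that the output of is.control() is passed through a control argument, in the usual R fashion):

```r
## lambda omitted: it is computed by K-fold cross validation via cv.glmnet();
## the number of folds is set through is.control()
fit_cv <- islasso(y ~ x1 + x2 + x3, data = mydata, control = is.control(nfolds = 5))
```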

A worked example: the Diabetes data set

We use the well-known diabetes dataset available in the lars package. The data refer to \(n = 442\) patients enrolled to investigate a measure of disease progression one year after baseline. There are ten covariates: age, sex, bmi (body mass index), map (average blood pressure), and six blood serum measurements (tc, ldl, hdl, tch, ltg, glu). The matrix x2 in the data frame also includes second-order terms, namely first-order interactions between covariates and quadratic terms for the continuous variables.

To select the important terms in the regression equation, we apply the lasso.
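A sketch of the calls that could produce the output below (the original code is not shown; we assume the reported \(\lambda\) is on the scale of the penalized sum of squares, i.e. \(n\) times the value on the glmnet scale):

```r
library(lars)     # provides the diabetes data
library(glmnet)
data(diabetes)

## ten-fold cross validation; results depend on the random fold assignment
a1 <- with(diabetes, cv.glmnet(x2, y, nfolds = 10))
n <- nrow(diabetes)
n * a1$lambda.min                      # selected lambda, rescaled by n
b <- drop(coef(a1, "lambda.min"))      # lasso estimates at the selected lambda
sum(b != 0)                            # number of non-null coefficients
```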

> [1] 1344.186
> [1] 15

Ten-fold cross validation “selects” \(\lambda=\) 1344.186, corresponding to 15 non-null coefficients.
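Continuing the sketch above, the names of the selected terms can be retrieved as

```r
names(b)[b != 0]
```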

>  [1] "(Intercept)" "sex"         "bmi"         "map"         "hdl"        
>  [6] "ltg"         "glu"         "age^2"       "bmi^2"       "glu^2"      
> [11] "age:sex"     "age:map"     "age:ltg"     "age:glu"     "bmi:map"

The last six estimates are
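Continuing the sketch, these can be extracted, for instance, as

```r
tail(b[b != 0], 6)
```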

>      glu^2    age:sex    age:map    age:ltg    age:glu    bmi:map 
>  69.599081 107.479925  29.970061   8.506032  11.675332  85.530937

A reasonable question is whether all the “selected” coefficients are significant in the model. Unfortunately, lasso regression does not return standard errors due to the nonsmoothness of the objective, and some alternative approaches have been proposed, including the `covariance test' (Lockhart et al., 2013).
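The output reported below is consistent with the covTest package (assumed here, as the original code is not shown):

```r
library(covTest)                          # covariance test of Lockhart et al. (2013)
fit_lars <- with(diabetes, lars(x2, y))   # lasso path via lars
with(diabetes, covTest(fit_lars, x2, y))
```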

>  Predictor_Number Drop_in_covariance P-value
>                 3            20.1981  0.0000
>                 9            52.5964  0.0000
>                 4             5.7714  0.0034
>                 7             4.0840  0.0176
>                37             1.3310  0.2655
>                20             0.3244  0.7232

The results suggest that only the terms corresponding to columns 3, 9, 4, and 7 in the matrix x2 are significant, namely

> [1] "bmi" "ltg" "map" "hdl"

covTest returns \(p\)-values across the \(\lambda\) path. This means that such \(p\)-values are not matched to the corresponding point estimates obtained at the optimal \(\lambda\) value (\(\lambda=\) 1344.186 in this example). As a consequence, only some of the selected non-null coefficients can be assessed as significant. Among the (few) available strategies, including post-selective inference and the (modified) residual bootstrap, here we illustrate the R package islasso, which implements the recent `quasi' lasso approach based on the induced smoothing idea (Brown and Wang, 2005), as discussed in Cilluffo et al. (2019).

While the optimal lambda could be selected automatically (by not supplying any value to lambda), we use the same value as above to facilitate comparisons.
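The fit summarized below can be obtained as (cf. the Call entry in the output):

```r
fit <- islasso(y ~ x2, data = diabetes, lambda = 1344.186)
```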

The summary method quickly returns the main output of the fitted model, including point estimates, standard errors, and \(p\)-values. Displaying estimates for all covariates can be somewhat inconvenient, especially when the number of covariates is large, so we print estimates only if the \(p\)-value is less than a threshold value; we use 0.10.
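Assuming the threshold is passed to the summary method through an argument named pval (an assumption on the argument name), the call could read

```r
summary(fit, pval = 0.10)   # print only coefficients with p-value < 0.10
```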

> 
> Call:
> islasso(formula = y ~ x2, lambda = 1344.186, data = diabetes)
> 
> Residuals:
>     Min      1Q  Median      3Q     Max 
> -138.65  -40.18   -4.46   34.45  143.35 
> 
>             Estimate Std. Error     Df z value Pr(>|z|)    
> (Intercept)  152.133      2.542  1.000  59.842  < 2e-16 ***
> x2sex       -114.624     53.818  0.890  -2.130  0.03318 *  
> x2bmi        494.758     69.273  1.000   7.142 9.19e-13 ***
> x2map        250.768     63.537  1.000   3.947 7.92e-05 ***
> x2hdl       -188.773     63.736  0.985  -2.962  0.00306 ** 
> x2ltg        464.780     67.117  1.000   6.925 4.36e-12 ***
> x2glu^2       74.504     45.089  0.758   1.652  0.09846 .  
> x2age:sex    108.371     50.174  0.904   2.160  0.03078 *  
> x2bmi:map     85.180     46.552  0.811   1.830  0.06728 .  
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> (Dispersion parameter for gaussian family taken to be 2856.735)
> 
>     Null deviance: 2621009  on 441.0  degrees of freedom
> Residual deviance: 1223354  on 428.2  degrees of freedom
> AIC: 4787.1
> Lambda: 1344.2
> 
> Number of Newton-Raphson iterations: 136

In addition to the usual information printed by the summary method, the output also includes the column Df representing the degrees of freedom of each coefficient. Their sum is used to quantify the model complexity

> [1] 13.76514

and the corresponding residual degrees of freedom (\(442 - 13.765 \approx 428.235\), as reported above). The Wald test (column z value) and the \(p\)-values can be used to assess important or significant covariates. In addition to the terms singled out by covTest (“bmi”, “ltg”, “map”, “hdl”), islasso() also returns ‘small’ \(p\)-values for the terms “sex”, “age:sex”, and “bmi:map”.

As an alternative to cross validation, it is also possible to select the tuning parameter \(\lambda\) by means of the Bayesian or Akaike information criterion. The function aic.islasso() requires an islasso fit object and a specification of the criterion to be used (AIC/BIC).
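A sketch follows, assuming the criterion is specified through an argument named method and that aic.islasso() returns the selected value of \(\lambda\), which can then be used to refit the model (both are assumptions on the interface):

```r
lambda_aic <- aic.islasso(fit, method = "AIC")   # 'method' assumed; "BIC" also possible
fit_aic <- islasso(y ~ x2, data = diabetes, lambda = lambda_aic)
```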

Comparisons between methods to select the tuning parameter and further discussions

We conclude this short note by emphasizing that islasso also accepts the so-called elastic-net penalty, so that the penalized objective becomes \[\frac{1}{2}||\mathbf{y}-\mathbf{X}\mathbf{\beta}||_2^{2}+\lambda \left\{ \alpha ||\mathbf{\beta}||_1 + \frac{1}{2}(1-\alpha) ||\mathbf{\beta}||_2^{2} \right\}\] where \(0\le \alpha\le 1\) is the mixing parameter to be specified in islasso() and islasso.fit() via the argument alpha.
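For instance, a mixed penalty with \(\alpha = 0.5\) could be requested as

```r
fit_en <- islasso(y ~ x2, data = diabetes, lambda = 1344.186, alpha = 0.5)
```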

References