Lasso Regression to select features:

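We first run a lasso regression to perform variable selection. The lasso minimizes \[\frac{1}{2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2 + \lambda \sum_{j=1}^p |\beta_j|,\] where \(x_i\) is the \(i\)-th row of \(X\), \(p\) is the number of candidate variables, and \(\lambda \ge 0\) controls the strength of the penalty.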
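To make the objective concrete, here is a minimal pure-Python sketch of the lasso solved by cyclic coordinate descent with soft-thresholding. It is an illustration only, not the code used for this report: the function names, the toy data, and the fixed number of sweeps are all assumptions made for the example.

```python
def soft_threshold(a, lam):
    """Shrink a toward zero by lam; values within [-lam, lam] become exactly 0."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_objective(X, y, beta, lam):
    """0.5 * sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|."""
    rss = sum((yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
              for xi, yi in zip(X, y))
    return 0.5 * rss + lam * sum(abs(bj) for bj in beta)


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise the lasso objective by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Sum of squares of each column (assumed non-zero in this sketch).
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of column j with the residual that excludes beta_j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Toy data (made up for illustration): y depends mainly on column 0.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.2], [4.0, -0.4]]
y = [2.0, 4.1, 5.9, 8.0]

for lam in (0.0, 1.0, 50.0, 100.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
```

In this sketch a coefficient is set exactly to zero only when the inner product of its column with the partial residual is no larger than \(\lambda\) in absolute value; otherwise it is merely shrunk.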

Prediction errors for different values of \(\lambda\):

## Lambda with lowest prediction error is 24 with an error of 0.006672433

Non-zero coefficients at different values of \(\lambda\):

No value of \(\lambda\) between 0 and 100 reduces any coefficient to exactly zero. On this dataset the lasso shrinks the coefficients but eliminates none of them, so over this range it performs no variable selection.

Direct Selection - MIQP

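We then run the Gurobi optimizer on the same dataset to select variables directly, using mixed-integer quadratic programming (MIQP). The optimizer minimizes \[\frac{1}{2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2\] subject to at most 8 of the coefficients \(\beta_j\) being non-zero, where \(x_i\) is the \(i\)-th row of \(X\).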
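One common way to encode this cardinality constraint is a big-M formulation, sketched below. It is an assumption about the model, not a transcription of it: the binary indicators \(z_j\), the bound \(M\), and \(p\) (the number of candidate variables) are notation introduced here, and \(x_i\) denotes the \(i\)-th row of \(X\).

\[\begin{aligned}
\min_{\beta,\, z} \quad & \frac{1}{2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2 \\
\text{subject to} \quad & -M z_j \le \beta_j \le M z_j, \quad j = 1, \dots, p, \\
& \sum_{j=1}^p z_j \le 8, \\
& z_j \in \{0, 1\}, \quad j = 1, \dots, p.
\end{aligned}\]

When \(z_j = 0\) the coefficient \(\beta_j\) is forced to zero, so at most 8 coefficients can be non-zero. With \(p = 64\) this gives 64 continuous and 64 binary variables and \(2 \times 64 + 1 = 129\) constraints, which matches the model size reported in the solver log. \(M\) must be large enough not to cut off the optimal coefficients; a typical remedy is to increase \(M\) and re-solve whenever a coefficient reaches the bound, and the second solve in the log, reported with \(M = 2\), is consistent with such a step.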

## Loading required package: slam
## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
##   Matrix range    [1e+00, 1e+00]
##   Objective range [3e+00, 6e+02]
##   Bounds range    [1e+00, 1e+00]
##   RHS range       [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.00s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
## 
## Root relaxation: objective -2.089324e+03, 117 iterations, 0.00 seconds
## 
##     Nodes    |    Current Node    |     Objective Bounds      |     Work
##  Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time
## 
##      0     0 -2089.3242    0   22    0.00000 -2089.3242      -     -    0s
## H    0     0                    -2078.756589 -2089.3242  0.51%     -    0s
##      0     0 -2078.7596    0    2 -2078.7566 -2078.7596  0.00%     -    0s
## 
## Explored 0 nodes (163 simplex iterations) in 0.02 seconds
## Thread count was 4 (of 4 available processors)
## 
## Optimal solution found (tolerance 1.00e-04)
## Best objective -2.078756589432e+03, best bound -2.078759618641e+03, gap 0.0001%
## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
##   Matrix range    [1e+00, 2e+00]
##   Objective range [3e+00, 6e+02]
##   Bounds range    [1e+00, 1e+00]
##   RHS range       [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.00s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
## 
## Root relaxation: objective -2.097980e+03, 91 iterations, 0.00 seconds
## 
##     Nodes    |    Current Node    |     Objective Bounds      |     Work
##  Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time
## 
##      0     0 -2097.9804    0   29    0.00000 -2097.9804      -     -    0s
## H    0     0                    -1420.076272 -2097.9804  47.7%     -    0s
## H    0     0                    -2084.831517 -2097.9804  0.63%     -    0s
##      0     0     cutoff    0      -2084.8315 -2084.8315  0.00%     -    0s
## 
## Explored 0 nodes (91 simplex iterations) in 0.03 seconds
## Thread count was 4 (of 4 available processors)
## 
## Optimal solution found (tolerance 1.00e-04)
## Best objective -2.084831516618e+03, best bound -2.084831516618e+03, gap 0.0%
## The Value of M is :  2
## Number of Non Zero Coefficients for MIQP:  8 with Prediction error of 0.004456055

MIQP keeps 8 non-zero coefficients, compared with all 64 under the lasso. Its prediction error is also lower: 0.0045 versus 0.0067 for the lasso, a reduction of about 33%.
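The relative reduction can be checked from the unrounded errors printed in the output above; rounding the two errors first gives a slightly different figure. The variable names below are illustrative.

```python
lasso_error = 0.006672433  # lowest lasso prediction error reported in the output
miqp_error = 0.004456055   # MIQP prediction error reported in the output

# Reduction in prediction error relative to the lasso.
relative_reduction = (lasso_error - miqp_error) / lasso_error
print(f"Relative reduction: {relative_reduction:.1%}")
```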