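We first run a lasso to perform variable selection. Lasso regression minimizes \[\frac{1}{2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2 + \lambda \sum_{j=1}^p |\beta_j|\] where \(x_i\) is the \(i\)-th row of \(X\), \(p\) is the number of predictors, and \(\lambda \ge 0\) controls the strength of the penalty.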
## Lambda with lowest prediction error is 24 with an error of 0.006672433
We observe that no value of \(\lambda\) between 0 and 100 drives any coefficient exactly to zero. On this data set, lasso shrinks the coefficients but does not eliminate any variable.
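To illustrate the difference between shrinking a coefficient and zeroing it, here is a minimal sketch of the soft-thresholding rule. It assumes an orthonormal design, which the data set above need not satisfy, and under that assumption the lasso solution for the objective above is the least-squares coefficient soft-thresholded at \(\lambda\). The function name `soft_threshold` and the three coefficient values are hypothetical and are not taken from the analysis.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients, for illustration only.
ols_coefficients = [3.0, -1.5, 0.4]

for lam in (0.0, 0.5, 2.0):
    shrunk = [soft_threshold(b, lam) for b in ols_coefficients]
    nonzero = sum(1 for b in shrunk if b != 0.0)
    print(lam, shrunk, nonzero)
```

In this toy case a coefficient becomes zero only once \(\lambda\) reaches its magnitude, so whether a given grid of \(\lambda\) values eliminates any variable depends on the scale of the data.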
We then run the Gurobi optimizer on the same data set to select variables directly using mixed-integer quadratic programming (MIQP). The optimizer minimizes \[\frac{1}{2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2\] subject to at most 8 of the coefficients \(\beta_j\) being non-zero.
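One common way to express the cardinality constraint is a big-M formulation, sketched here as an assumption rather than a statement of the exact model that was solved. With \(x_i\) the \(i\)-th row of \(X\), a binary indicator \(z_j\) for each coefficient, and a bound \(M\) on the coefficient magnitudes: \[\min_{\beta, z} \; \frac{1}{2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2 \quad \text{subject to} \quad -M z_j \le \beta_j \le M z_j, \quad z_j \in \{0, 1\}, \quad j = 1, \dots, 64, \qquad \sum_{j=1}^{64} z_j \le 8\] This reading is consistent with the solver log: 64 continuous and 64 binary variables, and 129 constraint rows (two bound rows per coefficient plus one cardinality row). The second run, with a matrix range of [1e+00, 2e+00] and a reported \(M\) of 2, suggests the model was re-solved after increasing \(M\) from 1 to 2.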
## Loading required package: slam
## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
## Matrix range [1e+00, 1e+00]
## Objective range [3e+00, 6e+02]
## Bounds range [1e+00, 1e+00]
## RHS range [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.00s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
##
## Root relaxation: objective -2.089324e+03, 117 iterations, 0.00 seconds
##
## Nodes | Current Node | Objective Bounds | Work
## Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
##
## 0 0 -2089.3242 0 22 0.00000 -2089.3242 - - 0s
## H 0 0 -2078.756589 -2089.3242 0.51% - 0s
## 0 0 -2078.7596 0 2 -2078.7566 -2078.7596 0.00% - 0s
##
## Explored 0 nodes (163 simplex iterations) in 0.02 seconds
## Thread count was 4 (of 4 available processors)
##
## Optimal solution found (tolerance 1.00e-04)
## Best objective -2.078756589432e+03, best bound -2.078759618641e+03, gap 0.0001%
## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
## Matrix range [1e+00, 2e+00]
## Objective range [3e+00, 6e+02]
## Bounds range [1e+00, 1e+00]
## RHS range [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.00s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
##
## Root relaxation: objective -2.097980e+03, 91 iterations, 0.00 seconds
##
## Nodes | Current Node | Objective Bounds | Work
## Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
##
## 0 0 -2097.9804 0 29 0.00000 -2097.9804 - - 0s
## H 0 0 -1420.076272 -2097.9804 47.7% - 0s
## H 0 0 -2084.831517 -2097.9804 0.63% - 0s
## 0 0 cutoff 0 -2084.8315 -2084.8315 0.00% - 0s
##
## Explored 0 nodes (91 simplex iterations) in 0.03 seconds
## Thread count was 4 (of 4 available processors)
##
## Optimal solution found (tolerance 1.00e-04)
## Best objective -2.084831516618e+03, best bound -2.084831516618e+03, gap 0.0%
## The Value of M is : 2
## Number of Non Zero Coefficients for MIQP: 8 with Prediction error of 0.004456055
MIQP yields 8 non-zero coefficients, compared with 64 from lasso. The MIQP prediction error is 0.0045, about 33% lower than the lasso prediction error of 0.0067.
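The relative improvement can be checked directly from the two prediction errors reported in the output above. This is a small arithmetic sketch, and the function name `relative_reduction` is only illustrative.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


lasso_error = 0.006672433  # lowest lasso prediction error (lambda = 24)
miqp_error = 0.004456055   # MIQP prediction error with 8 non-zero coefficients

reduction = relative_reduction(lasso_error, miqp_error)
print(f"{reduction:.1%}")  # prints 33.2%
```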