Part 1 LASSO

We use glmnet package in R to run LASSO regressions with different values of \(\lambda\).

fitcv = cv.glmnet(X,y, family="gaussian")
plot(fitcv)

The plot below shows that as \(\lambda\) gets larger, more coefficients shrink to zero, thus increasing the bias of the model as shown by an increasing mean-squared error. Each \(\lambda\) in the plot corresponds to a regression. For each regression We calculate a Prediction Error defined as:

\(\left \| X \widetilde{\beta} -X\beta^0\right \|_2^2/\left \| X\beta^0 \right \|_2^2\),

or in equivalent matrix form:

\((\widetilde{\beta}^TX^TX\widetilde{\beta}-2\widetilde{\beta}^TX^TX\beta^0 + \beta^{0T}X^TX\beta^0)/ \beta^{0T}X^TX\beta^0\)

where \(\widetilde{\beta}\) are the estimated coefficients and \(\beta^0\) are real coefficients.

The table and plot below show then number of variables kept and corresponding prediction error.

##       Number of Non-zero Coefficients    Lambda Prediction Error
##  [1,]                               0 1.2278578        1.0000000
##  [2,]                               2 1.1187783        0.9530738
##  [3,]                               3 1.0193891        0.8976484
##  [4,]                               6 0.9288294        0.8124570
##  [5,]                               8 0.8463148        0.6978552
##  [6,]                               8 0.7711305        0.5778719
##  [7,]                               8 0.7026254        0.4784579
##  [8,]                               8 0.6402061        0.3961034
##  [9,]                               8 0.5833319        0.3278958
## [10,]                               8 0.5315103        0.2714188

Part 2 MIQP using `gurobi`

Formulation

The formulation is equivalent to:

\(min\ \beta^TX^TX\beta - 2y^TX\beta\)

\(s.t.\)

\((1)\ \sum_iz_i \leq k\)

\((2)\ \beta_i \leq Mz_i,\ i=1,...,p\)

\((3)\ \beta_i \geq -Mz_i,\ i=1,...,p\)

\((4)\ z_i\ are\ binary\)

Solution

The function below will solve for optimal \(\beta s\) with varying values of \(M\). When \(M\) is small, constraints \((2)\) and \((3)\) above could be binding when \(z_i \neq 0\). As \(M\) continues to increase to a point when \((2)\) and \((3)\) are no longer binding constraints for non-zero \(\beta s\), the values of \(\beta s\) should no longer change with the increase of \(M\).

gurobi_m <- function(m){
  model <- list()

  model$Q = cbind(rbind(XtX,matrix(0,p,p)),matrix(0,2*p,p))
  model$obj = append(YtX, rep(0,p))
  model$vtype = append(rep('C',p), rep('B',p))
  
  model$sense = c('<=',rep('>=',p),rep('>=',p))
  
  M = m
  k = 8
  
  cons1 = append(rep(0,p), rep(1,p))
  cons2 = cbind(diag(1,p), diag(M,p))
  cons3 = cbind(diag(-1,p), diag(M,p))
  
  model$A = rbind(cons1,cons2,cons3)
  model$rhs = c(k, rep(0,p), rep(0,p))
  return(model)
}

We choose \(M\) values from \(0.5,1,2,4,8\)

m_values <- c(0.5,1,2,4,8)
gurobi_table <- cbind(m_values, rep(0,5))

beta_list = rep(NULL,8)
for(i in 1:length(m_values)){
  m = m_values[i]
  result <- gurobi(gurobi_m(m))
  beta_new = result$x[(1):(p)]
  beta_non_zero = beta_new[beta_new != 0]
  beta_list = cbind(beta_list,beta_non_zero)
  error = (t(beta_new) %*% XtX %*% beta_new + t(beta_real) %*% XtX %*% beta_real -2 * t(X%*%beta_new) %*% (X%*%beta_real)) / (t(beta_real) %*% XtX %*% beta_real )
  gurobi_table[i,2] <- error  
}

## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
##   Matrix range    [5e-01, 1e+00]
##   Objective range [5e+00, 1e+03]
##   Bounds range    [1e+00, 1e+00]
##   RHS range       [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.00s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
## 
## Root relaxation: objective -3.080563e+03, 149 iterations, 0.00 seconds
## 
##     Nodes    |    Current Node    |     Objective Bounds      |     Work
##  Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time
## 
##      0     0 -3080.5630    0    4    0.00000 -3080.5630      -     -    0s
## H    0     0                    -3080.233051 -3080.5630  0.01%     -    0s
##      0     0     cutoff    0      -3080.2331 -3080.2331  0.00%     -    0s
## 
## Explored 0 nodes (152 simplex iterations) in 0.00 seconds
## Thread count was 8 (of 8 available processors)
## 
## Optimal solution found (tolerance 1.00e-04)
## Best objective -3.080233051211e+03, best bound -3.080233051211e+03, gap 0.0%
## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
##   Matrix range    [1e+00, 1e+00]
##   Objective range [5e+00, 1e+03]
##   Bounds range    [1e+00, 1e+00]
##   RHS range       [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.00s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
## 
## Root relaxation: objective -4.178648e+03, 130 iterations, 0.00 seconds
## 
##     Nodes    |    Current Node    |     Objective Bounds      |     Work
##  Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time
## 
##      0     0 -4178.6483    0   22    0.00000 -4178.6483      -     -    0s
## H    0     0                    -4157.513179 -4178.6483  0.51%     -    0s
##      0     0 -4157.5192    0    2 -4157.5132 -4157.5192  0.00%     -    0s
## 
## Explored 0 nodes (178 simplex iterations) in 0.02 seconds
## Thread count was 8 (of 8 available processors)
## 
## Optimal solution found (tolerance 1.00e-04)
## Best objective -4.157513178863e+03, best bound -4.157519237282e+03, gap 0.0001%
## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
##   Matrix range    [1e+00, 2e+00]
##   Objective range [5e+00, 1e+03]
##   Bounds range    [1e+00, 1e+00]
##   RHS range       [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.00s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
## 
## Root relaxation: objective -4.195961e+03, 91 iterations, 0.00 seconds
## 
##     Nodes    |    Current Node    |     Objective Bounds      |     Work
##  Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time
## 
##      0     0 -4195.9608    0   29    0.00000 -4195.9608      -     -    0s
## H    0     0                    -2840.152545 -4195.9608  47.7%     -    0s
## H    0     0                    -4169.663033 -4195.9608  0.63%     -    0s
##      0     0     cutoff    0      -4169.6630 -4169.6630  0.00%     -    0s
## 
## Explored 0 nodes (91 simplex iterations) in 0.02 seconds
## Thread count was 8 (of 8 available processors)
## 
## Optimal solution found (tolerance 1.00e-04)
## Best objective -4.169663033237e+03, best bound -4.169663033237e+03, gap 0.0%
## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
##   Matrix range    [1e+00, 4e+00]
##   Objective range [5e+00, 1e+03]
##   Bounds range    [1e+00, 1e+00]
##   RHS range       [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.02s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
## 
## Root relaxation: objective -4.195961e+03, 90 iterations, 0.00 seconds
## 
##     Nodes    |    Current Node    |     Objective Bounds      |     Work
##  Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time
## 
##      0     0 -4195.9608    0   29    0.00000 -4195.9608      -     -    0s
## H    0     0                    -2204.698698 -4195.9608  90.3%     -    0s
## H    0     0                    -4169.663033 -4195.9608  0.63%     -    0s
##      0     0     cutoff    0      -4169.6630 -4169.6630  0.00%     -    0s
## 
## Explored 0 nodes (90 simplex iterations) in 0.03 seconds
## Thread count was 8 (of 8 available processors)
## 
## Optimal solution found (tolerance 1.00e-04)
## Best objective -4.169663033237e+03, best bound -4.169663033237e+03, gap 0.0%
## Warning for adding variables: zero or small (< 1e-13) coefficients, ignored
## Optimize a model with 129 rows, 128 columns and 320 nonzeros
## Model has 2080 quadratic objective terms
## Coefficient statistics:
##   Matrix range    [1e+00, 8e+00]
##   Objective range [5e+00, 1e+03]
##   Bounds range    [1e+00, 1e+00]
##   RHS range       [8e+00, 8e+00]
## Found heuristic solution: objective 0
## Presolve removed 64 rows and 0 columns
## Presolve time: 0.02s
## Presolved: 65 rows, 128 columns, 192 nonzeros
## Presolved model has 2080 quadratic objective terms
## Variable types: 64 continuous, 64 integer (64 binary)
## 
## Root relaxation: objective -4.195961e+03, 90 iterations, 0.00 seconds
## 
##     Nodes    |    Current Node    |     Objective Bounds      |     Work
##  Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time
## 
##      0     0 -4195.9608    0   29    0.00000 -4195.9608      -     -    0s
## H    0     0                    -2204.698698 -4195.9608  90.3%     -    0s
## H    0     0                    -4169.663033 -4195.9608  0.63%     -    0s
##      0     0     cutoff    0      -4169.6630 -4169.6630  0.00%     -    0s
## 
## Explored 0 nodes (90 simplex iterations) in 0.04 seconds
## Thread count was 8 (of 8 available processors)
## 
## Optimal solution found (tolerance 1.00e-04)
## Best objective -4.169663033237e+03, best bound -4.169663033237e+03, gap 0.0%

print (beta_new)

##  [1] 0.8930573 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [8] 0.0000000 1.0892370 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [15] 0.0000000 0.0000000 0.9921031 0.0000000 0.0000000 0.0000000 0.0000000
## [22] 0.0000000 0.0000000 0.0000000 1.1158345 0.0000000 0.0000000 0.0000000
## [29] 0.0000000 0.0000000 0.0000000 0.0000000 0.9797358 0.0000000 0.0000000
## [36] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0029896 0.0000000
## [43] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0201952
## [50] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [57] 1.0389175 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [64] 0.0000000

Part 3 Comparing LASSO and MIQP Results

Prediction Error

The outputs below show the prediction error for the two approaches. For MIQP, prediction error converges to \(0.004456\) as \(M\) gets larger. For LASSO, the smallest \(\lambda\) giving us 8 variables yields a prediction error of \(0.024466\), about 5 times the prediction error than that of MIQP. LASSO regression is regularized by a penalty parameter \(\lambda\) which brings in statistical bias to the model, so the prediction is much off comparing to that from MIQP method.

##      m_values Prediction Error
## [1,]      0.5      0.250000000
## [2,]      1.0      0.001431754
## [3,]      2.0      0.004456055
## [4,]      4.0      0.004456055
## [5,]      8.0      0.004456055

##       Number of Non-zero Coefficients    Lambda Prediction Error
##  [1,]                               8 0.8463148       0.69785520
##  [2,]                               8 0.7711305       0.57787191
##  [3,]                               8 0.7026254       0.47845794
##  [4,]                               8 0.6402061       0.39610338
##  [5,]                               8 0.5833319       0.32789580
##  [6,]                               8 0.5315103       0.27141881
##  [7,]                               8 0.4842924       0.22466720
##  [8,]                               8 0.4412692       0.18597773
##  [9,]                               8 0.4020681       0.15397053
## [10,]                               8 0.3663495       0.12750096
## [11,]                               8 0.3338040       0.10561967
## [12,]                               8 0.3041498       0.08753928
## [13,]                               8 0.2771299       0.07260683
## [14,]                               8 0.2525105       0.06028085
## [15,]                               8 0.2300781       0.05011268
## [16,]                               8 0.2096386       0.04172997
## [17,]                               8 0.1910149       0.03482441
## [18,]                               8 0.1740457       0.02914033
## [19,]                               8 0.1585839       0.02446620

Coefficient Estimates

We can see that the best LASSO regression (in terms of smallest prediction error) with 8 non-zero coefficients picks out the same set of variables as MIQP method does. The estimates of the two methods do not converge. Since LASSO introduces bias by the penalty parameter \(\lambda\), the estimates generated by MIQP method should be more accurate.

##      MIQP Beta Estimates          
## [1,] "Beta 1"  "0.893057321191464"
## [2,] "Beta 9"  "1.08923704401365" 
## [3,] "Beta 17" "0.992103052242995"
## [4,] "Beta 25" "1.11583445462063" 
## [5,] "Beta 33" "0.979735809388095"
## [6,] "Beta 41" "1.00298956064304" 
## [7,] "Beta 49" "1.02019522338848" 
## [8,] "Beta 57" "1.03891751790867"

##      LASSO Beta Estimates          
## [1,] "Beta 1"   "0.749627302917313"
## [2,] "Beta 9"   "0.910242153018034"
## [3,] "Beta 17"  "0.812497028012497"
## [4,] "Beta 25"  "0.97058930405387" 
## [5,] "Beta 33"  "0.801172093676408"
## [6,] "Beta 41"  "0.832688277865887"
## [7,] "Beta 49"  "0.886554127861054"
## [8,] "Beta 57"  "0.88196233303159"

MIS 381N SCO Project 3

Group 3: vijai Kasthuri Rangan, Shreyas K.S, Yuwen WANG, Daniel Zalewski

March 21, 2016

Part 1 LASSO

Part 2 MIQP using `gurobi`

Formulation

Solution

Part 3 Comparing LASSO and MIQP Results

Prediction Error

Coefficient Estimates

MIS 381N SCO Project 3

Group 3: vijai Kasthuri Rangan, Shreyas K.S, Yuwen WANG, Daniel Zalewski

March 21, 2016

Part 1 LASSO

Part 2 MIQP using gurobi

Formulation

Solution

Part 3 Comparing LASSO and MIQP Results

Prediction Error

Coefficient Estimates

Part 2 MIQP using `gurobi`