Stochastic Optimization HW4

Problem 1:

It costs a company $12 to purchase an hour of labor and $15 to purchase an hour of capital. If L hours of labor and K units of capital are available, then 0.05L!/!K!/! machines can be produced. Suppose the company has $100,000 to purchase labor and capital. What is the maximum number of machines it can produce?

#choose: # of labor hours ($12), # of capital hours ($15)
#objective: MAXIMIZE - 0.05L^(2/3)K^(1/3)
#subject to:  12L + 15K = 100,000

library(lpSolve)
library(quadprog)

solve = function(find_L){
      find_K = (100000 - 12*find_L)/15
      machine = 0.05*(find_L^(2/3))*(find_K^(1/3))
      return(-machine)
}

Solution = optim(10, solve, method = 'L-BFGS-B')
L = Solution$par
K = (100000 - 12*L)/15

Solution
## $par
## [1] 5555.556
## 
## $value
## [1] -204.6684
## 
## $counts
## function gradient 
##       13       13 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
L
## [1] 5555.556
K
## [1] 2222.222
#maximum number of machines the company can produce (204)
0.05*(L^(2/3))*(K^(1/3))
## [1] 204.6684



Problem 2:

The file homework4stocks.csv contains historical monthly returns for 27 companies. The first row contains stock names and the first column contains the dates. For each company, calculate the estimated mean return and the estimated variance of return. Then calculate the estimated correlations between the companies’ returns.

Find a portfolio that achieves an expected monthly return of at least 1% and minimizes portfolio variance. What are the fractions invested in each stock? What are the portfolio’s estimated mean, variance, and standard deviation?

dir()
## [1] "figure"                 "homework4stocks.csv"   
## [3] "nflratings.csv"         "quadProgramming.html"  
## [5] "quadProgramming.md"     "quadProgramming.Rmd"   
## [7] "variable_selection.csv"
stock_data = read.csv("homework4stocks.csv", header=TRUE, sep=",")
#head(stock_data)

#For each company, calculate the estimated mean return and the estimated variance of return
stock_mean = sapply(stock_data[,2:28], mean)
stock_var = sapply(stock_data[,2:28], var)

#Then calculate the estimated correlations between the companies’ returns
stock_cor = cor(stock_data[,2:28], use="complete.obs")
stock_cov = cov(stock_data[,2:28], use="complete.obs")
#head(stock_cov)

#Find a portfolio that achieves an expected monthly return of at least 1% and minimizes portfolio variance
Dmat = stock_cov*2
dvec = rep(0, 27)
###################
init_Amat = matrix(0,27,27)
diag(init_Amat) = 1
investment_frac = rep(1,27)
Amat = cbind(init_Amat, investment_frac, stock_mean)
head(Amat)
##                                                            investment_frac
## AA   1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0               1
## AAPL 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0               1
## AXP  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0               1
## BA   0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0               1
## CAT  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0               1
## CSCO 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0               1
##        stock_mean
## AA   -0.005779221
## AAPL  0.049285714
## AXP   0.007493506
## BA    0.010662338
## CAT   0.012389610
## CSCO  0.002532468
###################
#at least 1% monthly return:
bvec = c(1, 0.01, rep(0,27))
meq=1

#Solve: fractions invested in each stock
solve2 = solve.QP(Dmat, dvec, Amat, bvec, meq)
solve2
## $solution
##  [1]  1.000000e+00  4.433786e-02  5.887195e-17  2.274677e-16  2.774893e-16
##  [6]  1.637121e-16 -8.267179e-16 -4.783879e-16 -1.755748e-17 -9.968924e-17
## [11] -8.132007e-16 -1.068155e-16  4.412334e-16 -4.840613e-16  2.290254e-16
## [16] -3.469178e-16  0.000000e+00  5.568166e-17  1.407705e-01  4.634830e-16
## [21]  1.075847e-01  1.232704e-17 -5.586254e-16 -4.747290e-17 -8.393359e-16
## [26]  5.208297e-16 -3.684860e-16
## 
## $value
## [1] 0.01518211
## 
## $unconstrained.solution
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 
## $iterations
## [1] 26  0
## 
## $Lagrangian
##  [1] 0.030364220 0.000000000 0.013268356 0.006213757 0.014063218
##  [6] 0.008709413 0.010840664 0.008487033 0.012650625 0.008669229
## [11] 0.012005035 0.016685060 0.006096763 0.004883582 0.020781633
## [16] 0.002668387 0.004435908 0.003571231 0.000000000 0.006870251
## [21] 0.000000000 0.004735303 0.006818154 0.002022914 0.004472986
## [26] 0.005752295 0.001552431 0.000000000 0.296000137
## 
## $iact
##  [1]  1 15  8  5 26 13  7 10 20 14 27 25 18 29 11 23 16  6 22 12  3 24  4
## [24]  9 17
#What are the portfolio’s estimated mean, variance, and standard deviation?
portfolio_mean= matrix(stock_mean,1,27) %*% matrix(solve2$solution,27,1)
portfolio_variance = solve2$value
portfolio_standardDev = sqrt(portfolio_variance)

portfolio_mean
##               [,1]
## [1,] -1.003889e-17
portfolio_variance
## [1] 0.01518211
portfolio_standardDev
## [1] 0.1232157



Problem 3:

The file ‘variable_selection.csv’ contains observations of variables y, x1, x2, and x3. Here, y is the dependent variable. We want to choose a linear model that uses at most 2 independent variables such that the sum of squared residuals is minimized. This can be formulated as a constrained quadratic programming problem.

This is called best subset problem which is usually very hard to solve. We will solve this problem by enumeration. Run six OLS regressions (3 with one independent variable and 3 more with two variables each) and choose the regression that best fits the data. When we learn how to solve QPs with integer constraints you can revisit this problem and solve it as a QP.

#Minimizing sum of squared residuals
dir()
## [1] "figure"                 "homework4stocks.csv"   
## [3] "nflratings.csv"         "quadProgramming.html"  
## [5] "quadProgramming.md"     "quadProgramming.Rmd"   
## [7] "variable_selection.csv"
subset_data = read.csv("variable_selection.csv", header=TRUE, sep=",")
#head(subset_data)
attach(subset_data)

model1 = lm(y~x1) 
#summary(model1)

model2 = lm(y~x2) 
#summary(model2)

model3 = lm(y~x3) 
#summary(model3)

#This model, regressing Y on X1 an X2 yields the best results 
#as evidenced by the Adjusted R-Squared value of 0.9969
model4 = lm(y~x1 + x2) 
summary(model4)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4861 -0.2674  0.0260  0.3658  1.1301 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.15258    0.21048   0.725     0.47    
## x1           2.99924    0.05337  56.195   <2e-16 ***
## x2           3.96917    0.02324 170.781   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5196 on 97 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.9969 
## F-statistic: 1.592e+04 on 2 and 97 DF,  p-value: < 2.2e-16
model5 = lm(y~x2 + x3) 
#summary(model5)

model6 = lm(y~x1 + x3)
#summary(model6)



Problem 4:

In an electrical network, the power loss incurred when a current of I amperes flows through a resistance of R ohms is I2R watts. In the figure below, 710 amperes of current must be sent from node 1 to node 4. The current flowing through each node must satisfy conservation of flow. For example, for node 1, 710 = flow through 1-ohm resistor + flow through 4-ohm resistor. Remarkably, nature determines the current flow through each resistor by minimizing the total power loss in the network.

  1. Formulate a quadratic programming problem whose solution will yield the current flowing through each resistor.

  2. Problem Formulation:

    Choose: X12, X13, X23, X24, X34

    Objective: MINIMIZE (X12^2*1 + X13^2*4 + X23^2*6 + X24^2*12 + X34^2*3)

    Subject to:

    x12 + x13 = 710
    x24 + x34 = 710
    x12 - x23 - x24 = 0
    x13 + x23 - x34 = 0

  3. Use R to determine the current flowing through each resistor.

#1.
Dmat1 = 2 * matrix(c(1,0,0,0,0,
                    0,4,0,0,0,
                    0,0,6,0,0,
                    0,0,0,12,0,
                    0,0,0,0,3),5,5)

dvec1 = rep(0,5)
Amat1 = matrix(rep(0,20),4,5)
Amat1[,1]=c(1, 1, 0, 0)
Amat1[,2]=c(1, 0, 1, 0)
Amat1[,3]=c(0, -1, 1, 0)
Amat1[,4]=c(0, -1, 0, 1)
Amat1[,5]=c(0, 0, -1, 1)
Amat1 = t(Amat1)
Amat1
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    0    0
## [2,]    1    0    1    0
## [3,]    0   -1    1    0
## [4,]    0   -1    0    1
## [5,]    0    0   -1    1
bvec1 = c(710,0,0,710)


solve4 = solve.QP(Dmat1, dvec1, Amat1, bvec1)

#current flowing through each resistor
solve4$solution
## [1] 371.3846 338.6154 163.8462 207.5385 502.4615
#total power loss in network (minimized)
solve4$value
## [1] 2031911



Problem 5:

The goal is to determine a set of ratings for the 32 NFL teams that most accurately predicts the actual outcomes of the games played. Use NLP to find the ratings that best predict the actual point spreads observed. The model will estimate the home team advantage and the ratings. The objective is to minimize the sum of squared prediction errors.

You will need to calculate the following:

You will also need to normalize the ratings. To do this, you set the actual average of the ratings to be 85 (this is somewhat arbitrary but based on the well-known Sagarin rating system).

dir()
## [1] "figure"                 "homework4stocks.csv"   
## [3] "nflratings.csv"         "quadProgramming.html"  
## [5] "quadProgramming.md"     "quadProgramming.Rmd"   
## [7] "variable_selection.csv"
nfl = read.csv("nflratings.csv", header = FALSE, sep = ",")
#head(nfl)

#Creating point-spread variable
nfl$pointspread = nfl$V4-nfl$V5

est_ratings = function(ratings){
  pred_error = sum((nfl$pointspread - (ratings[nfl$V2] - ratings[nfl$V3] + ratings[33]))^2)
  return(pred_error)
}

nfl_ratings = c(rep(85,32),3)

solve5 = optim(nfl_ratings, est_ratings, method="BFGS")

#optimal team ratings
solve5$par
##  [1] 84.522566 89.841419 92.745133 83.089040 88.759911 79.812783 87.543466
##  [8] 76.886335 92.120532 85.636612 70.503419 92.255043 86.984786 90.862269
## [15] 78.439410 76.888451 86.615151 92.064610 96.122969 95.629243 85.099322
## [22] 93.148705 75.033730 90.958290 86.641880 67.720126 92.605809 85.242154
## [29] 74.731612 79.171115 82.187728 80.136383  2.172814