In this lab, we will focus on linear programming and on linear and non-linear regression.
Linear regression, as discussed in the previous lab, covers simple and multiple regression techniques; when variables have a direct (or inverse) straight-line relationship, these techniques can model it well.
Sometimes, however, the variables do not predict each other in a linear way. For example, looking at the stock market vs. time, we know that generally the market was booming before the crash, then the market crashed and the Great Depression hit, and slowly the market started to rise again.
This pattern is not linear; a non-linear regression technique is needed to model it and predict the value of the market based on the year.
In this lab, we will explore optimization by solving a marketing model with linear programming, and then perform linear and non-linear regression on the cost of servers.
We are going to use tidyverse, a collection of R packages designed for data science.
# Here we are checking if the package is installed
if(!require("tidyverse")){
  # If the package is not in the system then it will be installed
  install.packages("tidyverse", dependencies = TRUE)
  # Here we are loading the package
  library("tidyverse")
}
Loading required package: tidyverse
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 2.2.1     v purrr   0.2.4
v tibble  1.4.2     v dplyr   0.7.4
v tidyr   0.7.2     v stringr 1.2.0
v readr   1.1.1     v forcats 0.2.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
# Here we are checking if the package is installed
if(!require("plotly")){
  # If the package is not in the system then it will be installed
  install.packages("plotly", dependencies = TRUE)
  # Here we are loading the package
  library("plotly")
}
Loading required package: plotly
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
    last_plot
The following object is masked from 'package:stats':
    filter
The following object is masked from 'package:graphics':
    layout
# make.lp() is from the lpSolveAPI package, so load it first
library(lpSolveAPI)
lprec <- make.lp(0, 2)
lp.control(lprec, sense="max")
$anti.degen
[1] "fixedvars" "stalling"
$basis.crash
[1] "none"
$bb.depthlimit
[1] -50
$bb.floorfirst
[1] "automatic"
$bb.rule
[1] "pseudononint" "greedy" "dynamic" "rcostfixing"
$break.at.first
[1] FALSE
$break.at.value
[1] 1e+30
$epsilon
epsb epsd epsel epsint epsperturb epspivot
1e-10 1e-09 1e-12 1e-07 1e-05 2e-07
$improve
[1] "dualfeas" "thetagap"
$infinite
[1] 1e+30
$maxpivot
[1] 250
$mip.gap
absolute relative
1e-11 1e-11
$negrange
[1] -1e+06
$obj.in.basis
[1] TRUE
$pivoting
[1] "devex" "adaptive"
$presolve
[1] "none"
$scalelimit
[1] 5
$scaling
[1] "geometric" "equilibrate" "integers"
$sense
[1] "maximize"
$simplextype
[1] "dual" "primal"
$timeout
[1] 0
$verbose
[1] "neutral"
# Objective: maximize 275.691*x1 + 48.341*x2
set.objfn(lprec, c(275.691, 48.341))
# The total of both variables cannot exceed 350,000
add.constraint(lprec, c(1, 1), "<=", 350000)
# Each variable has a minimum level
add.constraint(lprec, c(1, 0), ">=", 15000)
add.constraint(lprec, c(0, 1), ">=", 75000)
# The second variable must be exactly twice the first (2*x1 - x2 = 0)
add.constraint(lprec, c(2, -1), "=", 0)
lprec
Model name:
C1 C2
Maximize 275.691 48.341
R1 1 1 <= 350000
R2 1 0 >= 15000
R3 0 1 >= 75000
R4 2 -1 = 0
Kind Std Std
Type Real Real
Upper Inf Inf
Lower 0 0
# Solve the LP model
solve(lprec)
[1] 0
get.objective(lprec)
[1] 43443517
get.variables(lprec)
[1] 116666.7 233333.3
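A return value of 0 from solve() means an optimal solution was found: about 116,666.7 for the first variable and 233,333.3 for the second (the second is exactly twice the first, as the equality constraint requires), giving an objective value of roughly 43,443,517. As a quick sanity check (a suggestion, not part of the original lab), the objective can be recomputed by hand from the solution vector:
# Recompute the objective value manually; it should match get.objective()
x <- get.variables(lprec)
sum(c(275.691, 48.341) * x)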
Name your dataset 'mydata' so it is easy to work with.
Commands: read_csv() head()
# read_csv() comes from readr, loaded above with the tidyverse
mydata <- read_csv("data/ServersCost.csv")
head(mydata)
servers <- mydata$servers
cost <- mydata$cost
Corr <- cor(mydata)
Corr
servers cost
servers 1.00000000 0.03356606
cost 0.03356606 1.00000000
The correlation between these two variables is positive but very weak (r ≈ 0.034), so a straight line is unlikely to describe the relationship.
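To confirm that such a weak correlation is statistically indistinguishable from zero, a correlation test can be run (a suggestion beyond the original lab):
# Tests H0: true correlation is 0; with r near 0.034 the p-value will be large
cor.test(mydata$servers, mydata$cost)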
Commands: p <- qplot( x = INDEPENDENT, y = DEPENDENT, data = mydata) + geom_point()
p <- qplot( x = servers, y = cost, data = mydata) + geom_point()
p
Command: p + geom_smooth(method = "lm")
p + geom_smooth(method = "lm")
linear_model <- lm(cost ~ servers, data = mydata)
# predict() without a newdata argument returns fitted values for the training data
predict(linear_model)
1 2 3 4 5 6 7 8
14795.19 14843.19 14891.19 14939.19 14987.19 15035.19 15083.19 15131.20
9 10 11 12 13 14 15 16
15179.20 15227.20 15275.20 15323.20 15371.20 15419.21 15467.21 15515.21
17 18 19 20
15563.21 15611.21 15659.21 15707.21
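To quantify how far these predictions fall from the actual costs (a suggestion beyond the original lab), the root mean squared error can be computed:
# RMSE of the linear model's fitted values against the actual cost
sqrt(mean((cost - predict(linear_model))^2))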
summary( linear_model )
Call:
lm(formula = cost ~ servers, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-10646.2 -8646.2 -544.7 7066.0 12858.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14747.2 4035.5 3.654 0.00181 **
servers 48.0 336.9 0.142 0.88828
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8687 on 18 degrees of freedom
Multiple R-squared: 0.001127, Adjusted R-squared: -0.05437
F-statistic: 0.0203 on 1 and 18 DF, p-value: 0.8883
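The linear fit is poor: R-squared is about 0.001 and the servers coefficient is not significant (p = 0.888). A residual plot (a suggestion beyond the original lab) makes the problem visible; a clear pattern in the residuals means the straight line is missing structure in the data:
# Residuals vs. fitted values for the linear model
plot(fitted(linear_model), resid(linear_model), pch = 16,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)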
We now transform the predictor and fit a non-linear (quadratic) model to see how well it fits the data.
Quadratic model: y = b0 + b1*x + b2*x^2
setwd("C:\\Users\\hp\\Documents\\Spring 2018\\BSAD 343H\\Labs\\Lab 7\\07-notebook-lab")
# Build the quadratic term by squaring the predictor
servers <- mydata$servers
servers2 <- mydata$servers^2
quad_model <- lm(cost ~ servers + servers2, data = mydata)
summary (quad_model)
Call:
lm(formula = cost ~ servers + servers2, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-2897.8 -1553.4 -513.2 1152.4 4752.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 35417.77 1742.64 20.32 2.30e-13 ***
servers -5589.43 382.19 -14.62 4.62e-11 ***
servers2 268.45 17.68 15.19 2.55e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2342 on 17 degrees of freedom
Multiple R-squared: 0.9314, Adjusted R-squared: 0.9233
F-statistic: 115.4 on 2 and 17 DF, p-value: 1.282e-10
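The quadratic model is a dramatic improvement: adjusted R-squared jumps from -0.054 to 0.9233, and both coefficients are highly significant. As a side note (an equivalent alternative, not the lab's approach), the squared term can also be written inline with I() so that no separate servers2 column is needed:
# Equivalent quadratic model with the squared term specified in the formula
quad_model_alt <- lm(cost ~ servers + I(servers^2), data = mydata)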
Commands: predicted2 <- predict( quad_model )
servers2 <- servers^2
quad_model <- lm(cost ~ servers + servers2)
# predict() without newdata returns the model's fitted values
predicted2 <- predict(quad_model)
Commands: qplot( x = INDEPENDENT, y = PREDICTED, colour = "red" )
qplot( x = servers, y = predicted2, colour = "red" )
servers <- mydata$servers
servers2 <- mydata$servers^2
servers3 <- mydata$servers^3
cubic_model <- lm(cost ~ servers + servers2 + servers3)
summary( cubic_model )
Call:
lm(formula = cost ~ servers + servers2 + servers3)
Residuals:
Min 1Q Median 3Q Max
-2871.0 -1435.1 -473.6 1271.8 4600.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36133.696 2625.976 13.760 2.77e-10 ***
servers -5954.738 1056.596 -5.636 3.72e-05 ***
servers2 310.895 115.431 2.693 0.016 *
servers3 -1.347 3.619 -0.372 0.715
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2404 on 16 degrees of freedom
Multiple R-squared: 0.932, Adjusted R-squared: 0.9193
F-statistic: 73.11 on 3 and 16 DF, p-value: 1.478e-09
Multiple R-squared is .932 and adjusted R-squared is .9193, so the cubic model also fits the data well. Note, however, that the cubic term itself is not significant (p = 0.715) and the adjusted R-squared is slightly lower than the quadratic model's .9233, so the extra term adds little.
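A formal way to decide between the nested quadratic and cubic models is a partial F-test (a suggestion beyond the original lab):
# Partial F-test: does the cubic term significantly improve on the quadratic model?
anova(quad_model, cubic_model)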
Commands: predicted3 <- predict( cubic_model )
predicted3 <- predict(cubic_model)
Commands: qplot( x = INDEPENDENT, y = PREDICTED, colour = "red" )
qplot( x = servers, y = predicted3, colour = "red" )
The predicted points follow the quadratic shape of the data closely, with only small deviations. The multiple R-squared is marginally higher than the quadratic model's, but as noted above the adjusted R-squared is slightly lower and the cubic term is not significant.

### 3E) Overlay all the models on top of the data. Which model seems to fit the best in your opinion? Justify your answer.
Variables: LINEAR_MODEL, PREDICTED_QUADRATIC, PREDICTED_CUBIC
# Black = actual data
plot(servers, cost, pch = 16)
# Blue = line from the linear regression model
abline(linear_model, col = "blue", lwd = 2)
# Red = quadratic model predictions; points() adds them on the same axes,
# so the overlays line up with the original scatterplot
points(servers, predicted2, col = "red", pch = 16)
# Green = cubic model predictions
points(servers, predicted3, col = "green", pch = 16)
Both curved models track the data far better than the linear fit, and the quadratic and cubic predictions are nearly indistinguishable. Given that the cubic term is not significant, the quadratic model (Model 2) is arguably the best choice: it fits almost as well with one fewer parameter.
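Since the tidyverse is already loaded, the same overlay can also be drawn with ggplot2, letting geom_smooth() fit each polynomial directly (a sketch of an alternative, not the lab's required approach):
# Overlay linear, quadratic, and cubic fits on the scatterplot with ggplot2
ggplot(mydata, aes(x = servers, y = cost)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE, colour = "red") +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3), se = FALSE, colour = "green")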