Notebook Instructions
About
In this lab, we will focus on linear programming and on linear and non-linear regression.
Linear regression, as discussed in the previous lab, includes simple and multiple linear regression; these techniques can model variables with a linear relationship, whether direct (positive) or inverse (negative).
Sometimes, however, the variables are not related in a linear way. For example, if we look at the stock market over time, the market was booming before the crash, then it crashed as the Great Depression hit, and then it slowly began to rise again.
This pattern is not linear, so a non-linear regression technique is needed to model it and predict the value of the market from the year.
In this lab, we will explore optimization by solving a marketing model with linear programming, and we will perform linear and non-linear regression on the cost of servers.
Load Packages in R/RStudio
We are going to use tidyverse, a collection of R packages designed for data science.
Loading required package: lpSolveAPI
there is no package called ‘lpSolveAPI’
Installing package into ‘C:/Users/fasha/OneDrive/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/lpSolveAPI_5.5.2.0-17.zip'
Content type 'application/zip' length 1044053 bytes (1019 KB)
downloaded 1019 KB
package lpSolveAPI successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\fasha\AppData\Local\Temp\RtmpgvPb8H\downloaded_packages
Loading required package: tidyverse
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 2.2.1     v purrr   0.2.4
v tibble  1.4.2     v dplyr   0.7.4
v tidyr   0.8.0     v stringr 1.2.0
v readr   1.1.1     v forcats 0.2.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
Loading required package: rvest
Loading required package: xml2
Attaching package: ‘rvest’
The following object is masked from ‘package:purrr’:
    pluck
The following object is masked from ‘package:readr’:
    guess_encoding
Task 1: Linear Programming - Solving Marketing Model
1A) Create the model object in R.
lprec <- make.lp(0, 2)
Set the optimization direction and the objective function for the model.
lp.control(lprec, sense="max")
$anti.degen
[1] "fixedvars" "stalling"
$basis.crash
[1] "none"
$bb.depthlimit
[1] -50
$bb.floorfirst
[1] "automatic"
$bb.rule
[1] "pseudononint" "greedy" "dynamic" "rcostfixing"
$break.at.first
[1] FALSE
$break.at.value
[1] 1e+30
$epsilon
epsb epsd epsel epsint epsperturb epspivot
1e-10 1e-09 1e-12 1e-07 1e-05 2e-07
$improve
[1] "dualfeas" "thetagap"
$infinite
[1] 1e+30
$maxpivot
[1] 250
$mip.gap
absolute relative
1e-11 1e-11
$negrange
[1] -1e+06
$obj.in.basis
[1] TRUE
$pivoting
[1] "devex" "adaptive"
$presolve
[1] "none"
$scalelimit
[1] 5
$scaling
[1] "geometric" "equilibrate" "integers"
$sense
[1] "maximize"
$simplextype
[1] "dual" "primal"
$timeout
[1] 0
$verbose
[1] "neutral"
set.objfn(lprec, c(275.691, 48.341))
1B) Add constraints
add.constraint(lprec, c(1, 1), "<=", 350000)
add.constraint(lprec, c(1, 0), ">=", 15000)
add.constraint(lprec, c(0, 1), ">=", 75000)
add.constraint(lprec, c(2, -1), "=", 0)
1C) Solve the optimization problem
# solve; a return value of 0 means an optimal solution was found
solve(lprec)
[1] 0
Display the objective function optimum value
get.objective(lprec)
[1] 43443517
Display the variables optimum values
get.variables(lprec)
[1] 116666.7 233333.3
Task 2: Regression Analysis - Linear Regression
- A linear model is of the form y = b0 + b1*x1 + b2*x2 + … + bn*xn, where b0, …, bn are coefficients estimated from the data
2A) Read the csv file into RStudio and display the dataset.
mydata <- read.csv(file="data/ServersCost.csv")
mydata
head(mydata)
2B) Create a correlation table to compare the correlations between all variables. What can you tell about the correlation between the variables?
cor(mydata)
servers cost
servers 1.00000000 0.03356606
cost 0.03356606 1.00000000
The points form a U shape: cost falls as the number of servers increases at first, then rises again. The linear trend line is the best straight-line fit, but it fits this data poorly and does not pass through any of the points.
2D) Create a linear regression model by identifying the dependent variable (y) and independent variable (x_n)
- Commands: linear_model <- lm( DEPENDENT ~ INDEPENDENT )
linear_model <- lm( cost ~ servers )
linear_model
Call:
lm(formula = cost ~ servers)
Coefficients:
(Intercept) servers
14747 48
Use the regression model to create a report. Note the R-Squared and Adjusted R-Squared values, determine if this is a good or bad fit for your data?
- Commands: summary( linear_model )
summary( linear_model )
Call:
lm(formula = cost ~ servers)
Residuals:
Min 1Q Median 3Q Max
-10646.2 -8646.2 -544.7 7066.0 12858.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14747.2 4035.5 3.654 0.00181 **
servers 48.0 336.9 0.142 0.88828
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 8687 on 18 degrees of freedom
Multiple R-squared: 0.001127, Adjusted R-squared: -0.05437
F-statistic: 0.0203 on 1 and 18 DF, p-value: 0.8883
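## R-squared is 0.001127 and the adjusted R-squared is -0.05437. The number of servers explains only about 0.1% of the variation in cost, and the negative adjusted value means it explains less than an unrelated predictor would be expected to by chance, so this is a very poor fit. Values closer to 1 indicate a better fit.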
Task 3: Regression Analysis - Non-linear Regression
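Both R-squared and adjusted R-squared are 1, which could suggest a perfect fit. However, when y is computed as x + x^2, y is an exact function of x, so the fit is perfect by construction and says nothing about the server cost data.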
3B) Compute the predicted values based on the quadratic model.
Commands: predicted2 <- predict( quad_model, newdata = mydata )
servers2 = servers^2
quad_model = lm(cost ~ servers + servers2 )
quad_model
Call:
lm(formula = cost ~ servers + servers2)
Coefficients:
(Intercept) servers servers2
35417.8 -5589.4 268.4
predicted2 = predict(quad_model, newdata = mydata)
predicted2
1 2 3 4 5 6 7 8
30096.790 25312.706 21065.520 17355.233 14181.844 11545.354 9445.762 7883.068
9 10 11 12 13 14 15 16
6857.273 6368.376 6416.377 7001.277 8123.076 9781.772 11977.367 14709.861
17 18 19 20
17979.252 21785.543 26128.731 31008.818
Create a plot using the quadratic model predicted values in red. Note the shape: looking at the plot, is this a good or bad fit for your data?
Commands: qplot( x = INDEPENDENT, y = PREDICTED, colour = I("red") )
qplot( x = servers, y = predicted2, colour = I("red") )

The predicted values form a U shape that follows the curve in the data, so the quadratic model looks like a much better fit than the straight line.
For the cubic model, R-squared is 0.932 and the adjusted R-squared is 0.9193, both close to 1, which suggests that it is a good fit for the data.
3D) Compute the predicted values based on the cubic model.
Commands: predicted3 <- predict( cubic_model, newdata = mydata )
predicted3 <- predict( cubic_model, newdata = mydata )
predicted3
1 2 3 4 5 6 7 8
30488.507 25457.022 21031.159 17202.831 13963.954 11306.443 9222.212 7703.177
9 10 11 12 13 14 15 16
6741.253 6328.355 6456.398 7117.297 8302.966 10005.322 12216.278 14927.751
17 18 19 20
18131.654 21819.904 25984.414 30617.101
Create a plot using the cubic model predicted values in green. Note the shape: looking at the plot, is this a good or bad fit for your data? Is this model better than the previous one?
Commands: qplot( x = INDEPENDENT, y = PREDICTED, colour = I("green") )
qplot( x = servers, y = predicted3, colour = I("green") )

The cubic predictions also follow the U shape of the data closely, so this also looks like a good fit; the curve is almost identical to the quadratic one.
3E) Overlay all the models on top of the data. Which model seems to fit best in your opinion? Justify your answer.
variables: LINEAR_MODEL , PREDICTED_QUADRATIC, PREDICTED_CUBIC
# Black = Actual Data; widen the y-axis so every predicted point is visible
plot(servers, cost, pch = 16, ylim = range(cost, predicted2, predicted3))
# Blue = Linear Line based on Linear Regression Model
abline(linear_model, col = "blue", lwd = 2)
# Red = Quadratic Model predictions, drawn with points() so they share the data's axes
points(servers, predicted2, col = "red", pch = 16)
# Green = Cubic Model predictions, drawn on the same axes
points(servers, predicted3, col = "green", pch = 16)

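The quadratic model looks the best: its curve passes directly through two of the data points, compared with one for the cubic model and none for the linear model. The quadratic and cubic predictions also differ by at most about 392 at any server count, so the extra cubic term adds little and the simpler quadratic model is preferable.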
---
title: "Business Analytics Lab Worksheet 06"
author: "Ashley Krenz"
date: "March 29, 2018"
output:
  html_notebook: default
  html_document: default
  pdf_document: default
subtitle: CME Group Foundation Business Analytics Lab
---

-------------

## Notebook Instructions

-------------

### About

* In this lab, we will focus on linear programming and on linear and non-linear regression.

* Linear regression, as discussed in the previous lab, includes simple and multiple linear regression; these techniques can model variables with a linear relationship, whether direct (positive) or inverse (negative).

* Sometimes, however, the variables are not related in a linear way. For example, if we look at the stock market over time, the market was booming before the crash, then it crashed as the Great Depression hit, and then it slowly began to rise again.

* This pattern is not linear, so a non-linear regression technique is needed to model it and predict the value of the market from the year.

* In this lab, we will explore optimization by solving a marketing model with linear programming, and we will perform linear and non-linear regression on the cost of servers.


### Load Packages in R/RStudio 

We are going to use tidyverse, a collection of R packages designed for data science. We also load lpSolveAPI, which provides the linear programming solver used in Task 1.

* Info: https://www.tidyverse.org/

```{r, echo = FALSE}

# Check whether the package is installed; require() returns FALSE if it is not
if(!require("lpSolveAPI")){
  
  # If the package is missing, install it
  install.packages("lpSolveAPI", dependencies = TRUE)
  
  # Load the newly installed package
  library("lpSolveAPI")
}

```

```{r, echo = FALSE}

# Check whether the package is installed; require() returns FALSE if it is not
if(!require("tidyverse")){
  
  # If the package is missing, install it
  install.packages("tidyverse", dependencies = TRUE)
  
  # Load the newly installed package
  library("tidyverse")
}

# Check whether the package is installed; require() returns FALSE if it is not
if(!require("rvest")){
  
  # If the package is missing, install it
  install.packages("rvest", dependencies = TRUE)
  
  # Load the newly installed package
  library("rvest")
}

```
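The chunks above repeat the same check, install, and load steps for each package. A minimal sketch of a reusable helper, assuming packages may be installed from the default CRAN mirror; the function name `load_or_install()` is hypothetical and not part of any package:

```r
# Hypothetical helper: attach each package, installing it first if it is missing
load_or_install <- function(pkgs) {
  for (p in pkgs) {
    # requireNamespace() checks availability without attaching the package
    if (!requireNamespace(p, quietly = TRUE)) {
      install.packages(p, dependencies = TRUE)
    }
    # character.only = TRUE lets library() take the package name from a variable
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}
```

With this helper, the chunks could become a single call such as `load_or_install(c("lpSolveAPI", "tidyverse", "rvest"))`. The availability check does not attach anything, so `library()` attaches each package exactly once.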

-------------

## Task 1: Linear Programming - Solving Marketing Model

-------------

### 1A) Create the model object in R.

```{r}

lprec <- make.lp(0, 2) 

```

#### Set the optimization direction and objective function for the model.

* Set the model to maximize the objective function.

```{r}

lp.control(lprec, sense="max")  
set.objfn(lprec, c(275.691, 48.341))

```

### 1B) Add constraints

```{r}

add.constraint(lprec, c(1, 1), "<=", 350000)
add.constraint(lprec, c(1, 0), ">=", 15000)
add.constraint(lprec, c(0, 1), ">=", 75000)
add.constraint(lprec, c(2, -1), "=", 0)

```
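For reference, the objective set by `set.objfn()` and the four `add.constraint()` calls correspond to the linear program below, where x1 and x2 are the model's two decision variables (columns 1 and 2 of `lprec`). The non-negativity line assumes lpSolveAPI's default lower bound of 0 on each variable; the lab does not say what the two variables represent.

$$
\begin{aligned}
\max_{x_1,\,x_2}\quad & 275.691\,x_1 + 48.341\,x_2 \\
\text{subject to}\quad & x_1 + x_2 \le 350000 \\
& x_1 \ge 15000 \\
& x_2 \ge 75000 \\
& 2x_1 - x_2 = 0 \\
& x_1,\ x_2 \ge 0
\end{aligned}
$$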

#### View the problem formulation in tabular/matrix form to confirm that the model was created correctly.

```{r}

lprec

```

### 1C) Solve the optimization problem
```{r}
# solve; a return value of 0 means an optimal solution was found
solve(lprec) 

```

#### Display the objective function optimum value
```{r}

get.objective(lprec)

```

#### Display the variables optimum values
```{r}

get.variables(lprec) 

```
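Because the last constraint forces x2 = 2*x1, the optimum can also be checked by hand. A minimal base-R sketch (it does not call lpSolveAPI, and the object names are hypothetical) that rebuilds the constraints as a matrix, derives the solution by substitution, and confirms that it is feasible:

```r
# Task 1 model written as plain R objects (base R only, no lpSolveAPI)
obj_coef <- c(275.691, 48.341)
A   <- rbind(c(1, 1),    # x1 + x2    <= 350000
             c(1, 0),    # x1         >= 15000
             c(0, 1),    # x2         >= 75000
             c(2, -1))   # 2*x1 - x2   = 0
rhs <- c(350000, 15000, 75000, 0)

# The equality forces x2 = 2*x1, so the objective becomes
# (275.691 + 2 * 48.341) * x1 = 372.373 * x1, which increases with x1.
# The largest feasible x1 then comes from x1 + 2*x1 <= 350000.
x1_opt <- 350000 / 3
x_opt  <- c(x1_opt, 2 * x1_opt)

# Check every constraint (small tolerance for floating-point rounding)
lhs <- as.vector(A %*% x_opt)
feasible <- lhs[1] <= rhs[1] + 1e-6 &&
  lhs[2] >= rhs[2] &&
  lhs[3] >= rhs[3] &&
  abs(lhs[4] - rhs[4]) < 1e-6

round(x_opt, 1)        # 116666.7 233333.3, as from get.variables(lprec)
sum(obj_coef * x_opt)  # about 43443517, as from get.objective(lprec)
feasible               # TRUE
```

The binding constraint is the 350000 upper limit; the two lower bounds (15000 and 75000) are satisfied with room to spare.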

-------------

## Task 2: Regression Analysis - Linear Regression

-------------

* A linear model is of the form y = b0 + b1*x1 + b2*x2 + ... + bn*xn, where b0, ..., bn are coefficients estimated from the data

### 2A) Read the csv file into RStudio and display the dataset. 

* Name your dataset 'mydata' so it is easy to work with.

* Commands: read.csv(), head()

```{r}
mydata <- read.csv(file="data/ServersCost.csv")
mydata
```
```{r}
head(mydata)
```


#### Extract the assigned features (columns) to perform some analytics. 
```{r}
servers <- mydata$servers
cost <- mydata$cost
```

### 2B) Create a correlation table to compare the correlations between all variables. What can you tell about the correlation between the variables?

```{r}
cor(mydata)
```
## The correlation between servers and cost is positive but only about 0.034, which is so close to zero that there is essentially no linear relationship between the two variables.
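A correlation near zero rules out only a *linear* relationship. A small sketch with made-up illustrative values (not the lab's ServersCost.csv) shows that a perfectly U-shaped relationship can have a Pearson correlation of essentially zero while a quadratic model fits it exactly:

```r
# Illustrative values only: demo_y is an exact U-shaped function of demo_x
demo_x <- 1:20
demo_y <- (demo_x - 10.5)^2

# Pearson correlation measures linear association only
demo_r <- cor(demo_x, demo_y)
demo_r                 # essentially 0

linear_fit <- lm(demo_y ~ demo_x)
quad_fit   <- lm(demo_y ~ demo_x + I(demo_x^2))

# Residual sum of squares: large for the straight line, essentially 0 for the quadratic
deviance(linear_fit)   # 17556
deviance(quad_fit)
```

The servers data follow the same pattern: a correlation of about 0.034 alongside the U shape visible in the 2C plot.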

### 2C) Create a plot of the dependent (y) and independent (x) variables. Note any patterns or relationships between the two variables, and describe the trend line.

* The blue line here represents the linear model we created and the black dots are the data points. 

Commands: p <- qplot( x = INDEPENDENT, y = DEPENDENT, data = mydata) + geom_point()

```{r}
p <- qplot( x = servers, y = cost, data = mydata) + geom_point()
p
```

Command: p + geom_smooth(method = "lm")

#### Add a trend line using a linear model
```{r}
p + geom_smooth(method = "lm")
```
## The points form a U shape: cost falls as the number of servers increases at first, then rises again. The linear trend line is the best straight-line fit, but it fits this data poorly and does not pass through any of the points. 

### 2D) Create a linear regression model by identifying the dependent variable (y) and independent variable (x_n)

* Commands: linear_model <- lm( DEPENDENT ~ INDEPENDENT ) 

```{r}

linear_model <- lm( cost ~ servers ) 
linear_model

```

#### Use the regression model to create a report. Note the R-Squared and Adjusted R-Squared values, determine if this is a good or bad fit for your data?

* Commands: summary( linear_model )

```{r}
 summary( linear_model )
```
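## R-squared is 0.001127 and the adjusted R-squared is -0.05437. The number of servers explains only about 0.1% of the variation in cost, and the negative adjusted value means it explains less than an unrelated predictor would be expected to by chance, so this is a very poor fit. Values closer to 1 indicate a better fit.
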
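The two values in the report are linked by the standard formula adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The report shows 18 residual degrees of freedom with one predictor, so n = 20. A small sketch that plugs in the printed numbers; the helper name `adjusted_r2()` is hypothetical:

```r
# Adjusted R-squared penalises R-squared for the number of predictors p
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Linear model from 2D: R-squared 0.001127, n = 20 observations, p = 1 predictor
adjusted_r2(0.001127, n = 20, p = 1)   # about -0.05437

# Each t value in the report is the estimate divided by its standard error
48.0 / 336.9                           # about 0.142 (servers)
14747.2 / 4035.5                       # about 3.654 (intercept)
```

Adjusted R-squared drops below zero whenever R-squared is smaller than p / (n - 1), here 1/19 or about 0.053. The same formula with p = 3 reproduces the cubic model's reported 0.9193 from its R-squared of 0.932.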
-------------

## Task 3: Regression Analysis - Non-linear Regression

-------------

* We transform the predictor and fit a quadratic model, which is non-linear in x but still linear in its coefficients, so lm() can fit it.

* A quadratic model transforms the predictor by squaring it and adding the squared term to the model. 
* Quadratic model: y = b0 + b1*x + b2*x^2

### 3A) Create a non-linear quadratic regression model by identifying the dependent variable (y) and independent variables (x). Transform the independent variable by squaring it and add the squared term to the model. 

* The quadratic model formula is: y = b0 + b1*x + b2*x^2
* Commands: quad_model <- lm(y ~ x + x_squared)
* Commands: To square a variable, use (^), such as x^2

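```{r}
# The response (y) must be the observed cost, not a value computed from x
x <- mydata$servers
x2 <- mydata$servers^2
y <- mydata$cost
quad_model <- lm(y ~ x + x2)
quad_model
```

#### Use the quadratic model to create a report. Note the R-Squared and Adjusted R-Squared values, determine if this is a good or bad fit for your data?

* Commands: summary( quad_model )

```{r}
summary( quad_model )
```

## Compare these R-squared and adjusted R-squared values with the linear model's 0.001127 and -0.05437. Note that y must be the observed cost: if y is computed as x + x^2 instead, both values are exactly 1 by construction, because y is then an exact function of x, so a "perfect fit" would say nothing about the server data.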
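To see why a response computed from the predictors always fits perfectly, here is a minimal sketch on illustrative values of x (not the lab data; the `demo_` names are hypothetical). When y is built as x + x^2, `lm()` simply recovers the coefficients 0, 1, and 1 with no residual error, so R-squared is 1 regardless of any real data; `summary()` on such a fit typically also warns that the fit is essentially perfect.

```r
# Illustrative predictor values; demo_y is computed directly from demo_x
demo_x  <- 1:20
demo_x2 <- demo_x^2
demo_y  <- demo_x + demo_x2

circular_fit <- lm(demo_y ~ demo_x + demo_x2)

round(coef(circular_fit), 6)   # intercept 0, demo_x 1, demo_x2 1
deviance(circular_fit)         # residual sum of squares is essentially 0
```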

### 3B) Compute the predicted values based on the quadratic model.

Commands: predicted2 <- predict( quad_model, newdata = mydata )

```{r}
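# Keep the squared term inside the formula so predict() can recompute it
quad_model <- lm(cost ~ servers + I(servers^2), data = mydata)
quad_model

# predict() takes new observations through newdata (a data argument is ignored)
predicted2 <- predict(quad_model, newdata = mydata)
predicted2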

```
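`I(servers^2)` is one idiomatic way to keep the transformation inside the formula; `poly(servers, 2, raw = TRUE)` is another. Both let `predict()` compute the squared term for new server counts. A sketch on made-up illustrative data (the data frame `demo` is hypothetical, not the lab's CSV) showing that the two forms give the same fit:

```r
# Illustrative data only: a noisy U shape generated with a fixed seed
set.seed(1)
demo <- data.frame(servers = 1:20)
demo$cost <- 30000 - 5000 * demo$servers + 250 * demo$servers^2 +
  rnorm(20, sd = 1000)

fit_I    <- lm(cost ~ servers + I(servers^2), data = demo)
fit_poly <- lm(cost ~ poly(servers, 2, raw = TRUE), data = demo)

# Same model, parameterised the same way, so the fitted values agree
all.equal(unname(fitted(fit_I)), unname(fitted(fit_poly)))

# predict() squares the new server counts itself; no helper column is needed
predict(fit_I, newdata = data.frame(servers = c(5, 25)))
```

Without `raw = TRUE`, `poly()` uses orthogonal polynomials: the fitted values are the same, but the coefficients are no longer the b0, b1, b2 of y = b0 + b1*x + b2*x^2.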

#### Create a plot using the quadratic model predicted values in red. Note the shape: looking at the plot, is this a good or bad fit for your data?

Commands: qplot( x = INDEPENDENT, y = PREDICTED, colour = I("red") )

```{r}
# I() sets a literal colour instead of mapping the string "red" to a legend
qplot( x = servers, y = predicted2, colour = I("red") )

```
## The predicted values form a U shape that follows the curve in the data, so the quadratic model looks like a much better fit than the straight line. 

### 3C) Create a non-linear cubic regression model by identifying the dependent variable (y) and independent variables (x). Transform the independent variable by raising it to the second (x^2) and third (x^3) powers and adding them to the model. 

* The cubic model formula is: y = b0 + b1*x + b2*x^2 + b3*x^3
* Commands: cubic_model <- lm(y ~ x + x_squared + x_cubic)
* Commands: To raise a variable to a power, use (^), such as x^2, x^3

```{r}
# I() keeps the powers inside the formula, so predict() can recompute them
cubic_model <- lm(cost ~ servers + I(servers^2) + I(servers^3), data = mydata)
cubic_model
```

#### Use the cubic model to create a report. Note the R-Squared and Adjusted R-Squared values, determine if this is a good or bad fit for your data?

* Commands: summary( cubic_model )

```{r}
summary( cubic_model )
```
## R-squared is 0.932 and the adjusted R-squared is 0.9193, both close to 1, which suggests that this is a good fit for the data. 

### 3D) Compute the predicted values based on the cubic model.

Commands: predicted3 <- predict( cubic_model, newdata = mydata )

```{r}
predicted3 <- predict( cubic_model, newdata = mydata )
predicted3
```

#### Create a plot using the cubic model predicted values in green. Note the shape: looking at the plot, is this a good or bad fit for your data? Is this model better than the previous one?

Commands: qplot( x = INDEPENDENT, y = PREDICTED, colour = I("green") )

```{r}
# I() sets a literal colour; without it the points get the default palette colour
qplot( x = servers, y = predicted3, colour = I("green") )

```

## The cubic predictions also follow the U shape of the data closely, so this also looks like a good fit; the curve is almost identical to the quadratic one. 
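One way to answer "Is this model better than the previous one?" is to compare the two sets of predictions directly. The sketch below copies the predicted values printed in the notebook output for 3B and 3D (servers 1 to 20); the vector names are hypothetical, chosen so they do not overwrite `predicted2` and `predicted3`:

```r
# Predicted costs for servers = 1..20, copied from the printed output of 3B and 3D
quad_pred  <- c(30096.790, 25312.706, 21065.520, 17355.233, 14181.844,
                11545.354,  9445.762,  7883.068,  6857.273,  6368.376,
                 6416.377,  7001.277,  8123.076,  9781.772, 11977.367,
                14709.861, 17979.252, 21785.543, 26128.731, 31008.818)
cubic_pred <- c(30488.507, 25457.022, 21031.159, 17202.831, 13963.954,
                11306.443,  9222.212,  7703.177,  6741.253,  6328.355,
                 6456.398,  7117.297,  8302.966, 10005.322, 12216.278,
                14927.751, 18131.654, 21819.904, 25984.414, 30617.101)

gap <- cubic_pred - quad_pred

max(abs(gap))              # about 391.7
max(abs(gap) / quad_pred)  # about 0.024, i.e. under 2.5%
```

The largest gap, about 392, occurs at both the smallest and the largest server counts, and no prediction changes by more than 2.5%, so the cubic term alters the fitted curve very little.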

### 3E) Overlay all the models on top of the data. Which model seems to fit best in your opinion? Justify your answer. 

variables: LINEAR_MODEL , PREDICTED_QUADRATIC, PREDICTED_CUBIC

```{r}

# Black = Actual Data; widen the y-axis so every predicted point is visible
plot(servers, cost, pch = 16, ylim = range(cost, predicted2, predicted3))
# Blue = Linear Line based on Linear Regression Model
abline(linear_model, col = "blue", lwd = 2) 

# Red = Quadratic Model predictions, drawn with points() so they share
# the data's axes (a second plot() call would rescale the y-axis)
points(servers, predicted2, col = "red", pch = 16)

# Green = Cubic Model predictions, drawn on the same axes
points(servers, predicted3, col = "green", pch = 16)

```
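## The quadratic model looks the best: its curve passes directly through two of the data points, compared with one for the cubic model and none for the linear model. The quadratic and cubic predictions also differ by at most about 392 at any server count, so the extra cubic term adds little and the simpler quadratic model is preferable. 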
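Judging by eye is a reasonable start; a more formal comparison uses adjusted R-squared or AIC, which both penalise extra terms. A sketch of a hypothetical helper `compare_poly_fits()` that fits polynomial models of degree 1 to 3 and tabulates both measures; on the lab data it could be called as `compare_poly_fits(mydata$servers, mydata$cost)`:

```r
# Hypothetical helper: fit polynomial regressions of degree 1..max_degree
# and report adjusted R-squared (higher is better) and AIC (lower is better)
compare_poly_fits <- function(x, y, max_degree = 3) {
  fits <- lapply(seq_len(max_degree), function(d) {
    lm(y ~ poly(x, d, raw = TRUE))
  })
  data.frame(
    degree = seq_len(max_degree),
    adj_r2 = sapply(fits, function(f) summary(f)$adj.r.squared),
    aic    = sapply(fits, AIC)
  )
}
```

For nested models fitted to the same data, `anova(quad_model, cubic_model)` gives an F-test of whether the cubic term adds significant explanatory power.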
