Business Analytics Lab Project 2

About

This project will explore the concepts of simple linear regression, multiple linear regression and finding the optimum solution with possible business limitations.

Setup

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions.

Submit your html, pdf or word output to Sakai, If you want you can also publish your work into rpubs and share the link on Sakai. If you worked as a team you can have 1 submission with all the names of the team on it.

Task 1: Simple Linear Regression (Total 2.5 points)

First, read in the marketing data that was used in the previous lab. Make sure the file is read in correctly. (0.25 points)

#Read data correctly
traindata = read.csv(file="data/marketing.csv", header=TRUE)
head(traindata)

##   case_number sales radio paper  tv pos
## 1           1 11125    65    89 250 1.3
## 2           2 16121    73    55 260 1.6
## 3           3 16440    74    58 270 1.7
## 4           4 16876    75    82 270 1.3
## 5           5 13965    69    75 255 1.5
## 6           6 14999    70    71 255 2.1

Next, apply the cor() function to the data to understand the correlations between variables. This is a great way to compare the correlations between all variables.(0.25 points)

#Correlation matrix of all columns in the data
corr=cor(traindata)
corr

##             case_number      sales       radio       paper          tv
## case_number   1.0000000  0.2402344  0.23586825 -0.36838393  0.22282482
## sales         0.2402344  1.0000000  0.97713807 -0.28306828  0.95797025
## radio         0.2358682  0.9771381  1.00000000 -0.23835848  0.96609579
## paper        -0.3683839 -0.2830683 -0.23835848  1.00000000 -0.24587896
## tv            0.2228248  0.9579703  0.96609579 -0.24587896  1.00000000
## pos           0.0539763  0.0126486  0.06040209 -0.09006241 -0.03602314
##                     pos
## case_number  0.05397630
## sales        0.01264860
## radio        0.06040209
## paper       -0.09006241
## tv          -0.03602314
## pos          1.00000000

#Correlation matrix of all columns except the first column. This is convenient since case_number is only an indicator for the month and should be excluded from the calculations.
corr=cor(traindata[2:6])
corr

##            sales       radio       paper          tv         pos
## sales  1.0000000  0.97713807 -0.28306828  0.95797025  0.01264860
## radio  0.9771381  1.00000000 -0.23835848  0.96609579  0.06040209
## paper -0.2830683 -0.23835848  1.00000000 -0.24587896 -0.09006241
## tv     0.9579703  0.96609579 -0.24587896  1.00000000 -0.03602314
## pos    0.0126486  0.06040209 -0.09006241 -0.03602314  1.00000000

Q 1A) Why the value of “1.0” along the diagonal? (0.25 points)

A 1A) because that is where the variables are correlated to themselves

Q 1B) Evaluate the correlation between the variables (0.25 points)

A 1B) in this the sale and radio have a high correlation which the other two don’t do as well

Now, try to find a simple linear regression model between sales and marketing expences for radio commercials.(0.25 points)

#Simple Linear Regression 
pos  = traindata$pos
paper = traindata$paper
tv = traindata$tv
sales = traindata$sales
sales = traindata$sales
radio = traindata$radio

reg <- lm(sales ~ radio)
reg

## 
## Call:
## lm(formula = sales ~ radio)
## 
## Coefficients:
## (Intercept)        radio  
##     -9741.9        347.7

#Summary of Model

Q 1C) What is the values for the intercept and the slope for radio? Write down the equation for the linear regression model and your interpretation for the intercept and slope.(0.5 points)

A 1C) intercept is -9741.9 slope is 347.7. equation is y = 347.7(x) - 9741.9

Given this equation we can predict the value of sales for any given value of radio ads. Use your equation to calculate the predicted sales value for 75 (investing $75,000 in radio ads).(0.25 points)

sales_predicted = -9741.92 + 347.69 * (75)
sales_predicted

## [1] 16334.83

### Sales_predicted  ~ Radio expences

A high R-Squared value indicates that the model is a good fit, but not perfect. For the case of Sales versus Radio we will overlay the trend line representing the regression equation over the original plot. This will show how far the calculated results are from the actual value. The difference between the actual sales (circles) and the fitted sales (solid line) is captured in the residual error calculations.

#Plot Radio and Sales 
plot(radio,sales)

scatter.smooth(radio,sales)

#Add a trend line plot using the linear model we created above
abline(reg, col="blue",lwd=2)

Now you can derive the linear regression model for Sales versus TV (0.25 points)

#Simple Linear Regression 
reg <- lm(sales ~ tv)
reg

## 
## Call:
## lm(formula = sales ~ tv)
## 
## Coefficients:
## (Intercept)           tv  
##    -42229.2        221.1

#Summary of Model
summ <- summary(reg)
summ

## 
## Call:
## lm(formula = sales ~ tv)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1921.87  -412.24     7.02   581.59  1081.61 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -42229.21    4164.12  -10.14 7.19e-09 ***
## tv             221.10      15.61   14.17 3.34e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 771.3 on 18 degrees of freedom
## Multiple R-squared:  0.9177, Adjusted R-squared:  0.9131 
## F-statistic: 200.7 on 1 and 18 DF,  p-value: 3.336e-11

Q 1D) Write down the equation for the linear regression model.(0.25 points)

A 1D) y = 221.10 - 42229.21

Task 2: Multiple Linear Regression (Total 1.5 points)

Create a multiple linear regression predicting sales using both independent variables radio and tv. (0.25 points)

#Multiple Linear Regression Model
mlr1 <- lm(sales ~ radio + tv)
mlr1

## 
## Call:
## lm(formula = sales ~ radio + tv)
## 
## Coefficients:
## (Intercept)        radio           tv  
##   -17150.46       275.69        48.34

#Summary of Multiple Linear Regression Model
summ = summary(mlr1)
summ

## 
## Call:
## lm(formula = sales ~ radio + tv)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1729.58  -205.97    56.95   335.15   759.26 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17150.46    6965.59  -2.462 0.024791 *  
## radio          275.69      68.73   4.011 0.000905 ***
## tv              48.34      44.58   1.084 0.293351    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 568.9 on 17 degrees of freedom
## Multiple R-squared:  0.9577, Adjusted R-squared:  0.9527 
## F-statistic: 192.6 on 2 and 17 DF,  p-value: 2.098e-12

Q 2A) Write down the equation for the linear regression model and your interpretation for the intercept and the slopes. (0.5 points)

A 2A) y = 275.69(x) + 48.34(z) -17150.46

The predicted sales can again be calculated given the coefficients of the regression model.

Calculate the predicted sales for TV = 270 and Radio = 75 (0.25 points)

# sales_predicted = radio + tv
sales_predicted = coef(mlr1)[1] + coef(mlr1)[2]*(75) + coef(mlr1)[3]*(270)
sales_predicted

## (Intercept) 
##     16578.3

Create a multiple linear regression model for each of the following, and display the summary statistics (0.25 points)

#Multiple regression model for sales predicted by radio, tv, and pos
mlr1 <- lm( sales ~ radio + tv + pos)
mlr1

## 
## Call:
## lm(formula = sales ~ radio + tv + pos)
## 
## Coefficients:
## (Intercept)        radio           tv          pos  
##   -15491.23       291.36        38.26      -107.62

#Summary of Multiple Linear Regression Model
summ = summary(mlr1)
summ

## 
## Call:
## lm(formula = sales ~ radio + tv + pos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1748.20  -187.42   -61.14   352.07   734.20 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -15491.23    7697.08  -2.013  0.06130 . 
## radio          291.36      75.48   3.860  0.00139 **
## tv              38.26      48.90   0.782  0.44538   
## pos           -107.62     191.25  -0.563  0.58142   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 580.7 on 16 degrees of freedom
## Multiple R-squared:  0.9585, Adjusted R-squared:  0.9508 
## F-statistic: 123.3 on 3 and 16 DF,  p-value: 2.859e-11

#Multiple regression model for sales predicted by radio, tv, pos, and paper

mlr2 <- lm( sales ~ radio + tv + pos + paper)
mlr2

## 
## Call:
## lm(formula = sales ~ radio + tv + pos + paper)
## 
## Coefficients:
## (Intercept)        radio           tv          pos        paper  
##  -13801.015      294.224       33.369     -128.875       -9.159

#Summary of Multiple Linear Regression Model
summ = summary(mlr2)
summ

## 
## Call:
## lm(formula = sales ~ radio + tv + pos + paper)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1558.13  -239.35     7.25   387.02   728.02 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -13801.015   7865.017  -1.755  0.09970 . 
## radio          294.224     75.442   3.900  0.00142 **
## tv              33.369     49.080   0.680  0.50693   
## pos           -128.875    192.156  -0.671  0.51262   
## paper           -9.159      8.991  -1.019  0.32449   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 580 on 15 degrees of freedom
## Multiple R-squared:  0.9612, Adjusted R-squared:  0.9509 
## F-statistic: 92.96 on 4 and 15 DF,  p-value: 2.13e-10

Q 2B) Which of the three multiple linear regression models is best in predicting sales. Explain why. (0.25 points)

A 2B) the first one as it has the highest adjusted R squared

###Task 3: Linear Programming & Optimization (total 4 points) For this task, we need to install an optimization package in R.

if(!require("lpSolveAPI",quietly = TRUE))
  install.packages("lpSolveAPI",dependencies = TRUE, repos = "https://cloud.r-project.org")
#install special package required for the solver
#install.packages("lpSolveAPI", repos = "https://cran.us.r-project.org") 
#install.packages("lpSolveAPI", repos = "https://cloud.r-project.org")

# load package library
library("lpSolveAPI")

Solving Marketing Model

First create the linear programming model object in R. This is the starting point. The object will eventually contain all definitions and results. (0.25 points)

# Start creating the linear programming model
lpmark <- make.lp(0,2)

Next, define the type of optimization, set the objective function, and add the constraints to our model object.(0.5 points)

# Define type of optimization,and dump the screen output into a variable `dump`
dump = lp.control(lpmark, sense="max")


# Set the objective function
set.objfn(lpmark, c(275.691, 48.341))

# add constraints
add.constraint(lpmark, c(1, 1), "<=", 350000)
add.constraint(lpmark, c(1, 0), ">", 15000)
add.constraint(lpmark, c(0, 1), ">", 75000)
add.constraint(lpmark, c(2, -1), "=", 0)
add.constraint(lpmark, c(1, 0), ">=", 0)
add.constraint(lpmark, c(0, 1), ">=", 0)

Finally solve the model. (0.5 points)

# View the problem formulation in tabular/matrix form
lpmark

## Model name: 
##                C1       C2            
## Maximize  275.691   48.341            
## R1              1        1  <=  350000
## R2              1        0  >=   15000
## R3              0        1  >=   75000
## R4              2       -1   =       0
## R5              1        0  >=       0
## R6              0        1  >=       0
## Kind          Std      Std            
## Type         Real     Real            
## Upper         Inf      Inf            
## Lower           0        0

# Solve 
 solve(lpmark)

## [1] 0

# Display the objective function optimum value
get.objective(lpmark)

## [1] 43443517

# Display the decision variables optimum values
 get.variables(lpmark)

## [1] 116666.7 233333.3

# Display the constraints
get.constraints(lpmark)

## [1] 350000.0 116666.7 233333.3      0.0 116666.7 233333.3

Q 3A) Write down and clearly mark the optimum values for sales, radio, and tv ads. Show how the optimum values satisfy all constraints (0.5 points)

A 3A) the optimal amount for sales would be 350000 the optimal amount for radio would be 116666.72 for tv it would be 23333.3

Display the sensitivity of the coefficiants(0.25 points)

# display sensitivity to coefficients of objective function. 
get.sensitivity.obj(lpmark)

## $objfrom
## [1]  -96.6820 -137.8455
## 
## $objtill
## [1] 1e+30 1e+30

Q 3B) Explain in coincise manner what the sensitivity results represent in reference to the marketing model.(1 points)

A 3B) this means that the the low end of the radio would be -96.682 and the low end of tv would be -137.8455. anything beneath those would cause the optimal function to change

Display the sensitivity to right hand side constraints (0.25 points)

# display sensitivity to right hand side constraints. 
# There will be a total of m+n values where m is the number of contraints and n is the number of decision variables
get.sensitivity.rhs(lpmark)

## $duals
## [1] 124.12433   0.00000   0.00000  75.78333   0.00000   0.00000   0.00000
## [8]   0.00000
## 
## $dualsfrom
## [1]  1.125e+05 -1.000e+30 -1.000e+30 -3.050e+05 -1.000e+30 -1.000e+30 -1.000e+30
## [8] -1.000e+30
## 
## $dualstill
## [1] 1.00e+30 1.00e+30 1.00e+30 4.75e+05 1.00e+30 1.00e+30 1.00e+30 1.00e+30

Q 3C) For this exercise we are only interested in the first part of the output labeled `duals`. Explain in coincise manner what the two non-zero sensitivity results represent. Distinguish the binding/non-binding constraints, the surplus/slack, and marginal values.(1 points)

A 3C)the two non zero sensitivity results mean that those two variables are binding, and that if you were to increase the budget of those variabes by one dollar you would see an increase of the amount shown. The bindign constraints are the variables that are not equal to zero, the nonbinding ones are the ones that are equal to zero. If they are a non binding constraint that meanst that they have a surplus of the material and that any change to their budget, within the accepted range, would not have any change to the optimal amount. The marginal value is the value that you would get from increasing the budget by one dollar.

Business Analytics Lab Project 2

Predictive Analytics

Joseph Johnson

Date Here

About

Setup

Task 1: Simple Linear Regression (Total 2.5 points)

Q 1A) Why the value of “1.0” along the diagonal? (0.25 points)

A 1A) because that is where the variables are correlated to themselves

Q 1B) Evaluate the correlation between the variables (0.25 points)

A 1B) in this the sale and radio have a high correlation which the other two don’t do as well

Q 1C) What is the values for the intercept and the slope for radio? Write down the equation for the linear regression model and your interpretation for the intercept and slope.(0.5 points)

A 1C) intercept is -9741.9 slope is 347.7. equation is y = 347.7(x) - 9741.9

Q 1D) Write down the equation for the linear regression model.(0.25 points)

A 1D) y = 221.10 - 42229.21

Task 2: Multiple Linear Regression (Total 1.5 points)

Q 2A) Write down the equation for the linear regression model and your interpretation for the intercept and the slopes. (0.5 points)

A 2A) y = 275.69(x) + 48.34(z) -17150.46

Q 2B) Which of the three multiple linear regression models is best in predicting sales. Explain why. (0.25 points)

A 2B) the first one as it has the highest adjusted R squared

Solving Marketing Model

Q 3A) Write down and clearly mark the optimum values for sales, radio, and tv ads. Show how the optimum values satisfy all constraints (0.5 points)

A 3A) the optimal amount for sales would be 350000 the optimal amount for radio would be 116666.72 for tv it would be 23333.3

Q 3B) Explain in coincise manner what the sensitivity results represent in reference to the marketing model.(1 points)

A 3B) this means that the the low end of the radio would be -96.682 and the low end of tv would be -137.8455. anything beneath those would cause the optimal function to change

Q 3C) For this exercise we are only interested in the first part of the output labeled `duals`. Explain in coincise manner what the two non-zero sensitivity results represent. Distinguish the binding/non-binding constraints, the surplus/slack, and marginal values.(1 points)

Business Analytics Lab Project 2

Predictive Analytics

Joseph Johnson

Date Here

About

Setup

Task 1: Simple Linear Regression (Total 2.5 points)

Q 1A) Why the value of “1.0” along the diagonal? (0.25 points)

A 1A) because that is where the variables are correlated to themselves

Q 1B) Evaluate the correlation between the variables (0.25 points)

A 1B) in this the sale and radio have a high correlation which the other two don’t do as well

Q 1C) What is the values for the intercept and the slope for radio? Write down the equation for the linear regression model and your interpretation for the intercept and slope.(0.5 points)

A 1C) intercept is -9741.9 slope is 347.7. equation is y = 347.7(x) - 9741.9

Q 1D) Write down the equation for the linear regression model.(0.25 points)

A 1D) y = 221.10 - 42229.21

Task 2: Multiple Linear Regression (Total 1.5 points)

Q 2A) Write down the equation for the linear regression model and your interpretation for the intercept and the slopes. (0.5 points)

A 2A) y = 275.69(x) + 48.34(z) -17150.46

Q 2B) Which of the three multiple linear regression models is best in predicting sales. Explain why. (0.25 points)

A 2B) the first one as it has the highest adjusted R squared

Solving Marketing Model

Q 3A) Write down and clearly mark the optimum values for sales, radio, and tv ads. Show how the optimum values satisfy all constraints (0.5 points)

A 3A) the optimal amount for sales would be 350000 the optimal amount for radio would be 116666.72 for tv it would be 23333.3

Q 3B) Explain in coincise manner what the sensitivity results represent in reference to the marketing model.(1 points)

A 3B) this means that the the low end of the radio would be -96.682 and the low end of tv would be -137.8455. anything beneath those would cause the optimal function to change

Q 3C) For this exercise we are only interested in the first part of the output labeled duals. Explain in coincise manner what the two non-zero sensitivity results represent. Distinguish the binding/non-binding constraints, the surplus/slack, and marginal values.(1 points)

Q 3C) For this exercise we are only interested in the first part of the output labeled `duals`. Explain in coincise manner what the two non-zero sensitivity results represent. Distinguish the binding/non-binding constraints, the surplus/slack, and marginal values.(1 points)