About

This project explores simple linear regression, multiple linear regression, and finding an optimal solution subject to business constraints.

Setup

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the instructions below carefully and follow them to complete the tasks and answer the questions. Submit your work to RPubs as detailed in previous notes.


Task 1: Simple Linear Regression

First, read in the marketing data that was used in the previous lab. Make sure the file is read in correctly. (0.25 points)

#Read data correctly
marketing = read.csv(file = "marketing.csv")
head(marketing)
##   case_number sales radio paper  tv pos
## 1           1 11125    65    89 250 1.3
## 2           2 16121    73    55 260 1.6
## 3           3 16440    74    58 270 1.7
## 4           4 16876    75    82 270 1.3
## 5           5 13965    69    75 255 1.5
## 6           6 14999    70    71 255 2.1

Next, apply the cor() function to the data to see the correlations between all pairs of variables at once. (0.25 points)

#Correlation matrix of all columns in the data
corr = cor(marketing)
corr
##             case_number      sales       radio       paper          tv
## case_number   1.0000000  0.2402344  0.23586825 -0.36838393  0.22282482
## sales         0.2402344  1.0000000  0.97713807 -0.28306828  0.95797025
## radio         0.2358682  0.9771381  1.00000000 -0.23835848  0.96609579
## paper        -0.3683839 -0.2830683 -0.23835848  1.00000000 -0.24587896
## tv            0.2228248  0.9579703  0.96609579 -0.24587896  1.00000000
## pos           0.0539763  0.0126486  0.06040209 -0.09006241 -0.03602314
##                     pos
## case_number  0.05397630
## sales        0.01264860
## radio        0.06040209
## paper       -0.09006241
## tv          -0.03602314
## pos          1.00000000
#Correlation matrix of all columns except the first column. This is convenient since case_number is only an indicator for the month and should be excluded from the calculations.
corr = cor( marketing[ c(2:6) ] )
corr
##            sales       radio       paper          tv         pos
## sales  1.0000000  0.97713807 -0.28306828  0.95797025  0.01264860
## radio  0.9771381  1.00000000 -0.23835848  0.96609579  0.06040209
## paper -0.2830683 -0.23835848  1.00000000 -0.24587896 -0.09006241
## tv     0.9579703  0.96609579 -0.24587896  1.00000000 -0.03602314
## pos    0.0126486  0.06040209 -0.09006241 -0.03602314  1.00000000
1A) Why is the value “1.0” along the diagonal? (0.25 points)

Each diagonal entry is the correlation of a variable with itself, which is always exactly 1.

1B) Which pairs have the strongest correlations? (0.25 points)

Radio:sales (0.977), radio:tv (0.966), and tv:sales (0.958) have the strongest correlations.
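The answers to 1A and 1B can be checked directly from the printed matrix. The sketch below is a minimal illustration that re-enters the correlation values shown in the output above (rounded as displayed) instead of reading marketing.csv; the names `corr_printed` and `pair_table` are introduced here for illustration only.

```r
# Correlation values copied from the printed matrix above (rounded as displayed).
vars <- c("sales", "radio", "paper", "tv", "pos")
corr_printed <- matrix(
  c( 1.0000000,  0.97713807, -0.28306828,  0.95797025,  0.01264860,
     0.9771381,  1.00000000, -0.23835848,  0.96609579,  0.06040209,
    -0.2830683, -0.23835848,  1.00000000, -0.24587896, -0.09006241,
     0.9579703,  0.96609579, -0.24587896,  1.00000000, -0.03602314,
     0.0126486,  0.06040209, -0.09006241, -0.03602314,  1.00000000),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars)
)

# 1A: each diagonal entry is a variable correlated with itself.
diag(corr_printed)

# 1B: list each distinct pair once (upper triangle), then sort by |r|.
idx <- which(upper.tri(corr_printed), arr.ind = TRUE)
pair_table <- data.frame(
  var1 = vars[idx[, "row"]],
  var2 = vars[idx[, "col"]],
  r    = corr_printed[idx],
  stringsAsFactors = FALSE
)
pair_table <- pair_table[order(-abs(pair_table$r)), ]
head(pair_table, 3)
```

With the full data loaded, the same ranking could be run on `corr` in place of `corr_printed`.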

Next, create a visual diagram of the correlation matrix, called a corrgram, in which the strength of each correlation is represented by color intensity. To do this, you first need to install two packages in RStudio, as executed by the command lines below.

## corrplot 0.91 loaded
## Warning: package 'corrgram' was built under R version 4.0.5
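The install and load commands themselves are not echoed in the rendered output; only their messages appear. A minimal sketch of what they might look like, assuming the two packages are corrgram and corrplot (the names in the messages above) and reusing the CRAN mirror URL that appears later in this document:

```r
# Assumed setup (not shown in the original output):
# install each package only if it is missing, then load it.
for (pkg in c("corrgram", "corrplot")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
}
library(corrgram)
library(corrplot)
```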

Generate a corrgram and a corrplot for the computed correlation matrix (0.25 points)

# Generates a corrgram of last computed correlation matrix
corrgram(corr)

# Generates a corrplot, similar to a corrgram, but with a different visual display
corrplot(corr)

1C) Evaluate the correlation between the variables (0.5 points)

Now extract all the variables and use the scatter.smooth() function to plot radio expenses and sales. (0.25 points)

#Extract all variables
pos  = marketing$pos
paper = marketing$paper
tv = marketing$tv
sales = marketing$sales
radio = marketing$radio
#Use scatter.smooth() function to plot Radio and Sales. 
scatter.smooth(radio,sales)

1D) How can you define the relationship between radio expenses and sales?

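The relationship between radio expenses and sales is strong, positive, and approximately linear (correlation of about 0.977): sales rise as radio spending rises, so a straight trend line from a simple linear regression model describes it well.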

Now, try to find a simple linear regression model between sales and marketing expenses for radio commercials. (0.25 points)

#Simple Linear Regression 
reg <- lm(sales ~ radio)

#Summary of Model
summary(reg)
## 
## Call:
## lm(formula = sales ~ radio)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1732.85  -198.88    62.64   415.26   637.70 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -9741.92    1362.94  -7.148 1.17e-06 ***
## radio         347.69      17.83  19.499 1.49e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 571.6 on 18 degrees of freedom
## Multiple R-squared:  0.9548, Adjusted R-squared:  0.9523 
## F-statistic: 380.2 on 1 and 18 DF,  p-value: 1.492e-13

1E) What are the values for the intercept and the slope for radio? Write down the equation for the linear regression model and your interpretation of the intercept and slope. (0.5 points)

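The slope for radio is 347.69 and the y-intercept is -9741.92, so the model is sales = -9741.92 + 347.69 * radio. The slope means that each one-unit increase in radio ad spending (the next step treats 75 as $75,000, so one unit is $1,000) is associated with an increase of about 347.69 in predicted sales. The intercept means that with no radio ad spending the model predicts sales of -9741.92. Negative sales are impossible, so the intercept is best read as an extrapolation artifact and not as a real prediction.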

Given this equation we can predict the value of sales for any given value of radio ads. Use your equation to calculate the predicted sales value for 75 (investing $75,000 in radio ads).(0.25 points)

# Sales_predicted ~ Radio expenses
sales_predicted = -9741.92 + 347.69 * (75)
sales_predicted
## [1] 16334.83
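The same arithmetic can be wrapped in a small function so that several radio values can be scored at once. This is only a sketch: `predict_sales_from_radio` is a hypothetical helper name, and the coefficients are the rounded values printed by summary(reg), so results may differ slightly from predictions made with the full-precision model.

```r
# Coefficients copied from summary(reg) above (rounded as printed).
intercept_radio <- -9741.92
slope_radio     <- 347.69

# Hypothetical helper: predicted sales for a given radio value
# (radio is in the same units as the data, e.g. 75 for $75,000).
predict_sales_from_radio <- function(radio) {
  intercept_radio + slope_radio * radio
}

# Single value, matching the hand calculation above
predict_sales_from_radio(75)

# The function is vectorized, so several values can be scored together
predict_sales_from_radio(c(65, 70, 75))
```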

A high R-Squared value indicates that the model is a good fit, but not perfect. For the case of Sales versus Radio we will overlay the trend line representing the regression equation over the original plot. This will show how far the calculated results are from the actual value. The difference between the actual sales (circles) and the fitted sales (solid line) is captured in the residual error calculations.

#Plot Radio and Sales 
plot(radio,sales)

scatter.smooth(radio,sales)

#Add a trend line plot using the linear model we created above
abline(reg, col="blue",lwd=2) 
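The residuals mentioned above are simply actual sales minus fitted sales. The sketch below illustrates this on the six rows shown by head(marketing) only, so its coefficients and residuals will not match the full 20-row model; `marketing_head` and `fit_head` are names introduced here for illustration.

```r
# First six rows of the data, copied from the head(marketing) output above.
marketing_head <- data.frame(
  sales = c(11125, 16121, 16440, 16876, 13965, 14999),
  radio = c(65, 73, 74, 75, 69, 70)
)

# Fit the same kind of model on this small subset (illustration only).
fit_head <- lm(sales ~ radio, data = marketing_head)

# Residual = actual value - fitted value on the regression line.
manual_resid <- marketing_head$sales - fitted(fit_head)

# The manual calculation agrees with residuals().
all.equal(unname(manual_resid), unname(residuals(fit_head)))
```

With the full data, `residuals(reg)` would give the vertical distances between the circles and the blue line in the plot.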

Now you can derive the linear regression model for Sales versus TV (0.25 points)

#Simple Linear Regression 
regtv <- lm(sales ~ tv)

#Summary of Model
summary(regtv)
## 
## Call:
## lm(formula = sales ~ tv)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1921.87  -412.24     7.02   581.59  1081.61 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -42229.21    4164.12  -10.14 7.19e-09 ***
## tv             221.10      15.61   14.17 3.34e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 771.3 on 18 degrees of freedom
## Multiple R-squared:  0.9177, Adjusted R-squared:  0.9131 
## F-statistic: 200.7 on 1 and 18 DF,  p-value: 3.336e-11
1F) Write down the equation for the linear regression model. Note the values for the intercept and the slope.(0.25 points)

sales = -42229.21 + 221.10 * tv. The slope is 221.10 and the y-intercept is -42,229.21.


Task 2: Multiple Linear Regression

Create a multiple linear regression predicting sales using both independent variables radio and tv. (0.25 points)

#Multiple Linear Regression Model
mlr1 <-lm(sales ~ radio + tv)

#Summary of Multiple Linear Regression Model
summary(mlr1)
## 
## Call:
## lm(formula = sales ~ radio + tv)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1729.58  -205.97    56.95   335.15   759.26 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17150.46    6965.59  -2.462 0.024791 *  
## radio          275.69      68.73   4.011 0.000905 ***
## tv              48.34      44.58   1.084 0.293351    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 568.9 on 17 degrees of freedom
## Multiple R-squared:  0.9577, Adjusted R-squared:  0.9527 
## F-statistic: 192.6 on 2 and 17 DF,  p-value: 2.098e-12
2A) Write down the equation for the linear regression model and your interpretation for the intercept and the slopes. (1 point)

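sales = -17150.46 + 275.69 * radio + 48.34 * tv. Holding tv constant, each one-unit increase in radio ad spending is associated with an increase of about 275.69 in predicted sales; holding radio constant, each one-unit increase in tv ad spending is associated with an increase of about 48.34. With both radio and tv at 0, the model predicts sales of -17,150.46, which is an extrapolation with no practical meaning. Note that the tv coefficient is not statistically significant in this model (p = 0.293).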

The predicted sales can again be calculated given the coefficients of the regression model.

Calculate the predicted sales for TV = 270 and Radio = 75 (0.25 points)

# sales_predicted = intercept + b_radio * radio + b_tv * tv
sales_predicted = coef(mlr1)[1] + coef(mlr1)[2]*(75) + coef(mlr1)[3]*(270)
sales_predicted
## (Intercept) 
##     16578.3
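An alternative to multiplying coefficients by hand is R's predict() function, which also avoids the stray "(Intercept)" label in the output. The sketch below shows that the two approaches agree. It is fitted on the six rows shown by head(marketing) only, so the number it produces is not the 16578.3 reported above; `marketing_head`, `fit_head`, and `new_ads` are illustrative names.

```r
# First six rows of the data, copied from the head(marketing) output above.
marketing_head <- data.frame(
  sales = c(11125, 16121, 16440, 16876, 13965, 14999),
  radio = c(65, 73, 74, 75, 69, 70),
  tv    = c(250, 260, 270, 270, 255, 255)
)

# Same model form as mlr1, fitted on the subset (illustration only).
fit_head <- lm(sales ~ radio + tv, data = marketing_head)

# Prediction with predict(): new values go in a data frame
# whose column names match the predictors in the formula.
new_ads    <- data.frame(radio = 75, tv = 270)
by_predict <- predict(fit_head, newdata = new_ads)

# Prediction by hand from the coefficients.
b       <- coef(fit_head)
by_hand <- b["(Intercept)"] + b["radio"] * 75 + b["tv"] * 270

all.equal(unname(by_predict), unname(by_hand))
```

With the full data, the analogous call would be `predict(mlr1, newdata = data.frame(radio = 75, tv = 270))`.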

Create a multiple linear regression model for each of the following, and display the summary statistics (0.25 points)

#Multiple regression model for sales predicted by radio, tv, and pos
mlr2 <-lm(sales ~ radio + tv + pos)


#Summary of Multiple Linear Regression Model
summary(mlr2)
## 
## Call:
## lm(formula = sales ~ radio + tv + pos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1748.20  -187.42   -61.14   352.07   734.20 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -15491.23    7697.08  -2.013  0.06130 . 
## radio          291.36      75.48   3.860  0.00139 **
## tv              38.26      48.90   0.782  0.44538   
## pos           -107.62     191.25  -0.563  0.58142   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 580.7 on 16 degrees of freedom
## Multiple R-squared:  0.9585, Adjusted R-squared:  0.9508 
## F-statistic: 123.3 on 3 and 16 DF,  p-value: 2.859e-11
#Multiple regression model for sales predicted by radio, tv, pos, and paper
mlr3 <-lm(sales ~ radio + tv + pos + paper)


#Summary of Multiple Linear Regression Model
summary(mlr3)
## 
## Call:
## lm(formula = sales ~ radio + tv + pos + paper)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1558.13  -239.35     7.25   387.02   728.02 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -13801.015   7865.017  -1.755  0.09970 . 
## radio          294.224     75.442   3.900  0.00142 **
## tv              33.369     49.080   0.680  0.50693   
## pos           -128.875    192.156  -0.671  0.51262   
## paper           -9.159      8.991  -1.019  0.32449   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 580 on 15 degrees of freedom
## Multiple R-squared:  0.9612, Adjusted R-squared:  0.9509 
## F-statistic: 92.96 on 4 and 15 DF,  p-value: 2.13e-10
2B) Which of the three multiple linear regression models is best at predicting sales? Explain why. (0.5 points)

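The first multiple linear regression model (sales ~ radio + tv) is the best at predicting sales. Its adjusted R-squared of 0.9527 is the highest of the three models (the others are 0.9508 and 0.9509), and its residual standard error of 568.9 is the lowest. Adjusted R-squared penalizes extra predictors, so a higher value means the model explains more of the variation in sales without relying on variables that add little; pos and paper are not statistically significant in the larger models.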
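The adjusted R-squared comparison can be reproduced from the reported R-squared values using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors. The sketch below uses the rounded R-squared values printed in the three summaries, so the results can differ from the printed adjusted values in the fourth decimal place; n = 20 is inferred from the residual degrees of freedom.

```r
# n = 20 observations: mlr1 has 17 residual df plus 3 estimated coefficients.
n <- 20

# Multiple R-squared values copied from the three summaries (rounded as printed).
r_squared    <- c(mlr1 = 0.9577, mlr2 = 0.9585, mlr3 = 0.9612)
n_predictors <- c(mlr1 = 2,      mlr2 = 3,      mlr3 = 4)

# Adjusted R-squared penalizes each additional predictor.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - n_predictors - 1)
round(adj_r_squared, 4)

# Model with the highest adjusted R-squared
names(which.max(adj_r_squared))
```

Plain R-squared always rises when predictors are added (0.9577, 0.9585, 0.9612), which is why it is not used for this comparison.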

Calculate the sales for each of the three models given that Radio = 69 , TV = 255 , POS = 1.5, and Paper = 75. (0.25 points)

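# sales_predicted = intercept + slopes * inputs, using the rounded coefficients
# Results are stored under new names so the fitted models mlr1, mlr2, and mlr3 are not overwritten
sales_mlr1 = 275.69*69 + 48.34*255 - 17150.46
sales_mlr1
## [1] 14198.85
sales_mlr2 = 291.36*69 + 38.26*255 - 107.62*1.5 - 15491.23
sales_mlr2
## [1] 14207.48
sales_mlr3 = 294.224*69 + 33.369*255 - 128.875*1.5 - 9.159*75 - 13801.015
sales_mlr3
## [1] 14129.3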
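The three calculations follow one pattern: intercept plus the sum of each slope times its input. A compact sketch of that pattern is shown below, assuming the rounded coefficients from the three summaries; `predict_from_coefs` is a hypothetical helper, not part of any package.

```r
# Input values given in the task.
inputs <- c(radio = 69, tv = 255, pos = 1.5, paper = 75)

# Rounded coefficients copied from the three model summaries.
coefs <- list(
  mlr1 = c(intercept = -17150.46,  radio = 275.69,  tv = 48.34),
  mlr2 = c(intercept = -15491.23,  radio = 291.36,  tv = 38.26,  pos = -107.62),
  mlr3 = c(intercept = -13801.015, radio = 294.224, tv = 33.369, pos = -128.875,
           paper = -9.159)
)

# Hypothetical helper: intercept + sum(slope * matching input).
predict_from_coefs <- function(b, x) {
  slopes <- b[names(b) != "intercept"]
  unname(b["intercept"] + sum(slopes * x[names(slopes)]))
}

predictions <- sapply(coefs, predict_from_coefs, x = inputs)
round(predictions, 2)
```

Matching inputs to slopes by name means a model that does not use pos or paper simply ignores those inputs.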

Task 3: Linear Programming & Optimization (total 3 points)

For this task, we need to install an optimization package in R.

if(!require("lpSolveAPI",quietly = TRUE))
  install.packages("lpSolveAPI",dependencies = TRUE, repos = "https://cloud.r-project.org")
#install special package required for the solver
#install.packages("lpSolveAPI", repos = "https://cran.us.r-project.org") 
#install.packages("lpSolveAPI", repos = "https://cloud.r-project.org")
# load package library
library("lpSolveAPI") 

Solving the Marketing Model (2 points)

First create the linear programming model object in R. This is the starting point. The object will eventually contain all definitions and results.

# Start creating the linear programming model
lpmark <- make.lp(0,2)

Next, define the type of optimization, set the objective function, and add the constraints to our model object.

# Define type of optimization,and dump the screen output into a variable `dump`
dump = lp.control(lpmark, sense="max")

# Set the objective function
set.objfn(lpmark, c(275.69, 48.34))

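# Add constraints (first coefficient = radio, second = tv)

add.constraint(lpmark, c(1, 1), "<=", 350000)   # radio + tv <= 350000 (total budget)
add.constraint(lpmark, c(1, 0), ">=", 15000)    # radio >= 15000
add.constraint(lpmark, c(0, 1), ">=", 75000)    # tv >= 75000
add.constraint(lpmark, c(2, -1), "=", 0)        # 2*radio - tv = 0, i.e. tv = 2*radio
add.constraint(lpmark, c(1, 0), ">=", 0)        # radio >= 0
add.constraint(lpmark, c(0, 1), ">=", 0)        # tv >= 0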

Finally solve the model.

# View the problem formulation in tabular/matrix form
lpmark
## Model name: 
##               C1      C2            
## Maximize  275.69   48.34            
## R1             1       1  <=  350000
## R2             1       0  >=   15000
## R3             0       1  >=   75000
## R4             2      -1   =       0
## R5             1       0  >=       0
## R6             0       1  >=       0
## Kind         Std     Std            
## Type        Real    Real            
## Upper        Inf     Inf            
## Lower          0       0
# Solve 
 solve(lpmark)
## [1] 0
# Display the objective function optimum value
get.objective(lpmark)
## [1] 43443167
# Display the decision variables optimum values
 get.variables(lpmark)
## [1] 116666.7 233333.3

3A) Write down and clearly mark the optimum values for sales, radio, and tv ads. Show how the optimum values satisfy all constraints (0.5 point)

Optimal Sales: $43,443,167. This is the maximum value of the objective function, 275.69 * radio + 48.34 * tv, evaluated at the optimal radio and TV values below.

Optimal Radio: $116,666.70. This satisfies the constraint that radio must be greater than or equal to $15,000 (and non-negative).

Optimal TV Ads: $233,333.30. This satisfies the constraint that TV ads must be greater than or equal to $75,000 (and non-negative).

Together, the two values also satisfy the remaining constraints: radio + tv = 116,666.7 + 233,333.3 = 350,000, which meets the budget limit of at most $350,000, and 2 * radio - tv = 233,333.3 - 233,333.3 = 0, so TV spending is exactly twice radio spending.
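The constraint checks can also be reproduced in plain R, without the solver. The sketch below assumes the exact optimum is radio = 350000/3 and tv = 2 * radio, which is what the printed values 116666.7 and 233333.3 round to when the budget and ratio constraints both hold with equality; `radio_opt`, `tv_opt`, and `checks` are illustrative names.

```r
# Assumed exact optimum: tv = 2 * radio and radio + tv = 350000.
radio_opt <- 350000 / 3      # prints as 116666.7
tv_opt    <- 2 * radio_opt   # prints as 233333.3

# Evaluate every constraint from the model (small tolerance for floating point).
checks <- c(
  budget         = radio_opt + tv_opt <= 350000 + 1e-6,
  radio_min      = radio_opt >= 15000,
  tv_min         = tv_opt >= 75000,
  tv_twice_radio = abs(2 * radio_opt - tv_opt) < 1e-6,
  non_negative   = radio_opt >= 0 && tv_opt >= 0
)
checks

# Objective function value at the optimum
objective <- 275.69 * radio_opt + 48.34 * tv_opt
objective
```

Because both objective coefficients are positive and tv is tied to radio, the objective grows with spending until the budget constraint binds, which is why the optimum uses the full 350,000.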