Task 1

Marketing Data

#Read data correctly
mydata = read.csv(file="data/marketing.csv")
head(mydata)

Correlation Matrix

#Correlation matrix of columns 2 to 6 (sales, radio, paper, tv, pos)
corr = cor(mydata[2:6])
corr
           sales       radio       paper          tv         pos
sales  1.0000000  0.97713807 -0.28306828  0.95797025  0.01264860
radio  0.9771381  1.00000000 -0.23835848  0.96609579  0.06040209
paper -0.2830683 -0.23835848  1.00000000 -0.24587896 -0.09006241
tv     0.9579703  0.96609579 -0.24587896  1.00000000 -0.03602314
pos    0.0126486  0.06040209 -0.09006241 -0.03602314  1.00000000
#install.packages("corrplot")
#install.packages("corrgram")
#library(corrgram)
#Load corrplot so that corrplot() can be found
library(corrplot)
corrplot(corr)
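
The correlation matrix can also be read numerically. As a minimal sketch, the sales row of the matrix printed above is copied by hand into a vector (the names sales_cor and ranked are introduced here for illustration only) and the predictors are ranked by the strength of their linear association with sales:

```r
# Correlations with sales, copied from the matrix printed above
sales_cor <- c(radio = 0.97713807, paper = -0.28306828,
               tv = 0.95797025, pos = 0.01264860)

# Rank predictors by strength of association, ignoring the sign
ranked <- sales_cor[order(abs(sales_cor), decreasing = TRUE)]
print(ranked)
```

With the data loaded, the same vector could be taken directly from the matrix as corr["sales", -1].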

Scatter Plot

#Extract all variables
pos  = mydata$pos
paper = mydata$paper
tv = mydata$tv
sales = mydata$sales
radio = mydata$radio
#Plot of Radio and Sales using plot command from Worksheet 4
plot(radio,sales)

Linear Regression

#Simple Linear Regression
reg <- lm(sales ~ radio)
#Summary of Model
summary(reg)

Call:
lm(formula = sales ~ radio)

Residuals:
     Min       1Q   Median       3Q      Max 
-1732.85  -198.88    62.64   415.26   637.70 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -9741.92    1362.94  -7.148 1.17e-06 ***
radio         347.69      17.83  19.499 1.49e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 571.6 on 18 degrees of freedom
Multiple R-squared:  0.9548,    Adjusted R-squared:  0.9523 
F-statistic: 380.2 on 1 and 18 DF,  p-value: 1.492e-13

Plot & Trend Line

#Plot Radio and Sales 
plot(radio,sales)
#Add a trend line plot using the linear model we created above
abline(reg,col="blue",lwd=2) 

List some observations from this plot.

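This graph shows a strong positive linear relationship: as radio increases, sales increase. This agrees with the correlation of 0.977 between radio and sales, and the R-squared of 0.9548 indicates that the trend line fits the points closely. The slope of 347.69 means that each one-unit increase in radio is associated with about 348 more units of sales.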
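
The trend line can also be used to estimate sales for a given radio value. The sketch below copies the rounded coefficients from the summary(reg) output above; the radio values in radio_new are hypothetical, chosen only for illustration, and may fall outside the range observed in the data.

```r
# Coefficients copied from the summary(reg) output above
intercept <- -9741.92
slope <- 347.69

# Hypothetical radio values, chosen only for illustration
radio_new <- c(70, 75, 80)

# Fitted line: sales = intercept + slope * radio
sales_hat <- intercept + slope * radio_new
print(data.frame(radio = radio_new, predicted_sales = sales_hat))
```

With the model object available, predict(reg, newdata = data.frame(radio = radio_new)) should give nearly the same numbers, computed from the unrounded coefficients.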


Task 2

Multiple Linear Regression

#Multiple Linear Regression Model
mlr1 <-lm(sales ~ radio + tv)
#Summary of Multiple Linear Regression Model
summary(mlr1)

Call:
lm(formula = sales ~ radio + tv)

Residuals:
     Min       1Q   Median       3Q      Max 
-1729.58  -205.97    56.95   335.15   759.26 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17150.46    6965.59  -2.462 0.024791 *  
radio          275.69      68.73   4.011 0.000905 ***
tv              48.34      44.58   1.084 0.293351    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 568.9 on 17 degrees of freedom
Multiple R-squared:  0.9577,    Adjusted R-squared:  0.9527 
F-statistic: 192.6 on 2 and 17 DF,  p-value: 2.098e-12

For mlr1, the R-Squared value is 0.9577 and the Adj R-Squared is 0.9527.

Multiple Linear Regression Model

#mlr2 = Sales predicted by radio, tv, and pos
mlr2 <-lm(sales ~ radio + tv + pos)
summary(mlr2)

Call:
lm(formula = sales ~ radio + tv + pos)

Residuals:
     Min       1Q   Median       3Q      Max 
-1748.20  -187.42   -61.14   352.07   734.20 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) -15491.23    7697.08  -2.013  0.06130 . 
radio          291.36      75.48   3.860  0.00139 **
tv              38.26      48.90   0.782  0.44538   
pos           -107.62     191.25  -0.563  0.58142   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 580.7 on 16 degrees of freedom
Multiple R-squared:  0.9585,    Adjusted R-squared:  0.9508 
F-statistic: 123.3 on 3 and 16 DF,  p-value: 2.859e-11
#mlr3 = Sales predicted by radio, tv, pos, and paper
mlr3 <-lm(sales ~ radio + tv + pos + paper)
summary(mlr3)

Call:
lm(formula = sales ~ radio + tv + pos + paper)

Residuals:
     Min       1Q   Median       3Q      Max 
-1558.13  -239.35     7.25   387.02   728.02 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -13801.015   7865.017  -1.755  0.09970 . 
radio          294.224     75.442   3.900  0.00142 **
tv              33.369     49.080   0.680  0.50693   
pos           -128.875    192.156  -0.671  0.51262   
paper           -9.159      8.991  -1.019  0.32449   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 580 on 15 degrees of freedom
Multiple R-squared:  0.9612,    Adjusted R-squared:  0.9509 
F-statistic: 92.96 on 4 and 15 DF,  p-value: 2.13e-10

Based purely on the values for R-Squared and Adj R-Squared, which linear regression model is best in predicting sales? Explain why.

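The first multiple linear regression model, mlr1 (sales ~ radio + tv), is best at predicting sales. Its Adj R-Squared of 0.9527 is the highest, compared with 0.9508 for mlr2 and 0.9509 for mlr3. R-Squared alone rises with every added predictor (0.9577, 0.9585, 0.9612) because it cannot decrease when a variable is added, so it is not a fair basis for comparing models of different sizes. Adj R-Squared penalizes the extra predictors, and pos and paper do not add enough explanatory power to offset that penalty. mlr1 also has the smallest residual standard error (568.9).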
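
To see how Adj R-Squared relates to R-Squared for these models, the sketch below recomputes it from the printed values using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The sample size n = 20 is inferred from the output (18 residual degrees of freedom with one predictor), and the variable names are introduced here for illustration. Because the inputs are rounded to four digits, the results can differ from the printed values in the last digit.

```r
# R-squared values copied from the four summary() outputs above
r2 <- c(reg = 0.9548, mlr1 = 0.9577, mlr2 = 0.9585, mlr3 = 0.9612)

# Number of predictors in each model
p <- c(reg = 1, mlr1 = 2, mlr2 = 3, mlr3 = 4)

# Sample size inferred from the output: 18 residual df with 1 predictor
n <- 20

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))

# Model with the highest adjusted R-squared
best <- names(which.max(adj_r2))
print(best)
```

The penalty term (n - 1)/(n - p - 1) grows with each added predictor, which is why a small gain in R-Squared can still lower Adj R-Squared.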

Task 3

Watson Analytics

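This shows the relationship between sales and paper. The two are negatively correlated: higher paper values tend to go with lower sales. The correlation of -0.283 from the matrix in Task 1 indicates that this relationship is weak.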
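
As a rough gauge of how strong this relationship is, the sketch below squares the sales and paper correlation copied from the matrix in Task 1. The squared correlation is the share of the variance in sales that a straight-line fit on paper alone would explain; the variable names are introduced here for illustration.

```r
# Correlation between sales and paper, copied from the matrix in Task 1
r_sales_paper <- -0.28306828

# Squared correlation: share of variance in sales explained by paper alone
r_squared <- r_sales_paper^2
print(round(r_squared, 3))
```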
