This is an example of simple linear regression. We check whether there is a relationship between the advertising cost and the sales of a product and, if there is, how strongly advertising cost affects sales.
We load the data first.
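The loading step itself is not shown here. As a minimal sketch, assuming the data is not read from a file, the same table can be rebuilt by hand from the twelve rows printed below. Only the values come from the output; building the data frame this way is an assumption.

```r
# Rebuild the data frame by hand from the values printed in the output
# (assumption: the original was probably loaded from a file that is not shown).
product <- data.frame(
  AdvtCost = c(128, 158, 170, 200, 250, 72, 90, 180, 82, 170, 178, 200),
  Sales    = c(489, 550, 500, 670, 670, 350, 360, 410, 110, 275, 300, 520)
)

# Quick structural check: 12 observations of 2 numeric variables
str(product)
```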
product
## AdvtCost Sales
## 1 128 489
## 2 158 550
## 3 170 500
## 4 200 670
## 5 250 670
## 6 72 350
## 7 90 360
## 8 180 410
## 9 82 110
## 10 170 275
## 11 178 300
## 12 200 520
summary(product)
## AdvtCost Sales
## Min. : 72 Min. :110
## 1st Qu.:118 1st Qu.:338
## Median :170 Median :450
## Mean :156 Mean :434
## 3rd Qu.:185 3rd Qu.:528
## Max. :250 Max. :670
We draw a scatter plot with a fitted regression line and run a correlation test as well.
library("ggplot2")
# qplot() is deprecated in recent ggplot2 releases; this is the equivalent ggplot() call
ggplot(product, aes(x = AdvtCost, y = Sales)) + geom_point() + geom_smooth(method = "lm")
cor.test(product$Sales, product$AdvtCost)
##
## Pearson's product-moment correlation
##
## data: product$Sales and product$AdvtCost
## t = 2.899, df = 10, p-value = 0.01587
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1663 0.9004
## sample estimates:
## cor
## 0.6757
We see a positive correlation between advertising cost and sales: r = 0.68 (p = 0.016). The 95% confidence interval, 0.17 to 0.90, is wide because there are only 12 observations.
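To see where the t statistic and p-value in the test output come from, here is a small sketch that recomputes them from the correlation coefficient using t = r * sqrt(n - 2) / sqrt(1 - r^2). The variable names `r`, `n`, `t_stat` and `p_value` are illustrative, and the data frame is rebuilt by hand so the block runs on its own.

```r
# Data rebuilt from the printed table so the example is self-contained
product <- data.frame(
  AdvtCost = c(128, 158, 170, 200, 250, 72, 90, 180, 82, 170, 178, 200),
  Sales    = c(489, 550, 500, 670, 670, 350, 360, 410, 110, 275, 300, 520)
)

r <- cor(product$Sales, product$AdvtCost)   # Pearson correlation
n <- nrow(product)                          # number of observations

# t statistic for H0: true correlation is 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(c(r = r, t = t_stat, p = p_value), 4))
```

The three values should agree with the `cor`, `t` and `p-value` entries reported by `cor.test()`.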
We build a simple regression model.
product_reg <- lm(Sales ~ AdvtCost, data = product)
summary(product_reg)
##
## Call:
## lm(formula = Sales ~ AdvtCost, data = product)
##
## Residuals:
## Min 1Q Median 3Q Max
## -186.7 -96.6 40.2 97.2 146.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 108.523 118.092 0.92 0.380
## AdvtCost 2.078 0.717 2.90 0.016 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 128 on 10 degrees of freedom
## Multiple R-squared: 0.457, Adjusted R-squared: 0.402
## F-statistic: 8.4 on 1 and 10 DF, p-value: 0.0159
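The model gives an R-squared of 0.4566, an adjusted R-squared of 0.4022, a p-value of 0.01587 and a residual standard error of 128 on 10 degrees of freedom. An R-squared of 46% means that 54% of the variation in sales is not explained by advertising cost. The slope of 2.078 means that each additional unit of advertising cost is associated with about 2.08 more units of sales, while the intercept is not significantly different from zero (p = 0.38). Overall, the model is significant at the 5% level.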
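As a cross-check on the summary output, the following sketch recomputes the main quantities from the least-squares formulas for a single predictor: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The names `slope`, `intercept`, `r_squared` and `rse` are illustrative, and the data frame is rebuilt by hand so the block runs on its own.

```r
# Data rebuilt from the printed table so the example is self-contained
product <- data.frame(
  AdvtCost = c(128, 158, 170, 200, 250, 72, 90, 180, 82, 170, 178, 200),
  Sales    = c(489, 550, 500, 670, 670, 350, 360, 410, 110, 275, 300, 520)
)
x <- product$AdvtCost
y <- product$Sales

# Least-squares estimates for one predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Residuals, R-squared and residual standard error (n - 2 degrees of freedom)
res       <- y - (intercept + slope * x)
r_squared <- 1 - sum(res^2) / sum((y - mean(y))^2)
rse       <- sqrt(sum(res^2) / (length(y) - 2))

print(round(c(intercept = intercept, slope = slope,
              r_squared = r_squared, rse = rse), 4))
```

For simple linear regression, `r_squared` equals the square of the Pearson correlation, which is why a correlation of 0.68 corresponds to an R-squared of about 0.46.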
We also look at the standard residual diagnostic plots.
par(mfrow = c(2,2))
plot(product_reg)
We add the predicted values to the original table.
prediction <- round(predict(product_reg), 2)
product$prediction <- prediction
product
## AdvtCost Sales prediction
## 1 128 489 374.5
## 2 158 550 436.8
## 3 170 500 461.7
## 4 200 670 524.0
## 5 250 670 627.9
## 6 72 350 258.1
## 7 90 360 295.5
## 8 180 410 482.5
## 9 82 110 278.9
## 10 170 275 461.7
## 11 178 300 478.3
## 12 200 520 524.0
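The same model can also be used for advertising costs that are not in the table. The sketch below passes a hypothetical new value (150, chosen only for illustration) to `predict()` through its `newdata` argument and asks for a 95% prediction interval. The data frame and model are rebuilt so the block runs on its own; `new_costs` and `new_pred` are illustrative names.

```r
# Data and model rebuilt so the example is self-contained
product <- data.frame(
  AdvtCost = c(128, 158, 170, 200, 250, 72, 90, 180, 82, 170, 178, 200),
  Sales    = c(489, 550, 500, 670, 670, 350, 360, 410, 110, 275, 300, 520)
)
product_reg <- lm(Sales ~ AdvtCost, data = product)

# Hypothetical advertising cost, inside the observed range of 72 to 250
new_costs <- data.frame(AdvtCost = 150)

# Point prediction plus a 95% prediction interval (columns: fit, lwr, upr)
new_pred <- predict(product_reg, newdata = new_costs,
                    interval = "prediction", level = 0.95)
print(round(new_pred, 1))
```

With a residual standard error of about 128, the interval is wide, and predictions far outside the observed range of advertising costs would be extrapolation and should be treated with caution.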