We’ll use the marketing data set from the datarium package, which records the amount of money spent on three advertising media (youtube, facebook and newspaper) together with the resulting sales.
#install.packages("datarium")
data("marketing", package = "datarium")
head(marketing, 4)
## youtube facebook newspaper sales
## 1 276.12 45.36 83.04 26.52
## 2 53.40 47.16 54.12 12.48
## 3 20.64 55.08 83.16 11.16
## 4 181.80 49.56 70.20 22.20
We want to build a model that estimates sales from the advertising budget invested in youtube, as follows:
\(sales = b_0 + b_1 \times youtube\)
You can compute the model coefficients in R as follows:
model <- lm(sales ~ youtube, data = marketing)
summary(model)
##
## Call:
## lm(formula = sales ~ youtube, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.0632 -2.3454 -0.2295 2.4805 8.6548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.439112 0.549412 15.36 <2e-16 ***
## youtube 0.047537 0.002691 17.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.91 on 198 degrees of freedom
## Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
## F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
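Plugging the estimates from the Coefficients table into the model equation gives the fitted line (coefficients rounded here for readability):

\(sales = 8.439 + 0.0475 \times youtube\)

So a youtube budget of 0 corresponds to predicted sales of about 8.44, and each additional unit of youtube budget adds about 0.0475 to the predicted sales.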
To see which predictor variables are significant, you can examine the coefficients table, which shows the estimates of the regression beta coefficients and the p-values of the associated t-statistics:
summary(model)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.43911226 0.549411528 15.36028 1.40630e-35
## youtube 0.04753664 0.002690607 17.66763 1.46739e-42
For a given predictor, the t-statistic evaluates whether or not there is a significant association between the predictor and the outcome variable, that is, whether the beta coefficient of the predictor is significantly different from zero.
It can be seen that changes in the youtube advertising budget are significantly associated with changes in sales.
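To make the table less of a black box, here is a minimal sketch that recomputes the youtube t value and p-value by hand. It assumes the usual construction: the t value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with the residual degrees of freedom (198 in the summary output). The variable names are illustrative only, and the numbers are copied from the printed table.

```r
# Values copied from the coefficients table (youtube row)
estimate  <- 0.04753664
std_error <- 0.002690607
df_resid  <- 198  # residual degrees of freedom from summary(model)

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

t_value
p_value
```

The results should agree with the t value and Pr(>|t|) columns up to rounding of the copied inputs.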
For a given predictor variable, the coefficient (b) can be interpreted as the average effect on y of a one-unit increase in that predictor, holding all other predictors fixed.
The youtube coefficient suggests that for every 1000-dollar increase in the youtube advertising budget, we can expect an increase of about 0.048*1000 = 48 sales units, on average.
The confidence intervals of the model coefficients can be extracted as follows:
confint(model)
## 2.5 % 97.5 %
## (Intercept) 7.35566312 9.52256140
## youtube 0.04223072 0.05284256
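These limits can be reproduced by hand, which shows where they come from. The sketch below assumes the standard t-based interval for a linear model coefficient: estimate plus or minus the 97.5% quantile of the t distribution (198 degrees of freedom) times the standard error. The variable names are illustrative, and the inputs are copied from the coefficients table.

```r
# Values copied from the coefficients table (youtube row)
estimate  <- 0.04753664
std_error <- 0.002690607
df_resid  <- 198  # residual degrees of freedom from summary(model)

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Lower and upper confidence limits
ci <- estimate + c(-1, 1) * t_crit * std_error
ci
```

The two values should match the youtube row of the confint(model) output up to rounding.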
Let us predict the sales volume when the youtube advertising spend is 200, using the simple linear regression model fitted above.
newData <- data.frame(youtube = 200)
predict(model, newdata=newData, interval="confidence", level=0.95)
## fit lwr upr
## 1 17.94644 17.38703 18.50585
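The fitted value is just the regression equation evaluated at youtube = 200. A minimal sketch, using the coefficient estimates printed earlier (the variable names are illustrative):

```r
# Coefficient estimates copied from the coefficients table
b0 <- 8.43911226   # intercept
b1 <- 0.04753664   # slope for youtube

youtube_budget <- 200

# Point prediction: intercept plus slope times the new budget
fit_by_hand <- b0 + b1 * youtube_budget
fit_by_hand
```

This should agree with the fit column above. Note that interval="confidence" describes the uncertainty in the mean sales at that budget; predict() also accepts interval="prediction", which gives the wider range expected for a single new observation.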