LinearRegression

Questions we would like to answer:

Is there a relationship between advertising budget and sales?
How strong is the relationship between advertising budget and sales?
Which media contribute to sales?
How accurately can we estimate the effect of each medium on sales?
How accurately can we predict future sales?
Is the relationship linear?
Is there synergy (interaction) among the advertising media?

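# Simple linear regression of Sales on each advertising medium separately
# (assumes the Advertising data, with columns TV, Radio, Newspaper and Sales,
# has already been loaded and attached)
lm.tv = lm(Sales ~ TV)
lm.radio = lm(Sales ~ Radio)
lm.newspaper = lm(Sales ~ Newspaper)

# Three scatter plots side by side, each with its least-squares line in blue
par(mfrow = c(1, 3))
plot(TV, Sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.tv, col = "blue", lty = 1, lwd = 2)
plot(Radio, Sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.radio, col = "blue", lty = 1, lwd = 2)
plot(Newspaper, Sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.newspaper, col = "blue", lty = 1, lwd = 2)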

Analyzing the regression model of Sales vs. TV

summary(lm.tv)

## 
## Call:
## lm(formula = Sales ~ TV)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
## TV          0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

\(\hat{\beta}_1 = 0.047537\) means that an additional $1,000 spent on TV advertising is associated with an increase in sales of about 47.5 units. Notice that \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are very large compared to their standard errors, so their t-statistics (15.36 and 17.67) are also very large. Since the corresponding p-values are below 2e-16, we can reject the null hypothesis \(H_0: \beta_1 = 0\) and conclude that there is a relationship between TV advertising and sales.
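Each t value in the coefficient table is just the estimate divided by its standard error, and each p-value comes from a t distribution with 198 (= n - 2) degrees of freedom. The sketch below reproduces those columns from the printed numbers, which are copied by hand from `summary(lm.tv)`, so the last digit can differ slightly from the exact fit. The variable names are illustrative only.

```r
# Estimates and standard errors copied from the summary(lm.tv) output above
est <- c(intercept = 7.032594, TV = 0.047537)
se  <- c(intercept = 0.457843, TV = 0.002691)
df_resid <- 198  # residual degrees of freedom, n - 2

# t statistic for H0: beta_j = 0
t_stat <- est / se

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

print(round(t_stat, 2))
print(p_val)

# Sales is in thousands of units and TV in thousands of dollars,
# so the slope times 1000 gives units of sales per extra $1,000 of TV
print(est["TV"] * 1000)
```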

Having rejected the null hypothesis, the next step is to quantify the extent to which the model fits the data. For this we check the following quantities from the summary:

Residual standard error: 3.259 on 198 degrees of freedom

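It is an estimate of the standard deviation of the error term \(\epsilon\). So even if the model were correct and the true coefficients were known exactly, actual sales would still deviate from the regression line by about 3,260 units on average (Sales is measured in thousands of units).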
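The residual standard error is computed as \(\mathrm{RSE} = \sqrt{\mathrm{RSS}/(n-2)}\), where RSS is the residual sum of squares. A minimal sketch checking that formula against the `sigma` value that `summary()` reports; it uses simulated data (hypothetical numbers loosely shaped like the TV/Sales fit) so that it runs without the Advertising file. With the real data you would pass `lm.tv` instead of `fit`.

```r
# Simulated stand-in for the Advertising data (hypothetical values)
set.seed(1)
n     <- 200
tv    <- runif(n, 0, 300)
sales <- 7 + 0.0475 * tv + rnorm(n, sd = 3.26)

fit <- lm(sales ~ tv)

# RSE = sqrt(RSS / (n - 2)); two parameters (intercept, slope) are estimated
rss <- sum(residuals(fit)^2)
rse <- sqrt(rss / (n - 2))

# summary.lm reports the same quantity as sigma
print(c(by_hand = rse, from_summary = summary(fit)$sigma))
```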

mean(Sales)

## [1] 14.0225

Since the mean value of Sales in the data set is about 14,022 units, the percentage error is roughly 3,260 / 14,022 ≈ 23%.
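The percentage error is simply the RSE divided by the mean response. The arithmetic, using the two values printed above (both in thousands of units):

```r
# Values taken from summary(lm.tv) and mean(Sales) above
rse        <- 3.259    # residual standard error, thousands of units
mean_sales <- 14.0225  # mean of Sales, thousands of units

# Typical prediction error relative to average sales, in percent
pct_error <- rse / mean_sales * 100
print(round(pct_error, 1))  # 23.2
```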

Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099

It tells us that about 61% (just under two-thirds) of the variability in Sales is explained by the linear regression on TV.
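\(R^2\) is defined as \(1 - \mathrm{RSS}/\mathrm{TSS}\), where TSS is the total sum of squares of the response around its mean, and the adjusted version is \(1 - (1 - R^2)(n-1)/(n-p-1)\) for \(p\) predictors. In simple regression \(R^2\) also equals the squared correlation between the predictor and the response. The sketch below checks these identities on simulated (hypothetical) data and then applies the adjustment to the printed value 0.6119 with n = 200, p = 1.

```r
# Simulated stand-in for the Advertising data (hypothetical values)
set.seed(42)
n     <- 200
tv    <- runif(n, 0, 300)
sales <- 7 + 0.0475 * tv + rnorm(n, sd = 3.26)
fit   <- lm(sales ~ tv)

rss <- sum(residuals(fit)^2)          # residual sum of squares
tss <- sum((sales - mean(sales))^2)   # total sum of squares
r2  <- 1 - rss / tss                  # proportion of variance explained

p <- 1                                # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(by_hand = r2, from_summary = summary(fit)$r.squared,
        cor_squared = cor(tv, sales)^2))

# Adjusted R-squared from the printed Multiple R-squared of lm.tv
print(round(1 - (1 - 0.6119) * 199 / 198, 4))  # 0.6099
```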

F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
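The F-statistic tests \(H_0: \beta_1 = 0\) for the model as a whole. With a single predictor it carries the same information as the slope's t-test, because \(F = t^2\): here \(17.67^2 \approx 312.1\). A short check of that identity, first with the printed numbers and then on a simulated fit (hypothetical data) against the values `summary()` returns:

```r
# t statistic for the TV slope, from the estimate and std. error of summary(lm.tv)
t_tv <- 0.047537 / 0.002691

# With one predictor, the overall F-statistic equals t^2
f_stat <- t_tv^2
print(round(f_stat, 1))

# p-value of the F test with 1 and 198 degrees of freedom
p_f <- pf(f_stat, df1 = 1, df2 = 198, lower.tail = FALSE)
print(p_f < 2.2e-16)

# Same identity on a small simulated regression
set.seed(7)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50)
s <- summary(lm(y ~ x))
print(all.equal(unname(s$fstatistic["value"]), s$coefficients["x", "t value"]^2))
```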

LinearRegression

Amit Kumar

September 4, 2016

Questions we would like to answer: