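# Load the advertising data; attach() makes the columns available by name
ad <- read.csv("Advertising.csv")
attach(ad)
# Scatter plots of Sales against each advertising medium, side by side
par(mfrow = c(1, 3))
plot(TV, Sales, cex.lab = 2, cex.axis = 1.2)
plot(Radio, Sales, cex.lab = 2, cex.axis = 1.2)
# The title is drawn on the current (middle) panel
title("Advertising data", cex.main = 2, font.main = 4, col.main = "blue")
plot(Newspaper, Sales, cex.lab = 2, cex.axis = 1.2)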

Question we would like to answer:

# Fit a simple linear regression of Sales on each medium separately
lm.radio <- lm(Sales ~ Radio)
lm.tv <- lm(Sales ~ TV)
lm.newspaper <- lm(Sales ~ Newspaper)
# Redraw the scatter plots with the fitted regression lines overlaid
par(mfrow = c(1, 3))
plot(TV, Sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.tv, col = "blue", lty = 1, lwd = 2)
plot(Radio, Sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.radio, col = "blue", lty = 1, lwd = 2)
plot(Newspaper, Sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.newspaper, col = "blue", lty = 1, lwd = 2)

Analyzing the regression model of Sales vs. TV

summary(lm.tv)
## 
## Call:
## lm(formula = Sales ~ TV)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
## TV          0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

The estimate \(\hat{\beta}_1 = 0.047537\) indicates that an additional $1,000 spent on TV advertising is associated with an increase in sales of about 47.5 units. Notice that \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are very large relative to their standard errors, so the t-statistics are also very large. With p-values below 2e-16, we can reject the null hypothesis that there is no relationship between TV advertising and sales (\(\beta_1 = 0\)).
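As a quick check on these numbers, each t-statistic is simply the estimate divided by its standard error. The sketch below recomputes them from the values printed in the summary output. The variable names (`est`, `se`, `t_stat`, `p_val`) are illustrative and not part of the original analysis.

```r
# Estimates and standard errors copied from the summary(lm.tv) output
est <- c(Intercept = 7.032594, TV = 0.047537)
se  <- c(Intercept = 0.457843, TV = 0.002691)

# t-statistic = estimate / standard error
t_stat <- est / se
print(round(t_stat, 2))

# Two-sided p-values from the t distribution with 198 residual degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = 198, lower.tail = FALSE)
print(p_val < 2e-16)
```

Both ratios should agree with the `t value` column (15.36 and 17.67), and both p-values fall below the 2e-16 threshold that R prints.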

Having rejected the null hypothesis, the next step is to quantify the extent to which the model fits the data. We check two measures from the summary output: the residual standard error (RSE) and R-squared.

The residual standard error (RSE), 3.259 here, is an estimate of the standard deviation of the error term \(\epsilon\). So even if the model were correct, any prediction of sales would still be off by about 3,260 units on average.
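For reference, the residual standard error reported by `summary()` is the standard textbook quantity below. With \(n = 200\) observations and two estimated coefficients, \(n - 2\) gives the 198 degrees of freedom shown in the output. The symbols RSS, \(y_i\), and \(\hat{y}_i\) are standard notation and are not taken from the original text.

```latex
\[
\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - 2}}
             = \sqrt{\frac{1}{n - 2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2},
\qquad n - 2 = 198 .
\]
```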

mean(Sales)
## [1] 14.0225

The mean of Sales in the data set is about 14,022 units, so the percentage error is 3,259 / 14,022 ≈ 23%.
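The 23% figure can be reproduced from the two printed values. This is a minimal sketch; `rse`, `mean_sales`, and `pct_error` are illustrative names holding numbers copied from the output above.

```r
rse <- 3.259           # residual standard error from summary(lm.tv)
mean_sales <- 14.0225  # value printed by mean(Sales)

# RSE expressed as a percentage of the mean response
pct_error <- 100 * rse / mean_sales
print(round(pct_error, 1))
```

With the fitted model still in the session, `sigma(lm.tv) / mean(Sales)` should give the same ratio without copying numbers by hand.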

The R-squared value of 0.6119 tells us that just under two-thirds of the variability in sales is explained by a linear regression on TV.
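R-squared is the share of the total variability in sales that the fitted line accounts for. In standard notation (RSS and TSS are textbook symbols, not taken from the original text):

```latex
\[
R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2,
\quad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 .
\]
```

A value of 0.6119 therefore means that the residual sum of squares is about 39% of the total sum of squares.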