Advertisement Analysis

Sameer Mathur

Read the Advertising Data

# Read the data
advertising.df <- read.csv("AdvertisingData.csv")
library(car)
some(advertising.df)
       TV Radio Newspaper Sales
6     8.7  48.9      75.0   7.2
10  199.8   2.6      21.2  10.6
21  218.4  27.7      53.4  18.0
79    5.4  29.9       9.4   5.3
104 187.9  17.2      17.9  14.7
106 137.9  46.4      59.0  19.2
113 175.7  15.4       2.4  14.1
140 184.9  43.9       1.7  20.7
188 191.1  28.7      18.2  17.3
190  18.7  12.1      23.4   6.7

Summarize the Advertising Data

# summarize the data
attach(advertising.df)
library(psych)
describe(advertising.df)[,1:9]
          vars   n   mean    sd median trimmed    mad min   max
TV           1 200 147.04 85.85 149.75  147.20 108.82 0.7 296.4
Radio        2 200  23.26 14.85  22.90   23.00  19.79 0.0  49.6
Newspaper    3 200  30.55 21.78  25.75   28.41  23.13 0.3 114.0
Sales        4 200  14.02  5.22  12.90   13.78   4.82 1.6  27.0

Check Datatypes

# checking  data types of the data fields
str(advertising.df)
'data.frame':   200 obs. of  4 variables:
 $ TV       : num  230.1 44.5 17.2 151.5 180.8 ...
 $ Radio    : num  37.8 39.3 45.9 41.3 10.8 48.9 32.8 19.6 2.1 2.6 ...
 $ Newspaper: num  69.2 45.1 69.3 58.5 58.4 75 23.5 11.6 1 21.2 ...
 $ Sales    : num  22.1 10.4 9.3 18.5 12.9 7.2 11.8 13.2 4.8 10.6 ...

Mean and Standard deviation of spending on different promotion strategies

# mean and standard deviation of spending on different promotion strategies
sapply(advertising.df[c("Sales", "TV", "Radio", "Newspaper")], function(x)(c(mean=mean(x),sd=sd(x))))
         Sales        TV    Radio Newspaper
mean 14.022500 147.04250 23.26400  30.55400
sd    5.217457  85.85424 14.84681  21.77862

Ques 1: Is there a relationship between advertising budget and sales?

This question can be answered by fitting a multiple regression model of Sales on TV, Radio, and Newspaper as follows:

\( Sales = \beta_0 + \beta_1 \times TV + \beta_2 \times Radio + \beta_3 \times Newspaper + \epsilon \)

and testing the null hypothesis \( H_0 : \beta_1 = \beta_2 = \beta_3 = 0 \), that is, that none of the three advertising budgets is related to sales.

Model1 <- Sales ~ TV + Radio + Newspaper
fit1 <- lm(Model1, data = advertising.df)
summary(fit1)

Model 1: Summary


Call:
lm(formula = Model1, data = advertising.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.8277 -0.8908  0.2418  1.1893  2.8292 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.938889   0.311908   9.422   <2e-16 ***
TV           0.045765   0.001395  32.809   <2e-16 ***
Radio        0.188530   0.008611  21.893   <2e-16 ***
Newspaper   -0.001037   0.005871  -0.177     0.86    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.686 on 196 degrees of freedom
Multiple R-squared:  0.8972,    Adjusted R-squared:  0.8956 
F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16

The F-statistic can be used to determine whether or not we should reject this null hypothesis. In this case the p-value corresponding to the F-statistic in the summary above is very low (below 2.2e-16), indicating clear evidence of a relationship between advertising and sales.
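As a rough check, the p-value of the overall F-test can be reproduced from the numbers printed in the Model 1 summary. The sketch below only uses the reported F-statistic, degrees of freedom, and R-squared; the variable names are illustrative.

```r
# Values copied from the Model 1 summary
f_stat <- 570.3   # F-statistic
df_num <- 3       # number of predictors
df_den <- 196     # residual degrees of freedom (200 - 3 - 1)

# Upper-tail probability of the F distribution = p-value of the overall test
p_value <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
p_value

# The same F-statistic can be recovered, up to rounding, from R-squared
r2 <- 0.8972
f_from_r2 <- (r2 / df_num) / ((1 - r2) / df_den)
f_from_r2
```

Because the inputs are rounded, the recomputed F-statistic matches the reported one only approximately.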

Ques 2: How strong is the relationship?

# regress `Sales` on `TV`
ModelTV <- Sales ~ TV
fitTV <- lm(ModelTV, data = advertising.df)
summary(fitTV)

Call:
lm(formula = ModelTV, data = advertising.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.3860 -1.9545 -0.1913  2.0671  7.2124 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 7.032594   0.457843   15.36   <2e-16 ***
TV          0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.259 on 198 degrees of freedom
Multiple R-squared:  0.6119,    Adjusted R-squared:  0.6099 
F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

Once we have rejected the null hypothesis \( H_0 \) (there is no relationship between X and Y) in favor of the alternative hypothesis \( H_a \) (there is some relationship between X and Y), it is natural to want to quantify the extent to which the model fits the data. The quality of a linear regression fit is typically assessed using two related quantities: the residual standard error (RSE) and the \( R^2 \) statistic.

# R-squared
summary(fitTV)$r.squared 
[1] 0.6118751
# F-statistic
summary(fitTV)$fstatistic
  value   numdf   dendf 
312.145   1.000 198.000 
  • Residual Standard Error (RSE) = 3.259
  • R-squared = 0.6119 (adjusted R-squared = 0.6099)
  • F-statistic = 312.1 on 1 and 198 DF

These values summarize the least squares model for the regression of number of units sold on the TV advertising budget in the Advertising data.

For Model 1, the regression of number of units sold on the TV, Radio, and Newspaper advertising budgets, the corresponding values are:

  • Residual Standard Error (RSE) = 1.686
  • R-squared = 0.8972 (adjusted R-squared = 0.8956)
  • F-statistic = 570.3 on 3 and 196 DF
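The two summaries report both a multiple and an adjusted R-squared. A small sketch of how the adjusted value follows from the multiple one, using only the figures printed in the summaries (the helper name `adj_r2` is made up for this illustration):

```r
# Adjusted R-squared from multiple R-squared:
#   1 - (1 - R2) * (n - 1) / (n - p - 1)
# n = number of observations, p = number of predictors
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 200  # observations in the Advertising data

adj_tv  <- adj_r2(0.6119, n, p = 1)  # Sales ~ TV
adj_all <- adj_r2(0.8972, n, p = 3)  # Sales ~ TV + Radio + Newspaper

round(c(TV_only = adj_tv, Model1 = adj_all), 4)
```

The results should agree with the adjusted values in the two summaries, 0.6099 and 0.8956, since the penalty for extra predictors is small with 200 observations.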

First, the RSE estimates the standard deviation of the response from the population regression line. For the Advertising data, the RSE of Model 1 is 1.686 (1,686 units, with Sales recorded in thousands of units), while the mean value of the response is 14.022 (14,022 units), indicating a percentage error of roughly 12%.

Second, the \( R^2 \) statistic records the percentage of variability in the response that is explained by the predictors. In Model 1, the predictors explain almost 90% of the variance in sales. The RSE and \( R^2 \) statistics are displayed above.
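To make the two quantities concrete, here is a minimal sketch that computes the RSE and \( R^2 \) by hand and compares them with what `summary()` reports. It runs on simulated data, not the Advertising data; the data frame `sim.df` and its coefficients are assumptions chosen only for illustration.

```r
set.seed(1)  # simulated data, NOT the Advertising data

n  <- 100
x1 <- runif(n, 0, 300)
x2 <- runif(n, 0, 50)
y  <- 3 + 0.05 * x1 + 0.2 * x2 + rnorm(n, sd = 1.5)
sim.df <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = sim.df)

rss <- sum(residuals(fit)^2)               # residual sum of squares
tss <- sum((sim.df$y - mean(sim.df$y))^2)  # total sum of squares
p   <- 2                                   # number of predictors

rse <- sqrt(rss / (n - p - 1))  # residual standard error
r2  <- 1 - rss / tss            # proportion of variance explained

# Both should agree with the values stored in the summary object
all.equal(rse, summary(fit)$sigma)
all.equal(r2,  summary(fit)$r.squared)

# Percentage error in the sense used above: RSE relative to the mean response
rse / mean(sim.df$y)
```

Applied to the reported Model 1 values, the same ratio is 1.686 / 14.0225, which is about 0.12.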

Ques 3: Which media contribute to sales?

To answer this question, we can examine the p-values associated with each predictor's t-statistic. These appear in the Model 1 summary and are extracted again below.

# p-values of  Model 1
summary(fit1)$coefficients[,4]   
 (Intercept)           TV        Radio    Newspaper 
1.267295e-17 1.509960e-81 1.505339e-54 8.599151e-01 

The p-values for TV and Radio are low, but the p-value for Newspaper (about 0.86) is not. This suggests that only TV and Radio are related to Sales.
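Each of these p-values is a two-sided tail probability of a t distribution with the residual degrees of freedom. A short sketch using the rounded t values from the Model 1 summary (so the results match the extracted p-values only approximately):

```r
# Residual degrees of freedom from the Model 1 summary
df_resid <- 196

# Newspaper: t value of -0.177
t_newspaper <- -0.177
p_newspaper <- 2 * pt(-abs(t_newspaper), df = df_resid)
p_newspaper

# Radio: t value of 21.893
t_radio <- 21.893
p_radio <- 2 * pt(-abs(t_radio), df = df_resid)
p_radio
```

A t value close to zero, as for Newspaper, leaves most of the distribution in the tails and so gives a large p-value.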

Ques 4: How large is the effect of each medium on sales?

The standard error of \( \hat{\beta}_j \) can be used to construct confidence intervals for \( \beta_j \). For the Advertising data, the 95% confidence intervals are as follows:

Confidence Interval

# confidence interval
confint(fit1)
                  2.5 %     97.5 %
(Intercept)  2.32376228 3.55401646
TV           0.04301371 0.04851558
Radio        0.17154745 0.20551259
Newspaper   -0.01261595 0.01054097
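For a linear model, these intervals are the estimate plus or minus a t quantile times the standard error. The sketch below rebuilds them from the rounded estimates and standard errors in the Model 1 summary, so it should agree with `confint(fit1)` to about five decimal places.

```r
# Estimates and standard errors copied from the Model 1 summary
est <- c(TV = 0.045765, Radio = 0.188530, Newspaper = -0.001037)
se  <- c(TV = 0.001395, Radio = 0.008611, Newspaper =  0.005871)

# 97.5% quantile of the t distribution with 196 residual degrees of freedom
t_crit <- qt(0.975, df = 196)

# 95% interval: estimate +/- t_crit * standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 5)
```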

Visualize the confidence intervals

library(coefplot)
coefplot(fit1, intercept=FALSE)

The confidence intervals for TV and Radio are narrow and far from zero, providing evidence that these media are related to Sales.

But the interval for Newspaper includes zero, indicating that the variable is not statistically significant given the values of TV and Radio.