This paper presents statistical methods for analysing an advertising data set and shows how to extract information that supports recommendations or actions to increase the sales of a product.

The Advertising data set contains, for 200 different markets, the volume of sales (in thousands of units) and the advertising spend (in thousands of dollars) in three media: TV, radio and newspaper.

A sample of the data set (first 10 and last 10 records) is presented below.

Advertising <- read.csv("~/R/Demo works/Advertising.csv")

head(Advertising, 10)
##     X    TV Radio Newspaper Sales
## 1   1 230.1  37.8      69.2  22.1
## 2   2  44.5  39.3      45.1  10.4
## 3   3  17.2  45.9      69.3   9.3
## 4   4 151.5  41.3      58.5  18.5
## 5   5 180.8  10.8      58.4  12.9
## 6   6   8.7  48.9      75.0   7.2
## 7   7  57.5  32.8      23.5  11.8
## 8   8 120.2  19.6      11.6  13.2
## 9   9   8.6   2.1       1.0   4.8
## 10 10 199.8   2.6      21.2  10.6
tail(Advertising, 10)
##       X    TV Radio Newspaper Sales
## 191 191  39.5  41.1       5.8  10.8
## 192 192  75.5  10.8       6.0   9.9
## 193 193  17.2   4.1      31.6   5.9
## 194 194 166.8  42.0       3.6  19.6
## 195 195 149.7  35.6       6.0  17.3
## 196 196  38.2   3.7      13.8   7.6
## 197 197  94.2   4.9       8.1   9.7
## 198 198 177.0   9.3       6.4  12.8
## 199 199 283.6  42.0      66.2  25.5
## 200 200 232.1   8.6       8.7  13.4

The main objective here is to develop a statistical model to understand and explain the relationship between sales and the amount spent on advertising. If such a model can be constructed, it is possible to predict how large the advertising budget needs to be, and what mix of media is needed, in order to reach the sales objectives.

James, Witten, Hastie and Tibshirani suggest the following set of questions as a way to address the objectives of the study:

  1. Is there a relationship between advertising budget and sales? The first goal is to determine whether the data provide evidence of an association between advertising expenditure and sales.
  2. How strong is the relationship between advertising budget and sales? Assuming that there is a relationship between advertising and sales, what is the strength of this relationship? Given a certain advertising budget, can we predict sales with a high level of accuracy (a strong relationship)? Or is a prediction of sales based on advertising expenditure only slightly better than a random guess (a weak relationship)?
  3. Which media contribute to sales? Do all three media (TV, radio, and newspaper) contribute to sales, or do just one or two of the media contribute? To answer this question, a method needs to be found to separate out the individual effects of each medium when the client wants to spend money on all three media.
  4. How accurately can the effect of each medium on sales be estimated? What is the increase in sales expected as a result of every dollar spent on advertising in a particular medium? How accurately can we predict this amount of increase?
  5. How accurately can future sales be predicted? For any given level of television, radio, or newspaper advertising, what is the prediction for sales, and what is the accuracy of this prediction?
  6. Is the relationship linear? If there is approximately a straight-line relationship between advertising expenditure in the various media and sales, then linear regression is an appropriate tool. If not, then it may still be possible to transform the predictor or the response so that linear regression can be used.
  7. Is there synergy among the advertising media? Does spending on a mix of media (x% on television advertising and y% on radio advertising) result in more sales than allocating 100% to either television or radio individually? This is known as a “synergy effect”.

(Source: James, Witten, Hastie and Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer, ISBN 978-1-4614-7138-7 (eBook).)
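
The questions above map fairly directly onto standard linear regression output in R. The following is a minimal sketch, not the analysis used in this paper: it rebuilds only the 20 records printed earlier (rows 1 to 10 and 191 to 200) so that it runs without the CSV file, which means its estimates will differ from a fit on all 200 markets. The object names (`adv`, `fit_main`, `fit_additive`, `fit_int`, `new_market`) and the budget values in `new_market` are illustrative assumptions.

```r
# The 20 records shown in the data sample (a subset, for illustration only).
adv <- data.frame(
  TV = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8,
         39.5, 75.5, 17.2, 166.8, 149.7, 38.2, 94.2, 177.0, 283.6, 232.1),
  Radio = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6,
            41.1, 10.8, 4.1, 42.0, 35.6, 3.7, 4.9, 9.3, 42.0, 8.6),
  Newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2,
                5.8, 6.0, 31.6, 3.6, 6.0, 13.8, 8.1, 6.4, 66.2, 8.7),
  Sales = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6,
            10.8, 9.9, 5.9, 19.6, 17.3, 7.6, 9.7, 12.8, 25.5, 13.4)
)

# Additive model with all three media.
fit_main <- lm(Sales ~ TV + Radio + Newspaper, data = adv)
fit_summary <- summary(fit_main)

fstat <- fit_summary$fstatistic   # Q1: overall F-test for any relationship
r2 <- fit_summary$r.squared       # Q2: share of variance in Sales explained
coef_table <- coef(fit_summary)   # Q3: per-medium estimates, t values, p-values
ci <- confint(fit_main)           # Q4: 95% confidence intervals for each effect

# Q5: prediction interval for a hypothetical market (budgets in $1000s).
new_market <- data.frame(TV = 100, Radio = 20, Newspaper = 30)
pred <- predict(fit_main, newdata = new_market, interval = "prediction")

# Q6: residuals against fitted values; a visible pattern suggests non-linearity.
resid_vs_fitted <- data.frame(fitted = fitted(fit_main),
                              resid = residuals(fit_main))

# Q7: compare a TV + Radio model with and without the TV:Radio interaction.
fit_additive <- lm(Sales ~ TV + Radio, data = adv)
fit_int <- lm(Sales ~ TV * Radio, data = adv)
synergy_test <- anova(fit_additive, fit_int)
```

Writing the formula with bare column names and `data = adv` (rather than `adv$Sales ~ adv$TV + ...`) keeps the coefficient labels short and lets `predict()` accept new data frames.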

Advertising data: Least squares coefficient estimates of the multiple linear regression of number of units sold on TV and radio advertising budgets, including their interaction.

## 
## Call:
## lm(formula = Advertising$Sales ~ Advertising$TV + Advertising$Radio + 
##     Advertising$TV * Advertising$Radio, data = Advertising)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3366 -0.4028  0.1831  0.5948  1.5246 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      6.750e+00  2.479e-01  27.233   <2e-16 ***
## Advertising$TV                   1.910e-02  1.504e-03  12.699   <2e-16 ***
## Advertising$Radio                2.886e-02  8.905e-03   3.241   0.0014 ** 
## Advertising$TV:Advertising$Radio 1.086e-03  5.242e-05  20.727   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9435 on 196 degrees of freedom
## Multiple R-squared:  0.9678, Adjusted R-squared:  0.9673 
## F-statistic:  1963 on 3 and 196 DF,  p-value: < 2.2e-16
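
As a rough worked example of how the interaction model above can be read, the sketch below plugs the printed coefficient estimates into the fitted equation Sales = b0 + b_tv * TV + b_radio * Radio + b_int * TV * Radio. The helper names (`predict_sales`, `tv_slope`) and the budget of 40 (thousand dollars) are hypothetical choices for illustration; the coefficients are copied as rounded in the output, so results are approximate point predictions with no uncertainty attached.

```r
# Coefficient estimates copied from the summary output above (as printed).
b0      <- 6.750      # intercept
b_tv    <- 1.910e-02  # TV main effect
b_radio <- 2.886e-02  # Radio main effect
b_int   <- 1.086e-03  # TV:Radio interaction

# Predicted sales (thousands of units) for budgets in thousands of dollars.
predict_sales <- function(tv, radio) {
  b0 + b_tv * tv + b_radio * radio + b_int * tv * radio
}

# With an interaction, the effect of an extra $1000 of TV depends on Radio.
tv_slope <- function(radio) {
  b_tv + b_int * radio
}

slope_no_radio <- tv_slope(0)    # 0.0191  -> about 19 extra units
slope_radio_20 <- tv_slope(20)   # 0.04082 -> about 41 extra units

# Synergy check: the same total budget of 40 allocated three ways.
all_tv     <- predict_sales(40, 0)    # 7.514
all_radio  <- predict_sales(0, 40)    # 7.9044
even_split <- predict_sales(20, 20)   # 8.1436
```

Under these estimates the even split predicts higher sales than either single-medium allocation, which is what a positive interaction coefficient implies.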