H0 - Null hypothesis: the default statement used as the basis for the argument. It has not been proven and predicts no difference (all groups are equal).
H1 - Alternative hypothesis: the statement set up to establish a new effect compared with the existing one (e.g., a new drug is better than the existing standard product).
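As a minimal sketch of how H0 and H1 translate into a test, the example below compares two simulated groups with a one-sided two-sample t-test. The data, group names (`standard`, `new_drug`), sample sizes, and the 5% threshold are all assumptions made for illustration; they do not come from a real study.

```r
# Hypothetical data: responses under an existing treatment and a new one.
set.seed(42)                                  # reproducible simulation
standard <- rnorm(30, mean = 10, sd = 2)      # assumed existing product
new_drug <- rnorm(30, mean = 12, sd = 2)      # assumed new product

# H0: the two group means are equal; H1: the mean of new_drug is greater.
test <- t.test(new_drug, standard, alternative = "greater")

test$p.value          # small values are evidence against H0
test$p.value < 0.05   # TRUE means H0 is rejected at the 5% level
```

A small p-value leads us to reject H0 in favor of H1; a large one means only that the data do not contradict H0, not that H0 has been proven.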
Planning
Dependent variable = outcome
Independent variable(s) = explanatory variables (predictors)
Design
Analysis
Planning & Design (2 months)
Conduct Experiment (3 months)
Analysis (1 month)
Randomization
Replication
Blocking
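The three principles above can be sketched with a small randomized complete block design. The treatment labels, the number of blocks, and the `design` data frame are hypothetical; this is one possible layout, not a prescribed procedure.

```r
# Hypothetical design: 3 treatments, 4 blocks, each treatment once per block.
set.seed(1)
treatments <- c("A", "B", "C")
blocks <- paste0("block", 1:4)

# Randomization: within each block, shuffle the order of the treatments.
design <- do.call(rbind, lapply(blocks, function(b) {
  data.frame(block = b,
             unit = seq_along(treatments),
             treatment = sample(treatments))
}))

# Replication and blocking: every treatment appears exactly once in every
# block, so each treatment is replicated 4 times overall.
table(design$block, design$treatment)
```

Shuffling inside each block, rather than across all units at once, keeps block-to-block differences from being confused with treatment effects.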
Assume that you are working for a consulting firm and are responsible for analyzing sales as a function of TV, radio, and newspaper advertising budgets. The original example and code are available in “An Introduction to Statistical Learning: With Applications in R” (https://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf).
#read in data
library(readr)
## Warning: package 'readr' was built under R version 3.6.3
ad_sales <- read_csv('advertising.csv')
## Warning: Missing column names filled in: 'X1' [1]
## Warning: Duplicated column names deduplicated: 'X1' => 'X1_1' [2]
## Parsed with column specification:
## cols(
## X1 = col_double(),
## X1_1 = col_double(),
## TV = col_double(),
## radio = col_double(),
## newspaper = col_double(),
## sales = col_double()
## )
attach(ad_sales)
par(mfrow = c(1,3))
plot(TV, sales, cex.lab=2, cex.axis=1.2)
plot(radio,sales,cex.lab=2,cex.axis=1.2)
title("Advertising & sales",cex.main = 2,font.main= 4, col.main= "blue")
plot(newspaper,sales,cex.lab=2,cex.axis=1.2)
The plot displays “sales, in thousands of units, as a function of TV, radio, and newspaper budgets, in thousands of dollars, for 200 different markets (James et al., 2017).”
In the next section, we fit a least squares model for each advertising medium and use it to predict sales.
lm.radio=lm(sales ~ radio)
lm.tv = lm(sales ~ TV)
lm.newspaper = lm(sales ~ newspaper)
par(mfrow = c(1,3))
plot(TV, sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.tv, col = "blue", lty = 1, lwd = 2)
plot(radio,sales,cex.lab=2,cex.axis=1.2)
abline(lm.radio, col="blue", lty=1, lwd=2)
plot(newspaper,sales,cex.lab=2,cex.axis=1.2)
abline(lm.newspaper, col="blue", lty=1, lwd=2)
Each blue line represents a simple model that can be used to predict sales using TV, radio, and newspaper, respectively.
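To make explicit what `lm()` computes for a single predictor, the sketch below evaluates the closed-form least squares estimates by hand and compares them with `lm()`. The vectors `x` and `y` are hypothetical toy data used so the example runs without `advertising.csv`.

```r
# Hypothetical toy data standing in for a predictor (x) and a response (y).
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least squares estimates for the line y = b0 + b1 * x.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() should return the same intercept and slope.
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))
```

The same formulas, applied to `TV` and `sales`, give the intercept and slope of the blue line in the first panel.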
summary(lm.tv)
##
## Call:
## lm(formula = sales ~ TV)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.3860 -1.9545 -0.1913 2.0671 7.2124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.032594 0.457843 15.36 <2e-16 ***
## TV 0.047537 0.002691 17.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
## F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
The estimate $\hat{\beta}_1 = 0.0475$ indicates that a 1,000 dollar increase in TV advertising is associated with an increase in sales of about 47.5 units. Notice that $\hat{\beta}_0$ and $\hat{\beta}_1$ are very large relative to their standard errors, so the t-statistics are also very large. Since the p-values are below 2e-16, we can reject the null hypothesis that there is no relationship between TV advertising and sales.
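The t values and p-values in the coefficient table can be reproduced from the estimates and standard errors alone. The sketch below copies those numbers from the `summary(lm.tv)` output and also derives a 95% confidence interval; the variable names (`est`, `se`, `crit`, `ci`) are introduced here for illustration only.

```r
# Values copied from the summary(lm.tv) output above.
est <- c(intercept = 7.032594, TV = 0.047537)
se  <- c(intercept = 0.457843, TV = 0.002691)
df  <- 198                       # residual degrees of freedom

t_stat <- est / se               # t value = estimate / standard error
p_val  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # two-sided p-value

# 95% confidence interval for each coefficient.
crit <- qt(0.975, df)
ci <- cbind(lower = est - crit * se, upper = est + crit * se)

round(t_stat, 2)
ci
```

Because the interval for the TV slope excludes zero, it leads to the same conclusion as the p-value.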
Having rejected the null hypothesis, the next step is to quantify how well the model fits the data. We start with the residual standard error.
Residual standard error: 3.259 on 198 degrees of freedom. This is an estimate of the standard deviation of the error term, ϵ. Because sales are measured in thousands of units, even if the model were correct, any prediction of sales would still be off by about 3,260 units on average.
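The fit statistics at the bottom of the summary are linked by simple identities, which the sketch below checks using only numbers copied from the `summary(lm.tv)` output. The variable names are introduced here for illustration, and the results are subject to the rounding of the printed values.

```r
# Values copied from the summary(lm.tv) output above.
rse    <- 3.259   # residual standard error, in thousands of units
df     <- 198     # residual degrees of freedom (n - 2 with n = 200)
f_stat <- 312.1   # F-statistic on 1 and 198 DF

# RSE = sqrt(RSS / (n - 2)), so the residual sum of squares is recoverable.
rss <- rse^2 * df

# With one predictor, R^2 = F / (F + df); compare with the reported 0.6119.
r_squared <- f_stat / (f_stat + df)

# Total sum of squares implied by RSS and R^2 (R^2 = 1 - RSS / TSS).
tss <- rss / (1 - r_squared)

round(r_squared, 3)
rse * 1000        # typical prediction error expressed in units sold
```

The recovered R-squared agrees with the reported value to about three decimal places, which is what the rounding of the inputs allows.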
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112, p. 18). New York: Springer.
Dr. Gareth James is currently serving as the Interim Dean of the Marshall School of Business. He is an expert on statistical methodology in the areas of functional data analysis and high dimensional statistics, with particular application to marketing problems.