Research Design


Research design - Basics

H0 - Null hypothesis: the statement used as the basis for the argument but not yet proven. It predicts no difference (all groups are equal).

H1 - Alternative hypothesis: the statement set up to establish a new effect relative to the existing one (e.g., a new drug is better than the existing standard product).
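
As a rough illustration of how H0 and H1 translate into a test, the sketch below compares two groups with a one-sided t-test. The data are simulated and the names `standard_drug`, `new_drug`, and `reject_h0` are hypothetical; the group means, sample sizes, and the 5% threshold are assumptions chosen only for the example.

```r
# Simulated (hypothetical) outcome scores for two treatment groups.
set.seed(42)
standard_drug <- rnorm(30, mean = 50, sd = 10)  # existing standard product
new_drug      <- rnorm(30, mean = 58, sd = 10)  # new drug, assumed better here

# H0: the two group means are equal.
# H1: the mean under the new drug is greater than under the standard drug.
test_result <- t.test(new_drug, standard_drug, alternative = "greater")

# Reject H0 at the (assumed) 5% level only if the p-value falls below 0.05.
reject_h0 <- test_result$p.value < 0.05

print(test_result$p.value)
print(reject_h0)
```

A small p-value is evidence against H0; a large one means only that the data do not contradict H0, not that H0 has been proven.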

Steps of an Experiment

Planning

Dependent variable = outcome

Independent variable(s) = explanatory variables (predictors)

Design

Analysis

Example of timeline

Planning & Design (2 months)

Conduct Experiment (3 months)

Analysis (1 month)

Key Elements of an Experiment

Randomization

Replication

Blocking
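
The three elements can be combined in a single layout. The sketch below builds a randomized complete block design in which every treatment is replicated within every block and the run order is shuffled per block. The treatment labels, block names, and the replicate count are hypothetical placeholders.

```r
set.seed(1)
treatments <- c("A", "B", "C")        # hypothetical treatments
blocks     <- paste0("block_", 1:4)   # hypothetical blocks (e.g. regions)
replicates <- 2                       # assumed replicates per treatment per block

# Blocking: build the layout one block at a time.
# Replication: each treatment appears `replicates` times in every block.
# Randomization: sample() shuffles the run order independently in each block.
design <- do.call(rbind, lapply(blocks, function(b) {
  data.frame(block = b,
             treatment = sample(rep(treatments, times = replicates)),
             stringsAsFactors = FALSE)
}))

# Every cell of this table should equal `replicates`.
print(table(design$block, design$treatment))
```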

Research design - an example

Assume that you work for a consulting firm and are responsible for analyzing sales as a function of TV, radio, and newspaper advertising budgets. The original example and code are available in “An Introduction to Statistical Learning: With Applications in R” (https://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf).

Discussion Questions for you (answer any 4 questions)

Is there a relationship between advertising budget and sales?
How strong is the relationship between advertising budget and sales?
Which media contribute to sales?
How accurately can we estimate the effect of each medium on sales?
Is it possible to predict future sales?
Is the relationship linear?
Any other questions that can be answered

# Read in the data
library(readr)

## Warning: package 'readr' was built under R version 3.6.3

ad_sales <- read_csv('advertising.csv')

## Warning: Missing column names filled in: 'X1' [1]

## Warning: Duplicated column names deduplicated: 'X1' => 'X1_1' [2]

## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   X1_1 = col_double(),
##   TV = col_double(),
##   radio = col_double(),
##   newspaper = col_double(),
##   sales = col_double()
## )

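# Attach the data frame so its columns (TV, radio, newspaper, sales) can be used by name
attach(ad_sales)
# Draw three scatter plots side by side
par(mfrow = c(1, 3))
plot(TV, sales, cex.lab = 2, cex.axis = 1.2)
plot(radio, sales, cex.lab = 2, cex.axis = 1.2)
# Add the title while the middle panel is active so it sits over the center of the figure
title("Advertising & sales", cex.main = 2, font.main = 4, col.main = "blue")
plot(newspaper, sales, cex.lab = 2, cex.axis = 1.2)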

The plot displays “sales, in thousands of units, as a function of TV, radio, and newspaper budgets, in thousands of dollars, for 200 different markets (James et al., 2017).”

Next, we fit a simple least-squares regression model for each medium and use it to predict sales.

# Fit one simple linear regression of sales on each medium
lm.radio <- lm(sales ~ radio)
lm.tv <- lm(sales ~ TV)
lm.newspaper <- lm(sales ~ newspaper)
# Redraw each scatter plot and overlay its fitted least-squares line
par(mfrow = c(1, 3))
plot(TV, sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.tv, col = "blue", lty = 1, lwd = 2)
plot(radio, sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.radio, col = "blue", lty = 1, lwd = 2)
plot(newspaper, sales, cex.lab = 2, cex.axis = 1.2)
abline(lm.newspaper, col = "blue", lty = 1, lwd = 2)

Each blue line represents a simple model that can be used to predict sales using TV, radio, and newspaper, respectively.
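
To make clear what each fitted line is, the sketch below computes the least-squares slope and intercept from their closed-form expressions and checks them against `lm()`. It runs on simulated data rather than on advertising.csv; the names `tv_budget` and `sim_sales` and the coefficients used to generate the data are assumptions made only for this illustration.

```r
set.seed(123)
n <- 200
tv_budget <- runif(n, min = 0, max = 300)             # simulated predictor
sim_sales <- 7 + 0.05 * tv_budget + rnorm(n, sd = 3)  # simulated outcome

# Closed-form least-squares estimates for one predictor:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
slope <- sum((tv_budget - mean(tv_budget)) * (sim_sales - mean(sim_sales))) /
  sum((tv_budget - mean(tv_budget))^2)
intercept <- mean(sim_sales) - slope * mean(tv_budget)

# lm() should reproduce the same two numbers.
fit <- lm(sim_sales ~ tv_budget)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```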

Now we can also analyze the effect of TV advertising on sales:

summary(lm.tv)

## 
## Call:
## lm(formula = sales ~ TV)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
## TV          0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

The estimate $\hat{\beta}_1 = 0.0475$ indicates that a $1,000 increase in TV advertising is associated with an increase in sales of about 47.5 units. Notice that $\hat{\beta}_0$ and $\hat{\beta}_1$ are very large relative to their standard errors, so the t-statistics are also very large. Given the p-value (< 2e-16), we can reject the null hypothesis that there is no relationship between TV advertising and sales ($\beta_1 = 0$).
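
The arithmetic behind these statements can be reproduced from the numbers printed by summary(lm.tv). The sketch below copies the estimates, standard errors, and degrees of freedom from that output; the variable names are hypothetical, and the 95% confidence level is an assumption chosen for illustration.

```r
# Values copied from the summary(lm.tv) output.
estimate  <- c(intercept = 7.032594, TV = 0.047537)
std_error <- c(intercept = 0.457843, TV = 0.002691)
df_resid  <- 198

# t-statistic = estimate / standard error; two-sided p-value from the t distribution.
t_value <- estimate / std_error
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# 95% confidence interval for the TV slope (assumed confidence level).
t_crit <- qt(0.975, df = df_resid)
ci_tv  <- estimate[["TV"]] + c(-1, 1) * t_crit * std_error[["TV"]]

# Budgets are in thousands of dollars and sales in thousands of units,
# so one extra unit of TV budget ($1,000) maps to slope * 1000 units sold.
extra_units <- estimate[["TV"]] * 1000

print(round(t_value, 2))
print(p_value)
print(ci_tv)
print(extra_units)
```

Because the confidence interval for the slope lies well away from zero, it tells the same story as the very small p-value.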

Having rejected the null hypothesis, the next step is to quantify how well the model fits the data. We check the following:

Residual standard error: 3.259 on 198 degrees of freedom. This is an estimate of the standard deviation of the error term, ϵ. Because sales are recorded in thousands of units, even if the model were correct, any prediction of sales would still be off by about 3,260 units on average.
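
For readers who want to see where these fit measures come from, the sketch below recomputes the residual standard error and R-squared by hand and compares them with what `summary()` reports. It uses simulated data, not advertising.csv; `tv_budget`, `sim_sales`, and the generating coefficients are assumptions for illustration only.

```r
set.seed(2024)
n <- 200
tv_budget <- runif(n, min = 0, max = 300)             # simulated predictor
sim_sales <- 7 + 0.05 * tv_budget + rnorm(n, sd = 3)  # simulated outcome
fit <- lm(sim_sales ~ tv_budget)

# Residual standard error: sqrt(RSS / (n - 2)) for a one-predictor model.
rss        <- sum(residuals(fit)^2)
rse_manual <- sqrt(rss / (n - 2))
rse_lm     <- summary(fit)$sigma

# R-squared: share of the total variation in the outcome explained by the model.
tss       <- sum((sim_sales - mean(sim_sales))^2)
r_squared <- 1 - rss / tss

print(c(rse_manual = rse_manual, rse_lm = rse_lm))
print(c(r_squared_manual = r_squared, r_squared_lm = summary(fit)$r.squared))
```

The divisor n - 2 is the residual degrees of freedom, which is why the TV model with 200 markets reports 198.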

Reference

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: With Applications in R. New York: Springer.

Dr. Gareth James is currently serving as the Interim Dean of the Marshall School of Business. He is an expert on statistical methodology in the areas of functional data analysis and high dimensional statistics, with particular application to marketing problems.