```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


## Introduction

Good predictions are hard to make. Many variables, both independent and dependent, influence attendance at sporting events, and that makes outcomes hard to forecast. With carefully selected variables, however, it is still possible to make marketing promotions more accountable.

The goal of this case study is to analyze whether bobblehead promotions increase attendance at Dodgers home games. With the fitted model, we can predict attendance for a game in the forthcoming season, with or without a bobblehead promotion.

The motivation for this case study is to design a predictive model and report any interesting findings that support critical business decisions.

## Pre-Processing

Important tip: set your working directory before performing the analysis.

### Load the required libraries and the data

```{r}
# rm(list = ls())  # clear memory

# setwd("C:/Users/zxu3/Documents/R/regression")

library(lattice)  # graphics package
library(ggplot2)  # graphics package
library(readr)    # provides read_csv()

# A "#" at the beginning of a line turns it into a comment,
# which lets you take notes or add descriptions.

# Create a data frame with the Dodgers data.
# If you import the data from your own drive with base R:
# DodgersData <- read.csv("DodgersData.csv")

# Now load the following dataset into your work environment.
DodgersData <- read_csv("DodgersData.csv")

# Alternatively, you can read the data from my GitHub website.
```

## Data cleanup and exploratory analysis

Evaluate the structure of the data and re-level the factor variables for "Day of Week" and "Month" so that their levels appear in the right order.
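Because `read_csv()` imports text columns as character vectors rather than factors, `levels()` returns `NULL` until the columns are converted. A minimal sketch of the re-leveling step follows, using a small made-up data frame. It assumes the columns hold full English day names and three-letter uppercase month abbreviations; check the actual values with `unique()` before adapting it.

```r
# Toy data standing in for DodgersData (values are made up for illustration)
toy <- data.frame(
  month = c("JUN", "APR", "OCT", "MAY"),
  day_of_week = c("Tuesday", "Sunday", "Friday", "Monday"),
  stringsAsFactors = FALSE
)

# Assumed calendar orderings; adjust to match the values in your own file
month_order <- c("APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT")
day_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Convert the character columns to factors with an explicit level order
toy$month <- factor(toy$month, levels = month_order)
toy$day_of_week <- factor(toy$day_of_week, levels = day_order)

levels(toy$month)
levels(toy$day_of_week)
```

Any value that is not listed in `levels` becomes `NA`, so it is worth comparing the assumed orderings against the data before converting.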

```{r error=TRUE}
# Check the structure of the Dodgers data
str(DodgersData)

head(DodgersData)

# Evaluate the factor levels for day_of_week
# levels(DodgersData$day_of_week)

# Evaluate the factor levels for month
levels(DodgersData$month)

# First 10 rows of the data frame
head(DodgersData, 10)
```

## Let R identify the temperature, the attendance, the opponent, and the promotion (i.e., bobblehead) for the 20th home game of the season.

```{r error=TRUE}
DodgersData[20, c("temp", "attend", "opponent", "bobblehead")]
```


## Let R identify the average value for attendance.

```{r error=TRUE}
meanattend <- mean(DodgersData$attend)
meanattend
```

## Let R identify the number of bobblehead promotions.

```{r error=TRUE}
promotions <- sum(DodgersData$bobblehead == "YES")
promotions
```

in-class notes


### Note: You may perform the regression analysis using Excel.

If you choose to use R and RStudio, please work on any two of the first three questions (1a, 1b, and 1c) and the last two questions (2 and 3).

If you choose to use Excel, please post your spreadsheet solutions and the answers to Questions 1a, 1b, 1c, 2, and 3.

#### Q 1a: Let R identify the temperature, the attendance, the opponent, and the promotion (i.e., bobblehead) for the 25th home game of the season. Report your results.

Answer:

#### Q 1b: What is the median value of attendance? Please review your in-class notes and write your function and answer below.

Answer:

#### Q 1c: How many night games did the Dodgers have? Please review your in-class notes and write your function and answer below.

Answer:

#### Q 2: Interpret one of the box plots or scatter plots in plain language.

Answer:

#### Q 3: Explain the final statistical results in plain language.

Answer:

#### Q 4: Please read the tutorial "Advanced topics - Formatting a testable marketing hypothesis.docx" and develop two "draft" hypotheses for your group project.

Answer:

## Exploratory analysis

The results show that in 2012 there were a few types of promotion (see the last four columns of the data):

- Cap
- Shirt
- Fireworks
- Bobblehead

The data cover April through October, for games played during the day or at night under clear or cloudy skies.

Dodger Stadium has a capacity of about 56,000. The full sample shows that the stadium filled up only twice in 2012. There were only two cap promotions and three shirt promotions, which is not enough data for any inferences. Fireworks and bobblehead promotions each occurred several times.

Furthermore, there were eleven bobblehead promotions, and most of them (six) were on Tuesday nights.
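Counts like these can be checked with `table()`. The sketch below uses a handful of made-up rows (not the 2012 schedule) and assumes `bobblehead` is coded as "YES"/"NO", as in the earlier chunks.

```r
# Made-up rows for illustration only; not the actual 2012 schedule
toy <- data.frame(
  day_of_week = c("Tuesday", "Tuesday", "Thursday", "Sunday", "Tuesday", "Saturday"),
  day_night   = c("Night", "Night", "Night", "Day", "Night", "Night"),
  bobblehead  = c("YES", "YES", "YES", "NO", "NO", "NO")
)

# Number of games with and without the promotion
promo_counts <- table(toy$bobblehead)
promo_counts

# Cross-tabulate promotion status by day of week
promo_by_day <- table(toy$day_of_week, toy$bobblehead)
promo_by_day
```

Running the same two calls on `DodgersData` would show how the bobblehead games are distributed across the week.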


## Evaluate Attendance by Weather

```{r}
# Evaluate attendance by weather
ggplot(DodgersData, aes(x = temp, y = attend / 1000, color = fireworks)) +
  geom_point() +
  facet_wrap(day_night ~ skies) +
  ggtitle("Dodgers Attendance By Temperature By Time of Game and Skies") +
  theme(plot.title = element_text(lineheight = 3, face = "bold", color = "black", size = 10)) +
  xlab("Temperature (Degrees Fahrenheit)") +
  ylab("Attendance (Thousands)")
```

## Strip Plot of Attendance by Opponent or Visiting Team

```{r}
# Strip plot of attendance by opponent or visiting team
ggplot(DodgersData, aes(x = attend / 1000, y = opponent, color = day_night)) +
  geom_point() +
  ggtitle("Dodgers Attendance By Opponent") +
  theme(plot.title = element_text(lineheight = 3, face = "bold", color = "black", size = 10)) +
  xlab("Attendance (Thousands)") +
  ylab("Opponent (Visiting Team)")
```


## Design Predictive Model

To advise management on whether promotions affect attendance, we need to determine whether there is a positive effect and, if so, how large it is.
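Both questions can be read off a regression coefficient: the sign of the coefficient on the promotion indicator gives the direction of the effect, its magnitude gives the size, and a confidence interval shows how precisely it is estimated. The following is a minimal sketch on simulated data. The numbers are invented, and the true effect of 10,000 is an assumption built into the simulation, not a finding about the Dodgers.

```r
# Simulated data for illustration only (not the Dodgers data)
set.seed(1)
n <- 60
toy <- data.frame(
  bobblehead = rep(c("NO", "YES"), times = c(45, 15))
)

# Simulated attendance with an assumed true promotion effect of 10,000
toy$attend <- 40000 + 10000 * (toy$bobblehead == "YES") + rnorm(n, sd = 3000)

fit <- lm(attend ~ bobblehead, data = toy)

# Estimated promotion effect and its 95% confidence interval
effect <- coef(fit)["bobbleheadYES"]
ci <- confint(fit)["bobbleheadYES", ]
effect
ci
```

If the whole interval lies above zero, the data support a positive effect, and the interval's width indicates how much uncertainty remains about its size.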

To provide this advice, I built a linear model that predicts attendance from month, day of week, and an indicator variable for the bobblehead promotion. I split the data into training and test sets to develop the model; the final fit uses all available games to estimate the bobblehead effect.
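The train/test split mentioned above could be sketched as follows. This is an illustration on simulated data, not the split used for this analysis: the one-third holdout, the seed, and the simulated attendance values are all assumptions.

```r
# Simulated data for illustration only (not the Dodgers data)
set.seed(42)
n <- 81  # one row per home game in a regular season
toy <- data.frame(
  month = rep(c("APR", "MAY", "JUN"), length.out = n),
  bobblehead = rep(c("NO", "NO", "NO", "NO", "YES"), length.out = n)
)
toy$attend <- 40000 + 8000 * (toy$bobblehead == "YES") + rnorm(n, sd = 5000)

# Hold out one third of the games as a test set (assumed proportion)
test_idx <- sample(seq_len(n), size = round(n / 3))
train <- toy[-test_idx, ]
test <- toy[test_idx, ]

# Fit on the training games only
fit <- lm(attend ~ month + bobblehead, data = train)

# Evaluate on the held-out games with root mean squared error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$attend - pred)^2))
rmse
```

Evaluating on held-out games gives a more honest picture of predictive accuracy than the in-sample fit, which is why a split is useful before refitting on the full data.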

```{r}
# Create a model formula with the bobblehead variable entered last
my.model <- attend ~ month + day_of_week + bobblehead

# Use the full data set to obtain an estimate of the increase in
# attendance due to bobbleheads, controlling for other factors
my.model.fit <- lm(my.model, data = DodgersData)  # use all available data
print(summary(my.model.fit))
```
