knitr::opts_chunk$set(echo = TRUE)

Introduction

This case study looks at whether bobblehead and firework promotions so that I can increase attendance at Los Angeles Dodger basball home games. Using a linear regression model we look at the factors and month,day of week and promitons they significantly impact the attendance. THe goal is to provide data driven type of recommendations to be able to support Dodgers marketing decisions.

Pre-Processing

Load the required libraries and the dataset.

library(lattice)
library(ggplot2)
library(readr)

DodgersData <- read_csv("https://raw.githubusercontent.com/utjimmyx/regression/master/DodgersData.csv")
## `curl` package not installed, falling back to using `url()`
## Rows: 81 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): month, day_of_week, opponent, skies, day_night, cap, shirt, firewor...
## dbl (3): day, attend, temp
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(DodgersData)
## spc_tbl_ [81 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ month      : chr [1:81] "APR" "APR" "APR" "APR" ...
##  $ day        : num [1:81] 10 11 12 13 14 15 23 24 25 27 ...
##  $ attend     : num [1:81] 56000 29729 28328 31601 46549 ...
##  $ day_of_week: chr [1:81] "Tuesday" "Wednesday" "Thursday" "Friday" ...
##  $ opponent   : chr [1:81] "Pirates" "Pirates" "Pirates" "Padres" ...
##  $ temp       : num [1:81] 67 58 57 54 57 65 60 63 64 66 ...
##  $ skies      : chr [1:81] "Clear" "Cloudy" "Cloudy" "Cloudy" ...
##  $ day_night  : chr [1:81] "Day" "Night" "Night" "Night" ...
##  $ cap        : chr [1:81] "NO" "NO" "NO" "NO" ...
##  $ shirt      : chr [1:81] "NO" "NO" "NO" "NO" ...
##  $ fireworks  : chr [1:81] "NO" "NO" "NO" "YES" ...
##  $ bobblehead : chr [1:81] "NO" "NO" "NO" "NO" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   month = col_character(),
##   ..   day = col_double(),
##   ..   attend = col_double(),
##   ..   day_of_week = col_character(),
##   ..   opponent = col_character(),
##   ..   temp = col_double(),
##   ..   skies = col_character(),
##   ..   day_night = col_character(),
##   ..   cap = col_character(),
##   ..   shirt = col_character(),
##   ..   fireworks = col_character(),
##   ..   bobblehead = col_character()
##   .. )
##  - attr(*, "problems")=<pointer: 0x5eaf8a43b1a0>
head(DodgersData, 10)
## # A tibble: 10 × 12
##    month   day attend day_of_week opponent   temp skies  day_night cap   shirt
##    <chr> <dbl>  <dbl> <chr>       <chr>     <dbl> <chr>  <chr>     <chr> <chr>
##  1 APR      10  56000 Tuesday     Pirates      67 Clear  Day       NO    NO   
##  2 APR      11  29729 Wednesday   Pirates      58 Cloudy Night     NO    NO   
##  3 APR      12  28328 Thursday    Pirates      57 Cloudy Night     NO    NO   
##  4 APR      13  31601 Friday      Padres       54 Cloudy Night     NO    NO   
##  5 APR      14  46549 Saturday    Padres       57 Cloudy Night     NO    NO   
##  6 APR      15  38359 Sunday      Padres       65 Clear  Day       NO    NO   
##  7 APR      23  26376 Monday      Braves       60 Cloudy Night     NO    NO   
##  8 APR      24  44014 Tuesday     Braves       63 Cloudy Night     NO    NO   
##  9 APR      25  26345 Wednesday   Braves       64 Cloudy Night     NO    NO   
## 10 APR      27  44807 Friday      Nationals    66 Clear  Night     NO    NO   
## # ℹ 2 more variables: fireworks <chr>, bobblehead <chr>
meanattend <- mean(DodgersData$attend)
meanattend
## [1] 41040.07
promotions <- sum(DodgersData$bobblehead == "YES")
promotions
## [1] 11

Data Cleanup and Exploratory Analysis

The results show that in 2012 there were a few promotions (Cap, Shirt, Fireworks, Bobblehead). We have data from April to October for games played in the Day or Night under Clear or Cloudy Skies.

Evaluate Attendance by Weather

ggplot(DodgersData, aes(x=temp, y=attend/1000, color=fireworks)) +
  geom_point() +
  facet_wrap(day_night~skies) +
  ggtitle("Dodgers Attendance By Temperature By Time of Game and Skies") +
  theme(plot.title = element_text(lineheight=3, face="bold", color="black", size=10)) +
  xlab("Temperature (Degree Farenheit)") +
  ylab("Attendance (Thousands)")

Strip Plot of Attendance by Opponent or Visiting Team

ggplot(DodgersData, aes(x=attend/1000, y=opponent, color=day_night)) +
  geom_point() +
  ggtitle("Dodgers Attendance By Opponent") +
  theme(plot.title = element_text(lineheight=3, face="bold", color="black", size=10)) +
  xlab("Attendance (Thousands)") +
  ylab("Opponent (Visiting Team)")

Design Predictive Model

This is for advising management if promotions impact attendance we need to identify if there is a positive effect, and if there is, how much of an effect it is.

Re-level factors for correct order

DodgersData\(month <- factor(DodgersData\)month, levels = c(“APR”,“MAY”,“JUN”,“JUL”,“AUG”,“SEP”,“OCT”)) DodgersData\(day_of_week <- factor(DodgersData\)day_of_week, levels = c(“Monday”,“Tuesday”,“Wednesday”,“Thursday”,“Friday”,“Saturday”,“Sunday”))

Create a model with bobblehead entered last

my.model <- attend ~ month + day_of_week + bobblehead

Fit the model using all available data

my.model.fit <- lm(my.model, data = DodgersData) print(summary(my.model.fit))

my.model3 <- attend ~ month + day_of_week + bobblehead + fireworks
my.model.fit3 <- lm(my.model3, data = DodgersData)
print(summary(my.model.fit3))
## 
## Call:
## lm(formula = my.model3, data = DodgersData)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9504  -3683   -709   2569  15390 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          21935.44    6710.73   3.269  0.00172 ** 
## monthAUG              2377.88    2300.15   1.034  0.30501    
## monthJUL              1315.38    2534.42   0.519  0.60549    
## monthJUN              7062.62    2616.13   2.700  0.00881 ** 
## monthMAY             -2492.79    2193.60  -1.136  0.25990    
## monthOCT              -502.88    3873.86  -0.130  0.89711    
## monthSEP               -55.37    2413.63  -0.023  0.98177    
## day_of_weekMonday    12386.23    6901.86   1.795  0.07729 .  
## day_of_weekSaturday  18480.41    6783.62   2.724  0.00824 ** 
## day_of_weekSunday    18964.20    6824.84   2.779  0.00710 ** 
## day_of_weekThursday  12695.47    7080.43   1.793  0.07755 .  
## day_of_weekTuesday   20136.70    6908.16   2.915  0.00485 ** 
## day_of_weekWednesday 13290.40    6351.42   2.093  0.04024 *  
## bobbleheadYES        10995.05    2318.43   4.742 1.17e-05 ***
## fireworksYES         17028.78    6381.63   2.668  0.00958 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5858 on 66 degrees of freedom
## Multiple R-squared:  0.5887, Adjusted R-squared:  0.5015 
## F-statistic: 6.749 on 14 and 66 DF,  p-value: 2.848e-08

Do Fireworks Matter as a Second Promotion Variable?

I do believe so because based on the regression results, fireworks has a p-value of 0.00958 which is below the 0.05 threshold, this means that it is statistically significant. Fireworks as a promotional variable is an good marketing strategy for the Dodgers and could potentially increase overall attendance.

Business Recommendation

Based on our exploratory analysis and regression modeling, both bobblehead and fireworks promotions have a statistically significant positive impact on Dodgers game attendance. The Dodgers should prioritize scheduling bobblehead and fireworks promotions strategically throughout the season, particularly on weekends, to maximize attendance. If management implements these promotional strategies consistently, overall attendance and revenue are expected to increase.