Introduction

It is tough to make good predictions. The numerous factors or variables, independent and dependent, involved in many sporting events contribute to the unpredictability. However, using carefully-selected variables, it is still possible to make marketing promotions more accountable.

The goal of this case study is to analyze if bobblehead promotions increase attendance at Dodgers home games. Using the fitted predictive model, we can predict the attendance for the game in the forthcoming season and we can predict the attendance with or without bobblehead promotion.

The motivation of this case study is to design a predictive model, and report any interesting findings to support critical business decision making.

Pre-Processing

Load the required libraries and the data.

library(lattice)
library(ggplot2)
library(readr)

DodgersData <- read_csv(
  "https://raw.githubusercontent.com/utjimmyx/regression/master/DodgersData.csv"
)
## `curl` package not installed, falling back to using `url()`
## Rows: 81 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): month, day_of_week, opponent, skies, day_night, cap, shirt, firewor...
## dbl (3): day, attend, temp
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data cleanup and exploratory analysis

Evaluate the Structure and Re-Level the factor variables for Day Of Week and Month.

str(DodgersData)
## spc_tbl_ [81 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ month      : chr [1:81] "APR" "APR" "APR" "APR" ...
##  $ day        : num [1:81] 10 11 12 13 14 15 23 24 25 27 ...
##  $ attend     : num [1:81] 56000 29729 28328 31601 46549 ...
##  $ day_of_week: chr [1:81] "Tuesday" "Wednesday" "Thursday" "Friday" ...
##  $ opponent   : chr [1:81] "Pirates" "Pirates" "Pirates" "Padres" ...
##  $ temp       : num [1:81] 67 58 57 54 57 65 60 63 64 66 ...
##  $ skies      : chr [1:81] "Clear" "Cloudy" "Cloudy" "Cloudy" ...
##  $ day_night  : chr [1:81] "Day" "Night" "Night" "Night" ...
##  $ cap        : chr [1:81] "NO" "NO" "NO" "NO" ...
##  $ shirt      : chr [1:81] "NO" "NO" "NO" "NO" ...
##  $ fireworks  : chr [1:81] "NO" "NO" "NO" "YES" ...
##  $ bobblehead : chr [1:81] "NO" "NO" "NO" "NO" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   month = col_character(),
##   ..   day = col_double(),
##   ..   attend = col_double(),
##   ..   day_of_week = col_character(),
##   ..   opponent = col_character(),
##   ..   temp = col_double(),
##   ..   skies = col_character(),
##   ..   day_night = col_character(),
##   ..   cap = col_character(),
##   ..   shirt = col_character(),
##   ..   fireworks = col_character(),
##   ..   bobblehead = col_character()
##   .. )
##  - attr(*, "problems")=<pointer: 0x5c74e03f8070>
head(DodgersData)
## # A tibble: 6 × 12
##   month   day attend day_of_week opponent  temp skies  day_night cap   shirt
##   <chr> <dbl>  <dbl> <chr>       <chr>    <dbl> <chr>  <chr>     <chr> <chr>
## 1 APR      10  56000 Tuesday     Pirates     67 Clear  Day       NO    NO   
## 2 APR      11  29729 Wednesday   Pirates     58 Cloudy Night     NO    NO   
## 3 APR      12  28328 Thursday    Pirates     57 Cloudy Night     NO    NO   
## 4 APR      13  31601 Friday      Padres      54 Cloudy Night     NO    NO   
## 5 APR      14  46549 Saturday    Padres      57 Cloudy Night     NO    NO   
## 6 APR      15  38359 Sunday      Padres      65 Clear  Day       NO    NO   
## # ℹ 2 more variables: fireworks <chr>, bobblehead <chr>
levels(DodgersData$month)
## NULL
head(DodgersData, 10)
## # A tibble: 10 × 12
##    month   day attend day_of_week opponent   temp skies  day_night cap   shirt
##    <chr> <dbl>  <dbl> <chr>       <chr>     <dbl> <chr>  <chr>     <chr> <chr>
##  1 APR      10  56000 Tuesday     Pirates      67 Clear  Day       NO    NO   
##  2 APR      11  29729 Wednesday   Pirates      58 Cloudy Night     NO    NO   
##  3 APR      12  28328 Thursday    Pirates      57 Cloudy Night     NO    NO   
##  4 APR      13  31601 Friday      Padres       54 Cloudy Night     NO    NO   
##  5 APR      14  46549 Saturday    Padres       57 Cloudy Night     NO    NO   
##  6 APR      15  38359 Sunday      Padres       65 Clear  Day       NO    NO   
##  7 APR      23  26376 Monday      Braves       60 Cloudy Night     NO    NO   
##  8 APR      24  44014 Tuesday     Braves       63 Cloudy Night     NO    NO   
##  9 APR      25  26345 Wednesday   Braves       64 Cloudy Night     NO    NO   
## 10 APR      27  44807 Friday      Nationals    66 Clear  Night     NO    NO   
## # ℹ 2 more variables: fireworks <chr>, bobblehead <chr>

Let R identify the temperature, the attendance, the opponent, and the promotion for the 20th home game of the season.

DodgersData[20, c("temp", "attend", "opponent", "bobblehead")]
## # A tibble: 1 × 4
##    temp attend opponent bobblehead
##   <dbl>  <dbl> <chr>    <chr>     
## 1    70  47077 Snakes   YES

Let R identify the average value for attendance.

meanattend <- mean(DodgersData$attend)
meanattend
## [1] 41040.07

Let R identify the number of promotions

promotions <- sum(DodgersData$bobblehead == "YES")
promotions
## [1] 11

Exploratory analysis

The results show that in 2012 there were a few promotions (see the last four columns).

Cap
Shirt
Fireworks
Bobblehead

We have data from April to October for games played in the Day or Night under Clear or Cloudy Skies.

Dodger Stadium has a capacity of about 56,000. Looking at the entire sample data shows that the stadium filled up only twice in 2012. There were only two cap promotions and three shirt promotions, which is not enough data for any inferences. Fireworks and Bobblehead promotions occurred more frequently.

Furthermore, there were eleven bobblehead promotions, with most of them occurring on Tuesday nights.

Evaluate Attendance by Weather

ggplot(DodgersData, aes(x = temp, y = attend/1000, color = fireworks)) +
  geom_point() +
  facet_wrap(day_night ~ skies) +
  ggtitle("Dodgers Attendance By Temperature By Time of Game and Skies") +
  theme(plot.title = element_text(
    lineheight = 3,
    face = "bold",
    color = "black",
    size = 10
  )) +
  xlab("Temperature (Degree Fahrenheit)") +
  ylab("Attendance (Thousands)")

Strip Plot of Attendance by Opponent or Visiting Team

ggplot(DodgersData,
       aes(x = attend/1000,
           y = opponent,
           color = day_night)) +
  geom_point() +
  ggtitle("Dodgers Attendance By Opponent") +
  theme(plot.title = element_text(
    lineheight = 3,
    face = "bold",
    color = "black",
    size = 10
  )) +
  xlab("Attendance (Thousands)") +
  ylab("Opponent (Visiting Team)")

Design Predictive Model - Original Model

To advise management if promotions impact attendance, we will need to identify whether there is a positive effect and, if there is, how much of an effect.

my.model <- attend ~ month + day_of_week + bobblehead

my.model.fit <- lm(my.model, data = DodgersData)

print(summary(my.model.fit))
## 
## Call:
## lm(formula = my.model, data = DodgersData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10786.5  -3628.1   -516.1   2230.2  14351.0 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          38792.98    2364.68  16.405  < 2e-16 ***
## monthAUG              2377.92    2402.91   0.990   0.3259    
## monthJUL              2849.83    2578.60   1.105   0.2730    
## monthJUN              7163.23    2732.72   2.621   0.0108 *  
## monthMAY             -2385.62    2291.22  -1.041   0.3015    
## monthOCT              -662.67    4046.45  -0.164   0.8704    
## monthSEP                29.03    2521.25   0.012   0.9908    
## day_of_weekMonday    -4883.82    2504.65  -1.950   0.0554 .  
## day_of_weekSaturday   1488.24    2442.68   0.609   0.5444    
## day_of_weekSunday     1840.18    2426.79   0.758   0.4509    
## day_of_weekThursday  -4108.45    3381.22  -1.215   0.2286    
## day_of_weekTuesday    3027.68    2686.43   1.127   0.2638    
## day_of_weekWednesday -2423.80    2485.46  -0.975   0.3330    
## bobbleheadYES        10714.90    2419.52   4.429 3.59e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6120 on 67 degrees of freedom
## Multiple R-squared:  0.5444, Adjusted R-squared:  0.456 
## F-statistic: 6.158 on 13 and 67 DF,  p-value: 2.083e-07

Re-level Factors for Interpretability

DodgersData$month <- factor(
  DodgersData$month,
  levels = c("APR","MAY","JUN","JUL","AUG","SEP","OCT")
)

DodgersData$day_of_week <- factor(
  DodgersData$day_of_week,
  levels = c(
    "Monday",
    "Tuesday",
    "Wednesday",
    "Thursday",
    "Friday",
    "Saturday",
    "Sunday"
  )
)

my.model2 <- attend ~ month + day_of_week + bobblehead

my.model.fit2 <- lm(my.model2, data = DodgersData)

print(summary(my.model.fit2))
## 
## Call:
## lm(formula = my.model2, data = DodgersData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10786.5  -3628.1   -516.1   2230.2  14351.0 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          33909.16    2521.81  13.446  < 2e-16 ***
## monthMAY             -2385.62    2291.22  -1.041  0.30152    
## monthJUN              7163.23    2732.72   2.621  0.01083 *  
## monthJUL              2849.83    2578.60   1.105  0.27303    
## monthAUG              2377.92    2402.91   0.990  0.32593    
## monthSEP                29.03    2521.25   0.012  0.99085    
## monthOCT              -662.67    4046.45  -0.164  0.87041    
## day_of_weekTuesday    7911.49    2702.21   2.928  0.00466 ** 
## day_of_weekWednesday  2460.02    2514.03   0.979  0.33134    
## day_of_weekThursday    775.36    3486.15   0.222  0.82467    
## day_of_weekFriday     4883.82    2504.65   1.950  0.05537 .  
## day_of_weekSaturday   6372.06    2552.08   2.497  0.01500 *  
## day_of_weekSunday     6724.00    2506.72   2.682  0.00920 ** 
## bobbleheadYES        10714.90    2419.52   4.429 3.59e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6120 on 67 degrees of freedom
## Multiple R-squared:  0.5444, Adjusted R-squared:  0.456 
## F-statistic: 6.158 on 13 and 67 DF,  p-value: 2.083e-07

Add Fireworks as a Second Promotion Variable

my.model.fireworks <- attend ~ month + day_of_week + bobblehead + fireworks

my.model.fireworks.fit <- lm(
  my.model.fireworks,
  data = DodgersData
)

print(summary(my.model.fireworks.fit))
## 
## Call:
## lm(formula = my.model.fireworks, data = DodgersData)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9504  -3683   -709   2569  15390 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           34321.68    2418.90  14.189  < 2e-16 ***
## monthMAY              -2492.79    2193.60  -1.136  0.25990    
## monthJUN               7062.62    2616.13   2.700  0.00881 ** 
## monthJUL               1315.38    2534.42   0.519  0.60549    
## monthAUG               2377.88    2300.15   1.034  0.30501    
## monthSEP                -55.37    2413.63  -0.023  0.98177    
## monthOCT               -502.88    3873.86  -0.130  0.89711    
## day_of_weekTuesday     7750.46    2587.35   2.996  0.00386 ** 
## day_of_weekWednesday    904.16    2476.14   0.365  0.71617    
## day_of_weekThursday     309.24    3341.63   0.093  0.92655    
## day_of_weekFriday    -12386.23    6901.86  -1.795  0.07729 .  
## day_of_weekSaturday    6094.17    2445.16   2.492  0.01521 *  
## day_of_weekSunday      6577.96    2400.14   2.741  0.00788 ** 
## bobbleheadYES         10995.05    2318.43   4.742 1.17e-05 ***
## fireworksYES          17028.78    6381.63   2.668  0.00958 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5858 on 66 degrees of freedom
## Multiple R-squared:  0.5887, Adjusted R-squared:  0.5015 
## F-statistic: 6.749 on 14 and 66 DF,  p-value: 2.848e-08

Compare Models

anova(my.model.fit2, my.model.fireworks.fit)
## Analysis of Variance Table
## 
## Model 1: attend ~ month + day_of_week + bobblehead
## Model 2: attend ~ month + day_of_week + bobblehead + fireworks
##   Res.Df        RSS Df Sum of Sq      F   Pr(>F)   
## 1     67 2509574563                                
## 2     66 2265194779  1 244379784 7.1204 0.009581 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Do Fireworks Matter?

coef(summary(my.model.fireworks.fit))
##                          Estimate Std. Error     t value     Pr(>|t|)
## (Intercept)           34321.67542   2418.904 14.18893933 1.006038e-21
## monthMAY              -2492.79148   2193.598 -1.13639391 2.599012e-01
## monthJUN               7062.61771   2616.126  2.69964766 8.807976e-03
## monthJUL               1315.38384   2534.422  0.51900741 6.054916e-01
## monthAUG               2377.87744   2300.152  1.03379143 3.050069e-01
## monthSEP                -55.37149   2413.633 -0.02294114 9.817664e-01
## monthOCT               -502.88321   3873.864 -0.12981436 8.971081e-01
## day_of_weekTuesday     7750.46208   2587.349  2.99552219 3.855694e-03
## day_of_weekWednesday    904.16129   2476.142  0.36514921 7.161676e-01
## day_of_weekThursday     309.23698   3341.635  0.09254063 9.265488e-01
## day_of_weekFriday    -12386.23442   6901.856 -1.79462365 7.729303e-02
## day_of_weekSaturday    6094.17142   2445.160  2.49234026 1.520883e-02
## day_of_weekSunday      6577.96271   2400.143  2.74065464 7.880343e-03
## bobbleheadYES         10995.05292   2318.426  4.74246546 1.169915e-05
## fireworksYES          17028.77945   6381.631  2.66840552 9.580542e-03

Interpretation:

Bonus: Predict Attendance for a Saturday Afternoon in July with No Promotion

new_game <- data.frame(
  month = factor("JUL",
                 levels = levels(DodgersData$month)),
  day_of_week = factor(
    "Saturday",
    levels = levels(DodgersData$day_of_week)
  ),
  bobblehead = "NO",
  fireworks = "NO"
)

predict(
  my.model.fireworks.fit,
  newdata = new_game,
  interval = "prediction"
)
##        fit      lwr      upr
## 1 41731.23 28987.54 54474.92

Business Recommendation

Based on the regression results, bobblehead promotions have a strong positive effect on attendance at Dodgers home games. After controlling for month and day of week, games with bobblehead promotions attract significantly more fans than games without promotions. Management should continue investing in bobblehead promotions because they appear to be an effective strategy for increasing attendance and generating additional revenue.