It is tough to make good predictions. The numerous factors or variables, independent and dependent, involved in many sporting events contribute to the unpredictability. However, using carefully-selected variables, it is still possible to make marketing promotions more accountable.
The goal of this case study is to analyze if bobblehead promotions increase attendance at Dodgers home games. Using the fitted predictive model, we can predict the attendance for the game in the forthcoming season and we can predict the attendance with or without bobblehead promotion.
The motivation of this case study is to design a predictive model, and report any interesting findings to support critical business decision making.
Load the required libraries and the data.
library(lattice)
library(ggplot2)
library(readr)
DodgersData <- read_csv(
"https://raw.githubusercontent.com/utjimmyx/regression/master/DodgersData.csv"
)
## `curl` package not installed, falling back to using `url()`
## Rows: 81 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): month, day_of_week, opponent, skies, day_night, cap, shirt, firewor...
## dbl (3): day, attend, temp
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Evaluate the Structure and Re-Level the factor variables for Day Of Week and Month.
str(DodgersData)
## spc_tbl_ [81 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ month : chr [1:81] "APR" "APR" "APR" "APR" ...
## $ day : num [1:81] 10 11 12 13 14 15 23 24 25 27 ...
## $ attend : num [1:81] 56000 29729 28328 31601 46549 ...
## $ day_of_week: chr [1:81] "Tuesday" "Wednesday" "Thursday" "Friday" ...
## $ opponent : chr [1:81] "Pirates" "Pirates" "Pirates" "Padres" ...
## $ temp : num [1:81] 67 58 57 54 57 65 60 63 64 66 ...
## $ skies : chr [1:81] "Clear" "Cloudy" "Cloudy" "Cloudy" ...
## $ day_night : chr [1:81] "Day" "Night" "Night" "Night" ...
## $ cap : chr [1:81] "NO" "NO" "NO" "NO" ...
## $ shirt : chr [1:81] "NO" "NO" "NO" "NO" ...
## $ fireworks : chr [1:81] "NO" "NO" "NO" "YES" ...
## $ bobblehead : chr [1:81] "NO" "NO" "NO" "NO" ...
## - attr(*, "spec")=
## .. cols(
## .. month = col_character(),
## .. day = col_double(),
## .. attend = col_double(),
## .. day_of_week = col_character(),
## .. opponent = col_character(),
## .. temp = col_double(),
## .. skies = col_character(),
## .. day_night = col_character(),
## .. cap = col_character(),
## .. shirt = col_character(),
## .. fireworks = col_character(),
## .. bobblehead = col_character()
## .. )
## - attr(*, "problems")=<pointer: 0x5c74e03f8070>
head(DodgersData)
## # A tibble: 6 × 12
## month day attend day_of_week opponent temp skies day_night cap shirt
## <chr> <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 APR 10 56000 Tuesday Pirates 67 Clear Day NO NO
## 2 APR 11 29729 Wednesday Pirates 58 Cloudy Night NO NO
## 3 APR 12 28328 Thursday Pirates 57 Cloudy Night NO NO
## 4 APR 13 31601 Friday Padres 54 Cloudy Night NO NO
## 5 APR 14 46549 Saturday Padres 57 Cloudy Night NO NO
## 6 APR 15 38359 Sunday Padres 65 Clear Day NO NO
## # ℹ 2 more variables: fireworks <chr>, bobblehead <chr>
levels(DodgersData$month)
## NULL
head(DodgersData, 10)
## # A tibble: 10 × 12
## month day attend day_of_week opponent temp skies day_night cap shirt
## <chr> <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 APR 10 56000 Tuesday Pirates 67 Clear Day NO NO
## 2 APR 11 29729 Wednesday Pirates 58 Cloudy Night NO NO
## 3 APR 12 28328 Thursday Pirates 57 Cloudy Night NO NO
## 4 APR 13 31601 Friday Padres 54 Cloudy Night NO NO
## 5 APR 14 46549 Saturday Padres 57 Cloudy Night NO NO
## 6 APR 15 38359 Sunday Padres 65 Clear Day NO NO
## 7 APR 23 26376 Monday Braves 60 Cloudy Night NO NO
## 8 APR 24 44014 Tuesday Braves 63 Cloudy Night NO NO
## 9 APR 25 26345 Wednesday Braves 64 Cloudy Night NO NO
## 10 APR 27 44807 Friday Nationals 66 Clear Night NO NO
## # ℹ 2 more variables: fireworks <chr>, bobblehead <chr>
DodgersData[20, c("temp", "attend", "opponent", "bobblehead")]
## # A tibble: 1 × 4
## temp attend opponent bobblehead
## <dbl> <dbl> <chr> <chr>
## 1 70 47077 Snakes YES
meanattend <- mean(DodgersData$attend)
meanattend
## [1] 41040.07
promotions <- sum(DodgersData$bobblehead == "YES")
promotions
## [1] 11
The results show that in 2012 there were a few promotions (see the last four columns).
Cap
Shirt
Fireworks
Bobblehead
We have data from April to October for games played in the Day or Night under Clear or Cloudy Skies.
Dodger Stadium has a capacity of about 56,000. Looking at the entire sample data shows that the stadium filled up only twice in 2012. There were only two cap promotions and three shirt promotions, which is not enough data for any inferences. Fireworks and Bobblehead promotions occurred more frequently.
Furthermore, there were eleven bobblehead promotions, with most of them occurring on Tuesday nights.
ggplot(DodgersData, aes(x = temp, y = attend/1000, color = fireworks)) +
geom_point() +
facet_wrap(day_night ~ skies) +
ggtitle("Dodgers Attendance By Temperature By Time of Game and Skies") +
theme(plot.title = element_text(
lineheight = 3,
face = "bold",
color = "black",
size = 10
)) +
xlab("Temperature (Degree Fahrenheit)") +
ylab("Attendance (Thousands)")
ggplot(DodgersData,
aes(x = attend/1000,
y = opponent,
color = day_night)) +
geom_point() +
ggtitle("Dodgers Attendance By Opponent") +
theme(plot.title = element_text(
lineheight = 3,
face = "bold",
color = "black",
size = 10
)) +
xlab("Attendance (Thousands)") +
ylab("Opponent (Visiting Team)")
To advise management if promotions impact attendance, we will need to identify whether there is a positive effect and, if there is, how much of an effect.
my.model <- attend ~ month + day_of_week + bobblehead
my.model.fit <- lm(my.model, data = DodgersData)
print(summary(my.model.fit))
##
## Call:
## lm(formula = my.model, data = DodgersData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10786.5 -3628.1 -516.1 2230.2 14351.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38792.98 2364.68 16.405 < 2e-16 ***
## monthAUG 2377.92 2402.91 0.990 0.3259
## monthJUL 2849.83 2578.60 1.105 0.2730
## monthJUN 7163.23 2732.72 2.621 0.0108 *
## monthMAY -2385.62 2291.22 -1.041 0.3015
## monthOCT -662.67 4046.45 -0.164 0.8704
## monthSEP 29.03 2521.25 0.012 0.9908
## day_of_weekMonday -4883.82 2504.65 -1.950 0.0554 .
## day_of_weekSaturday 1488.24 2442.68 0.609 0.5444
## day_of_weekSunday 1840.18 2426.79 0.758 0.4509
## day_of_weekThursday -4108.45 3381.22 -1.215 0.2286
## day_of_weekTuesday 3027.68 2686.43 1.127 0.2638
## day_of_weekWednesday -2423.80 2485.46 -0.975 0.3330
## bobbleheadYES 10714.90 2419.52 4.429 3.59e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6120 on 67 degrees of freedom
## Multiple R-squared: 0.5444, Adjusted R-squared: 0.456
## F-statistic: 6.158 on 13 and 67 DF, p-value: 2.083e-07
DodgersData$month <- factor(
DodgersData$month,
levels = c("APR","MAY","JUN","JUL","AUG","SEP","OCT")
)
DodgersData$day_of_week <- factor(
DodgersData$day_of_week,
levels = c(
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday",
"Sunday"
)
)
my.model2 <- attend ~ month + day_of_week + bobblehead
my.model.fit2 <- lm(my.model2, data = DodgersData)
print(summary(my.model.fit2))
##
## Call:
## lm(formula = my.model2, data = DodgersData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10786.5 -3628.1 -516.1 2230.2 14351.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33909.16 2521.81 13.446 < 2e-16 ***
## monthMAY -2385.62 2291.22 -1.041 0.30152
## monthJUN 7163.23 2732.72 2.621 0.01083 *
## monthJUL 2849.83 2578.60 1.105 0.27303
## monthAUG 2377.92 2402.91 0.990 0.32593
## monthSEP 29.03 2521.25 0.012 0.99085
## monthOCT -662.67 4046.45 -0.164 0.87041
## day_of_weekTuesday 7911.49 2702.21 2.928 0.00466 **
## day_of_weekWednesday 2460.02 2514.03 0.979 0.33134
## day_of_weekThursday 775.36 3486.15 0.222 0.82467
## day_of_weekFriday 4883.82 2504.65 1.950 0.05537 .
## day_of_weekSaturday 6372.06 2552.08 2.497 0.01500 *
## day_of_weekSunday 6724.00 2506.72 2.682 0.00920 **
## bobbleheadYES 10714.90 2419.52 4.429 3.59e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6120 on 67 degrees of freedom
## Multiple R-squared: 0.5444, Adjusted R-squared: 0.456
## F-statistic: 6.158 on 13 and 67 DF, p-value: 2.083e-07
my.model.fireworks <- attend ~ month + day_of_week + bobblehead + fireworks
my.model.fireworks.fit <- lm(
my.model.fireworks,
data = DodgersData
)
print(summary(my.model.fireworks.fit))
##
## Call:
## lm(formula = my.model.fireworks, data = DodgersData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9504 -3683 -709 2569 15390
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34321.68 2418.90 14.189 < 2e-16 ***
## monthMAY -2492.79 2193.60 -1.136 0.25990
## monthJUN 7062.62 2616.13 2.700 0.00881 **
## monthJUL 1315.38 2534.42 0.519 0.60549
## monthAUG 2377.88 2300.15 1.034 0.30501
## monthSEP -55.37 2413.63 -0.023 0.98177
## monthOCT -502.88 3873.86 -0.130 0.89711
## day_of_weekTuesday 7750.46 2587.35 2.996 0.00386 **
## day_of_weekWednesday 904.16 2476.14 0.365 0.71617
## day_of_weekThursday 309.24 3341.63 0.093 0.92655
## day_of_weekFriday -12386.23 6901.86 -1.795 0.07729 .
## day_of_weekSaturday 6094.17 2445.16 2.492 0.01521 *
## day_of_weekSunday 6577.96 2400.14 2.741 0.00788 **
## bobbleheadYES 10995.05 2318.43 4.742 1.17e-05 ***
## fireworksYES 17028.78 6381.63 2.668 0.00958 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5858 on 66 degrees of freedom
## Multiple R-squared: 0.5887, Adjusted R-squared: 0.5015
## F-statistic: 6.749 on 14 and 66 DF, p-value: 2.848e-08
anova(my.model.fit2, my.model.fireworks.fit)
## Analysis of Variance Table
##
## Model 1: attend ~ month + day_of_week + bobblehead
## Model 2: attend ~ month + day_of_week + bobblehead + fireworks
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 67 2509574563
## 2 66 2265194779 1 244379784 7.1204 0.009581 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coef(summary(my.model.fireworks.fit))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34321.67542 2418.904 14.18893933 1.006038e-21
## monthMAY -2492.79148 2193.598 -1.13639391 2.599012e-01
## monthJUN 7062.61771 2616.126 2.69964766 8.807976e-03
## monthJUL 1315.38384 2534.422 0.51900741 6.054916e-01
## monthAUG 2377.87744 2300.152 1.03379143 3.050069e-01
## monthSEP -55.37149 2413.633 -0.02294114 9.817664e-01
## monthOCT -502.88321 3873.864 -0.12981436 8.971081e-01
## day_of_weekTuesday 7750.46208 2587.349 2.99552219 3.855694e-03
## day_of_weekWednesday 904.16129 2476.142 0.36514921 7.161676e-01
## day_of_weekThursday 309.23698 3341.635 0.09254063 9.265488e-01
## day_of_weekFriday -12386.23442 6901.856 -1.79462365 7.729303e-02
## day_of_weekSaturday 6094.17142 2445.160 2.49234026 1.520883e-02
## day_of_weekSunday 6577.96271 2400.143 2.74065464 7.880343e-03
## bobbleheadYES 10995.05292 2318.426 4.74246546 1.169915e-05
## fireworksYES 17028.77945 6381.631 2.66840552 9.580542e-03
Interpretation:
new_game <- data.frame(
month = factor("JUL",
levels = levels(DodgersData$month)),
day_of_week = factor(
"Saturday",
levels = levels(DodgersData$day_of_week)
),
bobblehead = "NO",
fireworks = "NO"
)
predict(
my.model.fireworks.fit,
newdata = new_game,
interval = "prediction"
)
## fit lwr upr
## 1 41731.23 28987.54 54474.92
Based on the regression results, bobblehead promotions have a strong positive effect on attendance at Dodgers home games. After controlling for month and day of week, games with bobblehead promotions attract significantly more fans than games without promotions. Management should continue investing in bobblehead promotions because they appear to be an effective strategy for increasing attendance and generating additional revenue.