11/20/2019

Introduction

Colleges are increasingly concerned about declining attendance at football games. College football attendance has declined in all but two of the last ten years, even as the sport has enjoyed record popularity and revenue, with more people watching than ever. The trend worries athletic departments, and many schools have opted to shrink rather than expand their stadiums in response.

The Data

The data set was assembled from several sources and processing steps:

  • Scraping Wikipedia for game results, attendance, team, kickoff time, etc.
  • Merging weather data for each of the games
  • Imputation of factors

Variables included:

Variable      Description
Date          Date the game was played
Team          Home team of the football game
Time          Kickoff time
Opponent      Away team of the football game
Rank          Rank of the home team in the AP poll
Site          Location of the game
TV            TV channel that broadcast the game
Result        Outcome of the football game
Attendance    Number of people who attended the game

Variables included (cont.)

Variable            Description
Current Wins        Number of wins the team has leading up to the game
Current Losses      Number of losses the team has leading up to the game
Stadium Capacity    Number of people the stadium holds
Fill Rate           Attendance / Stadium Capacity
New Coach           Whether the team has a first-year head coach
Tailgating          Whether the team is a Top 25 tailgate destination
PRCP                Precipitation
SNOW                Snowfall
SNWD                Snow depth (snow on the ground)

Variables included (cont.)

Variable         Description
TMAX             Maximum temperature for the day
TMIN             Minimum temperature for the day
Opponent_Rank    Rank of the opponent at the time of the game
NumericDate      Date as an integer
NumericTime      Time as an integer
Conference       Football conference the team currently belongs to
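The derived columns listed above (Fill Rate, NumericDate, NumericTime) are simple transformations of the scraped fields. A minimal sketch of how they might be computed, assuming a hypothetical data frame `games` with made-up rows. The integer encodings (days since 1970-01-01 and minutes after midnight) are assumptions, since the encoding actually used is not stated here:

```r
# Hypothetical example rows; the values are made up for illustration.
games <- data.frame(
  Date       = as.Date(c("2018-09-01", "2018-11-17")),
  Time       = c("12:00", "19:30"),
  Attendance = c(45000, 61000),
  Capacity   = c(60000, 60000),
  stringsAsFactors = FALSE
)

# Fill Rate = Attendance / Stadium Capacity (can exceed 1 for oversold games)
games$FillRate <- games$Attendance / games$Capacity

# Assumed encoding: days since 1970-01-01
games$NumericDate <- as.integer(games$Date)

# Assumed encoding: minutes after midnight
hm <- strsplit(games$Time, ":", fixed = TRUE)
games$NumericTime <- vapply(
  hm,
  function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
  integer(1)
)

games[, c("FillRate", "NumericDate", "NumericTime")]
```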

Fill Rate from 2000-2018, by Conference


Conceptual Model

Intuitively, weather should have a large effect on whether people go to a game: more people attend when it is warm than when it is cold. The quality of the home team and of its opponent should matter even more. More people will turn out to support a good team than a bad one, and more will come to see a good matchup than a poor one.

Other factors may also play a role. Win or lose, some schools have long traditions of tailgating outside the stands; Ole Miss, for example, can have upwards of 100,000 fans tailgating outside a stadium that holds 60,000. Such traditions should lead to higher attendance. Teams also tend to see a boost in attendance after hiring a new head coach.

Morning games should also draw fewer fans than afternoon and evening games, because fans have less time to tailgate, drive to the game, and experience the gameday atmosphere.

Basic Linear Model

Likelihood

Fill Rate ~ Normal(\(\mu\), \(\sigma\))

Priors

\(\mu\) ~ Normal(0.9, 0.25)
\(\sigma\) ~ Uniform(0, 0.25)

Which translates to:

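library(rethinking)  # provides quap() and precis()

# Intercept-only model: a single mean and spread for Fill Rate
m1 <- quap(
  alist(
    `Fill Rate` ~ dnorm(mu, sigma),
    mu ~ dnorm(.9, .25),     # Normal prior, as simulated in the prior predictive check
    sigma ~ dunif(0, .25)
  ), data = CFB
)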

Prior Predictive Check

library(ggplot2)

set.seed(2020)
# Simulate values for mu from the prior
m1_mu <- rnorm(100, .9, .25)
# Simulate values for sigma from the prior
m1_sigma <- runif(100, min = 0, max = .25)
# Draw one fill rate for each (mu, sigma) pair
prior_m1 <- rnorm(100, m1_mu, m1_sigma)
# Plot the prior predictive density
ggplot(data.frame(prior_m1), aes(x = prior_m1)) +
  geom_density() +
  xlab("Fill Rate")
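A density plot is easier to judge alongside a couple of summary numbers. The sketch below uses base R only, repeats the same simulation with more draws, and reports the share of simulated fill rates that fall below 0 or above 1.5. The 1.5 cutoff and the number of draws are assumptions chosen for illustration:

```r
set.seed(2020)
n_draws <- 1e4

# Draw parameters from the priors, then one outcome per parameter draw
mu_prior    <- rnorm(n_draws, mean = 0.9, sd = 0.25)
sigma_prior <- runif(n_draws, min = 0, max = 0.25)
fill_sim    <- rnorm(n_draws, mean = mu_prior, sd = sigma_prior)

# Share of simulated fill rates that are impossible or implausibly large
c(below_zero = mean(fill_sim < 0), above_cutoff = mean(fill_sim > 1.5))

# Central 89% interval of the prior predictive distribution
quantile(fill_sim, probs = c(0.055, 0.945))
```

If either share is large, the priors are allowing outcomes that cannot occur in practice and could be tightened.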

Posterior Predictive Check

Interpreting the Model

Even before any predictors are added, the model has room to improve.

precis(m1)
##            mean          sd      5.5%     94.5%
## mu    0.8792294 0.002851954 0.8746714 0.8837873
## sigma 0.1735062 0.002016748 0.1702830 0.1767293

Comparing it to the actual data

precis(CFB$`Fill Rate`)
##                      mean       sd      5.5%    94.5%       histogram
## CFB..Fill.Rate. 0.8790862 0.173525 0.5456084 1.049836 ▁▁▁▁▁▁▂▂▃▇▇▁▁▁▁
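The two summaries agree on the mean and standard deviation, but the shape of the distribution differs. One way to see this, sketched below with values copied from the output above, is to compute the 5.5% and 94.5% quantiles that a Normal distribution with the fitted mu and sigma implies and set them beside the observed quantiles. The sketch uses posterior means only and ignores posterior uncertainty:

```r
# Posterior means from precis(m1)
mu_hat    <- 0.8792294
sigma_hat <- 0.1735062

# Quantiles implied by Normal(mu_hat, sigma_hat)
implied <- qnorm(c(0.055, 0.945), mean = mu_hat, sd = sigma_hat)

# Observed quantiles from precis(CFB$`Fill Rate`)
observed <- c(0.5456084, 1.049836)

round(rbind(implied, observed), 3)
```

With these numbers the Normal model places both quantiles higher than the data does, which is consistent with a left-skewed outcome that piles up near capacity.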

Iteration 2

This iteration adds the continuous predictors, including the weather variables, to the model.

Likelihood

Fill Rate ~ Normal(\(\mu\), \(\sigma\))
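\(\mu = \alpha_i + \beta_W \cdot \text{Wins} + \beta_L \cdot \text{Losses} + \beta_H \cdot \text{MaxTemp} + \beta_{Low} \cdot \text{MinTemp} + \beta_{Month} \cdot \text{Month} + \beta_{PRCP} \cdot \text{PRCP} + \beta_{SNOW} \cdot \text{SNOW} + \beta_{SNWD} \cdot \text{SNWD}\), where i is the index of the home team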

Priors
\(\alpha_i\) ~ Log-Normal(0, 1), for i = 1:34
\(\beta_W\) ~ Normal(0, 1)
\(\beta_L\) ~ Normal(0, 1)
\(\beta_H\) ~ Normal(0, 1)
\(\beta_{Low}\) ~ Normal(0, 1)
\(\beta_{Month}\) ~ Normal(0, 0.75)
\(\beta_{PRCP}\) ~ Normal(0, 1)
\(\beta_{SNOW}\) ~ Normal(0, 1)
\(\beta_{SNWD}\) ~ Normal(0, 1)
\(\sigma\) ~ Uniform(0, 0.25)

Initial Model

This model translates to:

m2 <- quap(
  alist(
    `Fill Rate` ~ dnorm(mu, sigma),
    # Team-specific intercept plus linear effects of record, weather, and month
    mu <- alpha[Team_index] +
      betaW * wins_std + betaL * losses_std +
      betaH * TMAX_std + betaLow * TMIN_std +
      betaMonth * Month +
      betaPRCP * PRCP_std + betaSNOW * SNOW_std + betaSNWD * SNWD_std,
    alpha[Team_index] ~ dlnorm(0, 1),
    betaW ~ dnorm(0, 1),
    betaL ~ dnorm(0, 1),
    betaH ~ dnorm(0, 1),
    betaLow ~ dnorm(0, 1),
    betaMonth ~ dnorm(0, .75),
    betaPRCP ~ dnorm(0, 1),
    betaSNOW ~ dnorm(0, 1),
    betaSNWD ~ dnorm(0, 1),
    sigma ~ dunif(0, .25)
  ), data = CFB
)
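The formula refers to columns such as `wins_std`, `TMAX_std`, and `Team_index` whose construction is not shown here. A minimal sketch of one common way to build them, assuming a hypothetical data frame `toy` with made-up rows and assuming z-score standardization. The helper `z_score` is a name introduced for this illustration:

```r
# Hypothetical stand-in for CFB; the values are made up.
toy <- data.frame(
  Team         = c("A", "B", "A", "C"),
  Current_Wins = c(0, 3, 5, 2),
  TMAX         = c(85, 70, 55, 62),
  stringsAsFactors = FALSE
)

# z-score helper: subtract the mean, divide by the standard deviation
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

toy$wins_std <- z_score(toy$Current_Wins)
toy$TMAX_std <- z_score(toy$TMAX)

# Integer index per team, usable as alpha[Team_index] in quap()
toy$Team_index <- as.integer(factor(toy$Team))

toy
```

Standardizing puts every coefficient on a "per one standard deviation" scale, which makes it easier to choose priors of comparable width.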

Prior Predictive Check

With priors this wide, the model has essentially no information about the effect of each beta coefficient on fill rate.
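To see how little these priors constrain the outcome, the linear predictor can be simulated directly. The sketch below uses base R only and draws from the priors in the m2 code for a single hypothetical game in which all seven standardized predictors sit one standard deviation above average and Month is 10. The choice of game and the "plausible" range of 0 to 1.2 are assumptions for illustration:

```r
set.seed(2020)
n_draws <- 1e4

# Prior draws: team intercept, seven standardized slopes, and the month slope
alpha      <- rlnorm(n_draws, meanlog = 0, sdlog = 1)
betas_std  <- matrix(rnorm(n_draws * 7, mean = 0, sd = 1), ncol = 7)
beta_month <- rnorm(n_draws, mean = 0, sd = 0.75)

# Hypothetical game: every standardized predictor at +1 SD, played in October
x_std <- rep(1, 7)
month <- 10

# Expected fill rate implied by each prior draw
mu_prior <- alpha + as.vector(betas_std %*% x_std) + beta_month * month

# Share of prior draws with an expected fill rate in a plausible range
mean(mu_prior >= 0 & mu_prior <= 1.2)

# Central 89% interval of the implied expected fill rate
quantile(mu_prior, probs = c(0.055, 0.945))
```

Because Month is not standardized, its slope is multiplied by values near 10, so even a moderate prior standard deviation on betaMonth produces a very wide spread in the expected fill rate.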

Second Attempt

To get sensible prior predictions, I calibrated the priors on the effects one part at a time.

m3 <- quap(
  alist(
    `Fill Rate` ~ dnorm(mu, sigma),
    # Same linear predictor as before; only the priors are tightened
    mu <- alpha[Team_index] +
      betaW * wins_std + betaL * losses_std +
      betaH * TMAX_std + betaLow * TMIN_std +
      betaMonth * Month +
      betaPRCP * PRCP_std + betaSNOW * SNOW_std + betaSNWD * SNWD_std,
    alpha[Team_index] ~ dlnorm(-0.1, 0.1),
    betaW ~ dnorm(0, 0.05),
    betaL ~ dnorm(0, 0.05),
    betaH ~ dnorm(0, 0.05),
    betaLow ~ dnorm(0, 0.05),
    betaMonth ~ dnorm(0, 0.005),
    betaPRCP ~ dnorm(0, 0.05),
    betaSNOW ~ dnorm(0.01, 0.005),
    betaSNWD ~ dnorm(0, 0.05),
    sigma ~ dunif(0, .25)
  ), data = CFB
)
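The prior on alpha is specified on the log scale, so its meaning on the fill-rate scale is not immediate. A short check with base R, using the same parameters as the `dlnorm(-0.1, 0.1)` prior above:

```r
# Quantiles of Log-Normal(meanlog = -0.1, sdlog = 0.1) on the fill-rate scale
alpha_q <- qlnorm(c(0.055, 0.5, 0.945), meanlog = -0.1, sdlog = 0.1)
round(alpha_q, 3)

# The median of a log-normal equals exp(meanlog)
exp(-0.1)
```

The median is exp(-0.1), roughly 0.905, and the central 89% of the prior runs from about 0.77 to about 1.06, so each team's baseline fill rate is centered near 90% before the data are seen.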

Prior Predictive Check

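library(tidyverse)  # tibble(), %>%, ggplot()

# Extract samples from the priors
prior_m3 <- extract.prior(m3)
# Compute mu for every prior sample (rows) and every game (columns)
prior_m3$mu <- link(m3, post = prior_m3)

# Simulate data using the prior values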
m3ppc <- tibble(
  `Fill Rate` = rnorm(
    nrow(CFB) * length(prior_m3$sigma),
    mean = prior_m3$mu,
    sd = prior_m3$sigma
  )
)
# Plot the prior predictive distribution
m3ppc %>%
  ggplot(aes(x = `Fill Rate`)) +
  geom_density() + 
  xlim(0,2) +
  ggtitle("Model 3 - PPC")

Model Calibration

With the calibrated model in hand, we evaluate it against the real data again.

Model Evaluation

Adding the covariates has improved the model, but it still does not capture the structure of the data. The model's right tail is much too heavy, and the drop-off at capacity is not accounted for.

## 33 vector or matrix parameters hidden. Use depth=2 to show them.
##                   mean          sd         5.5%         94.5%
## betaW      0.020297194 0.002777820  0.015857702  0.0247366871
## betaL     -0.035910482 0.002526366 -0.039948103 -0.0318728611
## betaH      0.011970652 0.003910652  0.005720675  0.0182206293
## betaLow   -0.009134455 0.004022248 -0.015562785 -0.0027061257
## betaMonth -0.004342188 0.001437298 -0.006639267 -0.0020451084
## betaPRCP  -0.001880691 0.001910167 -0.004933507  0.0011721252
## betaSNOW   0.005243840 0.002025717  0.002006353  0.0084813262
## betaSNWD  -0.002991708 0.002168851 -0.006457952  0.0004745348
## sigma      0.111403894 0.001295210  0.109333898  0.1134738909
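Because the outcome is a fill rate, each coefficient can be converted to an approximate number of seats. The sketch below uses posterior means copied from the output above and a hypothetical 60,000-seat stadium. It assumes the `_std` predictors are z-scored, so each effect is per one standard deviation of the predictor:

```r
# Hypothetical stadium size, chosen for illustration
capacity <- 60000

# Posterior means copied from the precis output
post_mean <- c(
  betaW    =  0.020297194,
  betaL    = -0.035910482,
  betaH    =  0.011970652,
  betaSNOW =  0.005243840
)

# Approximate change in attendance per one-SD change in each predictor
seats <- round(post_mean * capacity)
seats
```

For a stadium of this size, one standard deviation more wins corresponds to roughly 1,200 additional fans, while one standard deviation more losses corresponds to roughly 2,150 fewer.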

Next Steps

The current model does slightly better than the previous models, but it still does not capture the data well. Here are some ways to improve it:

  • Give the model a more hierarchical structure
  • Include the categorical variables