STA6257_Project_LMMs

Introduction

  • Linear Mixed-Effects Models can be used to model correlated data
    • The data can be cross-sectional or longitudinal
  • Called “mixed” because they simultaneously model fixed and random effects
    • fixed effects model average trends
    • random effects model the extent to which these trends vary across levels of some grouping factor
  • A major application area for mixed-effects models is psychology, where data typically consist of repeated observations on the same trial participants

Methods

  • The lme4 package has become the predominant tool in the R language for fitting linear mixed-effects models
  • A linear mixed model is described by the distribution of two vector-valued random variables: \(\mathcal{Y}\), the response, and \(\mathcal{B}\), the vector of random effects (the fixed-effects coefficients \(\beta\) are parameters, not random variables)
  • Our basic model, in lme4 formula notation: \[\text{epa} \sim \text{plays} + (1 + \text{plays} \mid \text{coach})\]
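
To make the two distributions concrete, the model can be written out as below. This is a sketch in the usual lme4 notation, ignoring offsets and weights; the symbols \(X\) (fixed-effects design matrix), \(Z\) (random-effects design matrix), \(\beta\) (fixed-effects coefficients), \(\Sigma\) (random-effects covariance), and the subscripts \(i\) (game) and \(j\) (coach) are introduced here for illustration.

\[(\mathcal{Y} \mid \mathcal{B} = b) \sim \mathcal{N}(X\beta + Zb,\ \sigma^2 I), \qquad \mathcal{B} \sim \mathcal{N}(0, \Sigma)\]

For the basic model above, this corresponds to a coach-specific intercept and a coach-specific slope on plays:

\[\text{epa}_{ij} = (\beta_0 + b_{0j}) + (\beta_1 + b_{1j})\,\text{plays}_{ij} + \varepsilon_{ij}, \qquad (b_{0j}, b_{1j})^\top \sim \mathcal{N}(0, \Sigma_{\text{coach}}), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)\]

Here \(\beta_0\) and \(\beta_1\) are the average trends (fixed effects), while \(b_{0j}\) and \(b_{1j}\) measure how far coach \(j\) departs from them (random effects).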

Data

Our data come from play-by-play records for every game of the 2021 NFL season, summarized by team and game.

| Variable | Meaning |
|---|---|
| pbp.posteam | the team with possession of the ball (offense) |
| pbp.posteam_type | whether the possessing team is home or away |
| pbp.game_id | the specific game id from the NFL |
| week | week number in the season in which the game was played |
| season_type | flag for a regular-season (0) or postseason (1) game |
| home_adv | flag for whether the possessing team is home (0) or away (1) |
| coach | the coach of the team with possession (offensive plays) |
| opp_coach | the coach of the opposing team (defensive plays) |
| plays | total number of rush and pass plays for the team and game |
| pass_plays | number of pass plays for the team and game |
| pass_pct | share of pass plays in the game, calculated as pass_plays / plays |
| yards_gained | yards gained by the offense |
| shotgun_snaps | number of snaps on which the team lined up in a shotgun formation |
| no_huddle_snap | number of snaps on which the team used a no-huddle offense |
| epa_per_play | mean EPA (expected points added) over all pass and rush plays for the team and game |

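
The per-game variables above are aggregates of play-level rows. The sketch below shows one way such a team-game summary could be built in base R; the toy data and the play-level column names (`game_id`, `posteam`, `play_type`, `epa`) are assumptions for illustration, not the actual pipeline behind `team_sum2`.

```r
# Toy play-by-play rows; values and column names are illustrative assumptions
pbp <- data.frame(
  game_id   = c("g1", "g1", "g1", "g1", "g2", "g2", "g2"),
  posteam   = c("A", "A", "A", "B", "A", "A", "B"),
  play_type = c("pass", "run", "pass", "run", "pass", "pass", "pass"),
  epa       = c(0.5, -0.2, 0.3, 0.1, -0.4, 0.8, 0.2)
)

# Keep only rush and pass plays, then add counting columns
pbp <- pbp[pbp$play_type %in% c("pass", "run"), ]
pbp$play    <- 1L
pbp$is_pass <- as.integer(pbp$play_type == "pass")

# Sum plays, pass plays, and EPA within each team and game
team_sum <- aggregate(cbind(play, is_pass, epa) ~ posteam + game_id,
                      data = pbp, FUN = sum)
names(team_sum)[3:5] <- c("plays", "pass_plays", "epa_total")

# Derived variables, following the definitions in the table
team_sum$pass_pct     <- team_sum$pass_plays / team_sum$plays
team_sum$epa_per_play <- team_sum$epa_total / team_sum$plays
team_sum
```

Each row of `team_sum` is then one team in one game, which is the unit of observation the models are fit to.
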
# Compare EPA per play by coach (one boxplot panel per coach)
library(ggplot2)
ggplot(team_sum2, aes(epa_per_play)) +
  geom_boxplot() +
  facet_wrap(~ coach) +
  theme_minimal()

Code

# LMM - random intercepts
library(lmerTest)  # attaches lme4; its lmer() returns the 'lmerModLmerTest' fit printed below
epa.lmer1 = lmer(epa_per_play ~ pass_pct + plays + yards_gained + no_huddle_snap +
                 (1|coach) + (1|home_adv), data=team_sum2)

epa.lmer1
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: epa_per_play ~ pass_pct + plays + yards_gained + no_huddle_snap +  
    (1 | coach) + (1 | home_adv)
   Data: team_sum2
REML criterion at convergence: -657.5259
Random effects:
 Groups   Name        Std.Dev.
 coach    (Intercept) 0.02138 
 home_adv (Intercept) 0.00000 
 Residual             0.12358 
Number of obs: 536, groups:  coach, 34; home_adv, 2
Fixed Effects:
   (Intercept)        pass_pct           plays    yards_gained  no_huddle_snap  
     -0.065738       -0.360035       -0.005792        0.001993       -0.002185  
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings 

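  • The home_adv random-intercept standard deviation is estimated as 0 (a boundary, or singular, fit): the model attributes no variation in EPA per play to home field advantage
  • The coach random intercept has a standard deviation of about 0.02 EPA per play: a coach's baseline EPA per play typically deviates from the overall average by roughly 0.02, which is small next to the residual standard deviation of 0.12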
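
One way to put these standard deviations on a common scale is the intraclass correlation: the share of the variance left after the fixed effects that is attributable to coaches. The sketch below is plain arithmetic on the values printed in the model output; the variable names are illustrative, and with the fitted object available the same quantities could be read from `VarCorr(epa.lmer1)`.

```r
# Standard deviations copied from the "Random effects" table in the output
sd_coach    <- 0.02138
sd_home_adv <- 0.00000
sd_resid    <- 0.12358

# Variances are squared standard deviations
var_total <- sd_coach^2 + sd_home_adv^2 + sd_resid^2

# Share of the remaining variance that sits between coaches
icc_coach <- sd_coach^2 / var_total
round(icc_coach, 3)  # 0.029
```

Under this reading, roughly 3% of the remaining variation in EPA per play lies between coaches, with the rest being game-to-game noise.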

# LMM - random slopes
epa.lmer2 = lmer(epa_per_play ~ pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap +
                 (1+pass_pct|coach) + (1+plays|coach) + (1+yards_gained|coach) +
                 (1+shotgun_snaps|coach) + (1+no_huddle_snap|coach),
                 data=team_sum2)
  • Coaches have different baseline levels of plays run and pass_pct
    • the per-coach estimates for both are all negative and very close to each other
  • How often coaches throw the ball is consistent across coaches
  • Variation in the number of shotgun snaps an offense runs is much wider
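
Statements like these are read off the per-group coefficients of a random-slope fit. Since `team_sum2` is not reproduced here, the sketch below uses the `sleepstudy` data that ships with lme4 as a stand-in (subjects play the role of coaches, `Days` the role of a game-level predictor); it only illustrates which accessor functions expose the per-group values.

```r
library(lme4)

# sleepstudy: reaction times for 18 subjects measured over 10 days
fit <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

fixef(fit)                # average intercept and slope (fixed effects)
head(ranef(fit)$Subject)  # each subject's deviation from those averages
head(coef(fit)$Subject)   # per-subject coefficients = fixed effect + deviation
VarCorr(fit)              # standard deviations and correlation of the deviations
```

A narrow spread in a `coef()` column indicates groups behave alike on that predictor; a wide spread indicates they differ. Note also that a single term such as `(1 + Days | Subject)` estimates one group intercept together with the slope, whereas each separate `(1 + x | coach)` term in the model above carries its own coach intercept.
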
# Testing for significance between models with and without home field advantage
# Both models are fit by maximum likelihood (REML=FALSE) so their likelihoods are comparable
epa.lmer2.null = lmer(epa_per_play ~ pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap +
                  (1+pass_pct|coach) + (1+plays|coach) + (1+yards_gained|coach) +
                  (1+shotgun_snaps|coach) + (1+no_huddle_snap|coach),
                  data=team_sum2,
                  REML=FALSE)

epa.lmer2.full = lmer(epa_per_play ~ home_adv + pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap +
                  (1+home_adv|coach) + (1+pass_pct|coach) + (1+plays|coach) + (1+yards_gained|coach) +
                  (1+shotgun_snaps|coach) + (1+no_huddle_snap|coach),
                  data=team_sum2,
                  REML=FALSE)

anova(epa.lmer2.full, epa.lmer2.null)
Data: team_sum2
Models:
epa.lmer2.null: epa_per_play ~ pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap + (1 + pass_pct | coach) + (1 + plays | coach) + (1 + yards_gained | coach) + (1 + shotgun_snaps | coach) + (1 + no_huddle_snap | coach)
epa.lmer2.full: epa_per_play ~ home_adv + pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap + (1 + home_adv | coach) + (1 + pass_pct | coach) + (1 + plays | coach) + (1 + yards_gained | coach) + (1 + shotgun_snaps | coach) + (1 + no_huddle_snap | coach)
               npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)
epa.lmer2.null   22 -668.99 -574.74 356.50  -712.99                     
epa.lmer2.full   26 -663.43 -552.04 357.72  -715.43 2.4394  4     0.6555
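  • The likelihood ratio test shows no statistically significant difference between the models (\(\chi^2 = 2.44\), df = 4, p = 0.66), and AIC and BIC are both lower for the model without home_adv
  • The effect of home field advantage is minimal to zero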
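
The test statistic and p-value in the table can be reproduced by hand from the reported log-likelihoods and parameter counts. The sketch below is plain arithmetic on the printed values (variable names are illustrative); small discrepancies come from the rounding in the printed table.

```r
# Values copied from the anova() table
loglik_null <- 356.50; npar_null <- 22
loglik_full <- 357.72; npar_full <- 26
chisq_reported <- 2.4394

# Likelihood ratio statistic: twice the gain in log-likelihood
chisq_from_loglik <- 2 * (loglik_full - loglik_null)

# Degrees of freedom: the number of extra parameters in the full model
df <- npar_full - npar_null

# Upper-tail chi-squared probability
p_value <- pchisq(chisq_reported, df = df, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * npar
aic_null <- -2 * loglik_null + 2 * npar_null
aic_full <- -2 * loglik_full + 2 * npar_full

round(c(chisq = chisq_from_loglik, df = df, p = p_value), 4)
```

The four extra parameters are the home_adv fixed effect plus the three variance-covariance parameters of the `(1+home_adv|coach)` term, so the test assesses both at once.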

Conclusion

After accounting for the fixed effects of plays run, percentage of pass plays, yards gained, and no-huddle snaps, the random-intercept model estimated a between-coach standard deviation of about 0.02 EPA per play and attributed no variation in EPA per play to home field advantage. In the random-slope models, which also include shotgun snaps, a likelihood ratio test likewise found no significant home field effect (p = 0.66).