STA6257_Project_LMMs

Introduction

Linear mixed-effects models (LMMs) can be used to model correlated data
- Such data typically take the form of clustered (cross-sectional) or longitudinal data
Called “mixed” because they simultaneously model fixed and random effects
- fixed effects model average trends
- random effects model the extent to which these trends vary across levels of some grouping factor
Mixed-effects models are widely used in psychology, where the data often consist of repeated observations on the same trial participants

Methods

The lme4 package has become the predominant tool in the R language for fitting linear mixed-effects models
A linear mixed model is described by the distribution of two vector-valued random variables: \(Y\), the response, and \(\mathcal{B}\), the vector of random effects
Our basic model, in lme4 formula notation: \[epa \sim plays + (1 + plays \mid coach)\]
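
Written out per observation, the basic model corresponds to the following sketch under the usual LMM assumptions. The notation is introduced here for illustration and is not taken from the project: \(i\) indexes a team-game observation, \(j\) indexes a coach, \(\beta_0, \beta_1\) are fixed-effect coefficients, \(b_{0j}, b_{1j}\) are coach-level random effects, and \(\Sigma\) and \(\sigma^2\) are the random-effect covariance and residual variance.

\[
\begin{aligned}
epa_{ij} &= (\beta_0 + b_{0j}) + (\beta_1 + b_{1j})\, plays_{ij} + \varepsilon_{ij}, \\
(b_{0j}, b_{1j})^\top &\sim \mathcal{N}(0, \Sigma), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
\]

Here \(\beta_0\) and \(\beta_1\) describe the average intercept and slope across coaches, while \(b_{0j}\) and \(b_{1j}\) are coach \(j\)'s deviations from those averages.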

Data

Our data consist of play-by-play records for every NFL game in the 2021 season, summarized by team and game.

Variable	Meaning
pbp.posteam	the team with possession of the ball (offense)
pbp.posteam_type	specifies whether the possessing team is home or away
pbp.game_id	the specific game id from the NFL
week	week number in the season that the game was played
season_type	flag that specifies whether it is a regular-season (0) or postseason (1) game
home_adv	flag that specifies home (0) or away (1)
coach	the coach of the team with possession (offensive plays)
opp_coach	the coach of the opposing team (defensive plays)
plays	total number of rush and pass plays for the team in that game
pass_plays	number of pass plays for the team in that game
pass_pct	share of plays that were passes, calculated as pass_plays / plays
yards_gained	yards gained by the offense
shotgun_snaps	number of snaps on which the team lined up in a shotgun formation
no_huddle_snap	number of snaps on which the team used a no-huddle offense
epa_per_play	mean expected points added (EPA) across all pass and rush plays for the team in that game
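
The per-game variables in the table can be derived from play-level rows roughly as follows. This is a minimal sketch in base R on a made-up toy data frame: `pbp_toy`, its column names, and its values are hypothetical and only illustrate how plays, pass_plays, pass_pct, and EPA per play relate to one another. It is not the project's actual preparation code.

```r
# Hypothetical play-level data (values invented for illustration only)
pbp_toy <- data.frame(
  game_id   = c("g1", "g1", "g1", "g1", "g2", "g2"),
  posteam   = c("A", "A", "A", "B", "A", "A"),
  play_type = c("pass", "run", "pass", "run", "pass", "pass"),
  epa       = c(0.5, -0.2, 0.3, 0.1, -0.4, 0.6)
)

# Indicator columns so that summing them counts plays and pass plays
pbp_toy$play <- 1
pbp_toy$pass <- as.numeric(pbp_toy$play_type == "pass")

# One row per team and game: sum the plays, pass plays, and EPA
team_sum_toy <- aggregate(cbind(play, pass, epa) ~ game_id + posteam,
                          data = pbp_toy, FUN = sum)
names(team_sum_toy) <- c("game_id", "posteam", "plays", "pass_plays", "epa_total")

# Derived variables as defined in the table
team_sum_toy$pass_pct     <- team_sum_toy$pass_plays / team_sum_toy$plays
team_sum_toy$epa_per_play <- team_sum_toy$epa_total / team_sum_toy$plays

team_sum_toy
```

Because pass_pct is built as a ratio of counts, it lies between 0 and 1, which matters when reading the size of its coefficient in the models.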

# Compare the distribution of EPA per play by coach (one panel per coach)
library(ggplot2)
ggplot(team_sum2, aes(epa_per_play)) +
  geom_boxplot() +
  facet_wrap(~ coach) +
  theme_minimal()

Code

# LMM - random intercepts for coach and home_adv
library(lmerTest)  # attaches lme4 and adds tests to summary() and anova()
epa.lmer1 = lmer(epa_per_play ~ pass_pct + plays + yards_gained + no_huddle_snap +
                 (1|coach) + (1|home_adv), data=team_sum2)

epa.lmer1

Linear mixed model fit by REML ['lmerModLmerTest']
Formula: epa_per_play ~ pass_pct + plays + yards_gained + no_huddle_snap +  
    (1 | coach) + (1 | home_adv)
   Data: team_sum2
REML criterion at convergence: -657.5259
Random effects:
 Groups   Name        Std.Dev.
 coach    (Intercept) 0.02138 
 home_adv (Intercept) 0.00000 
 Residual             0.12358 
Number of obs: 536, groups:  coach, 34; home_adv, 2
Fixed Effects:
   (Intercept)        pass_pct           plays    yards_gained  no_huddle_snap  
     -0.065738       -0.360035       -0.005792        0.001993       -0.002185  
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings

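- The home advantage random effect is estimated at zero (a singular fit, which is the likely source of the lme4 warning): no detectable variation in EPA per play due to home field advantage
- The standard deviation of the coach intercepts is about 0.02 EPA per play, small relative to the residual standard deviation of about 0.12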
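
One way to put the coach effect in context is the share of the remaining variance that sits between coaches (an intraclass correlation). The sketch below plugs in the standard deviations printed in the model output above. The variable names are illustrative, and the calculation assumes the variance components simply add.

```r
# Standard deviations copied from the "Random effects" block of epa.lmer1
sd_coach    <- 0.02138
sd_home_adv <- 0.00000
sd_residual <- 0.12358

# Variance components are the squared standard deviations
var_total <- sd_coach^2 + sd_home_adv^2 + sd_residual^2

# Share of the variance left after the fixed effects that lies between coaches
icc_coach <- sd_coach^2 / var_total
round(icc_coach, 3)
```

With these inputs the coach share comes out to roughly 3% of the variance not explained by the fixed effects, which supports reading the coach effect as small.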

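# LMM - random slopes
# Note: each (1 + x | coach) term carries its own coach intercept, so the
# coach intercept variance is specified once per term
epa.lmer2 = lmer(epa_per_play ~ pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap +
                 (1+pass_pct|coach) + (1+plays|coach) + (1+yards_gained|coach) +
                 (1+shotgun_snaps|coach) + (1+no_huddle_snap|coach), data=team_sum2)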

Coaches have different baseline levels of plays run and pass_pct
- all the values are negative and very close to each other
There is consistency in how often coaches throw the ball
- variation in the number of shotgun snaps an offense runs is much wider

# Testing for significance between models with and without home field advantage
epa.lmer2.null = lmer(epa_per_play ~ pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap +
                  (1+pass_pct|coach) + (1+plays|coach) + (1+yards_gained|coach) +
                  (1+shotgun_snaps|coach) + (1+no_huddle_snap|coach),
                  data=team_sum2,
                  REML=FALSE)  # maximum likelihood fit for the likelihood ratio test

epa.lmer2.full = lmer(epa_per_play ~ home_adv + pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap +
                  (1+home_adv|coach) + (1+pass_pct|coach) + (1+plays|coach) + (1+yards_gained|coach) +
                  (1+shotgun_snaps|coach) + (1+no_huddle_snap|coach),
                  data=team_sum2,
                  REML=FALSE)  # maximum likelihood fit for the likelihood ratio test

anova(epa.lmer2.full, epa.lmer2.null)

Data: team_sum2
Models:
epa.lmer2.null: epa_per_play ~ pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap + (1 + pass_pct | coach) + (1 + plays | coach) + (1 + yards_gained | coach) + (1 + shotgun_snaps | coach) + (1 + no_huddle_snap | coach)
epa.lmer2.full: epa_per_play ~ home_adv + pass_pct + plays + yards_gained + shotgun_snaps + no_huddle_snap + (1 + home_adv | coach) + (1 + pass_pct | coach) + (1 + plays | coach) + (1 + yards_gained | coach) + (1 + shotgun_snaps | coach) + (1 + no_huddle_snap | coach)
               npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)
epa.lmer2.null   22 -668.99 -574.74 356.50  -712.99                     
epa.lmer2.full   26 -663.43 -552.04 357.72  -715.43 2.4394  4     0.6555

- No statistically significant difference between the models (Chisq = 2.44, Df = 4, p = 0.66)
- The effect of home field advantage is minimal to zero
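
The likelihood ratio test reported by anova() can be checked by hand from the table. The sketch below uses only base R and the rounded log-likelihoods and parameter counts printed above, so its results match the table only up to rounding. The variable names are illustrative.

```r
# Values copied from the anova() table (log-likelihoods are rounded there)
loglik_null <- 356.50
loglik_full <- 357.72
npar_null   <- 22
npar_full   <- 26

# Likelihood ratio statistic: twice the gain in log-likelihood
chisq <- 2 * (loglik_full - loglik_null)

# Degrees of freedom: number of extra parameters in the full model
df <- npar_full - npar_null

# Upper-tail probability of a chi-squared distribution with df degrees of freedom
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

c(chisq = chisq, df = df, p_value = p_value)
```

The four extra parameters are the home_adv fixed effect plus the three variance-covariance parameters of the (1+home_adv|coach) term. A p-value this large means the added terms do not improve the fit beyond what chance would produce.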

Conclusion

After accounting for the fixed effects (plays run, percentage of pass plays, yards gained, shotgun snaps, and no-huddle snaps), the coach random effect had an estimated standard deviation of about 0.02 EPA per play, indicating only modest coach-to-coach variation, and we found no detectable change in EPA per play due to home field advantage.