Why use multilevel modelling?

With this method, data are treated as clustered into groups. In this example the groups are the teams in the English Premier League (EPL) data. Each team gets a random effect, which accounts for team-level influences shared by all of its players, such as a coach's tactics and playing style.

EDA

Overall FIFA rating

To start, we look at the response variable, the overall FIFA rating, and its distribution across players.

Skill Dribbling

Next we look at skill_dribbling, which we will use as the predictor variable.

Correlation of overall and skill_dribbling

After filtering the data, the graph shows the correlation between skill_dribbling and overall rating.

For this example we only compare players within a single league, the EPL.

Next, a faceted plot (facet_wrap, one panel per team) shows a separate linear fit of overall FIFA rating on skill_dribbling for each team in the English Premier League.
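Each panel is effectively its own regression of overall on skill_dribbling for one club. This is sometimes called "no pooling". Below is a minimal Python sketch of that idea. The club names and rows are hypothetical stand-ins, not the EPL data. Ordinary least squares is written out by hand so the example needs no third-party packages.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_per_club(rows):
    """Fit a separate line of overall on skill_dribbling for each club (no pooling)."""
    by_club = defaultdict(lambda: ([], []))
    for club, dribbling, overall in rows:
        xs, ys = by_club[club]
        xs.append(dribbling)
        ys.append(overall)
    return {club: fit_line(xs, ys) for club, (xs, ys) in by_club.items()}

# Hypothetical (club_name, skill_dribbling, overall) rows, not the real EPL data.
rows = [
    ("Club A", 60, 68), ("Club A", 70, 73), ("Club A", 80, 78),
    ("Club B", 55, 62), ("Club B", 65, 66), ("Club B", 75, 70),
]
for club, (intercept, slope) in fit_per_club(rows).items():
    print(f"{club}: overall = {intercept:.2f} + {slope:.2f} * skill_dribbling")
```

A single lm over all players is the opposite extreme (complete pooling). The multilevel models that follow sit between the two: clubs can differ, but each club's estimate still borrows strength from the whole league.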

Different models

Simple linear model

## 
## Call:
## lm(formula = overall ~ skill_dribbling, data = epl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.8975  -3.9597  -0.2916   3.6592  20.6764 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     36.46882    1.74192   20.94   <2e-16 ***
## skill_dribbling  0.52463    0.02482   21.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.443 on 564 degrees of freedom
## Multiple R-squared:  0.4421, Adjusted R-squared:  0.4411 
## F-statistic: 446.9 on 1 and 564 DF,  p-value: < 2.2e-16

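Fitted model: predicted overall rating = 36.47 + 0.52 × skill_dribbling

For every 1-point increase in skill_dribbling, the predicted overall rating increases by about 0.52 points. With R² = 0.442, dribbling skill alone accounts for about 44% of the variance in overall rating among the 566 EPL players.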
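You can check the numbers in the summary by hand. The sketch below uses the printed coefficients to make a prediction. It also recomputes the slope's t value, approximate 95% confidence intervals and the adjusted R². The critical value 1.964 is an assumed, hard-coded approximation of the t quantile for 564 degrees of freedom.

```python
import math

# Coefficients copied from the lm() summary above.
b0, se_b0 = 36.46882, 1.74192     # intercept and its standard error
b1, se_b1 = 0.52463, 0.02482      # skill_dribbling slope and its standard error
r_squared, n, p = 0.4421, 566, 1  # R2, observations, predictors (564 residual df)
T_CRIT = 1.964  # assumption: approx. 97.5% t quantile for 564 df, hard-coded

def predict(dribbling):
    """Predicted overall rating from the fitted simple linear model."""
    return b0 + b1 * dribbling

def conf_int(estimate, se):
    """Approximate 95% confidence interval: estimate +/- T_CRIT * SE."""
    return estimate - T_CRIT * se, estimate + T_CRIT * se

t_b1 = b1 / se_b1                                     # t value of the slope
adj_r2 = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)  # adjusted R2
r = math.sqrt(r_squared)  # size of Pearson's r (valid with a single predictor)

print(f"predicted overall at dribbling 70: {predict(70):.2f}")
print(f"slope t = {t_b1:.2f}")
for name, est, se in (("intercept", b0, se_b0), ("slope", b1, se_b1)):
    lo, hi = conf_int(est, se)
    print(f"{name} 95% CI: {lo:.2f} to {hi:.2f}")
print(f"adjusted R2 = {adj_r2:.4f}, |r| = {r:.3f}")
```

With these inputs the intervals come out at about 33.05 to 39.89 for the intercept and 0.48 to 0.57 for the slope, matching the LM column of the comparison table. The predicted rating for a player with dribbling 70 is about 73.19.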

Random intercepts

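The first multilevel model is the unconditional means model. It has no predictors and a random intercept for club; in lme4 syntax, overall ~ 1 + (1 | club_name). Its purpose is to measure how much of the variation in overall rating lies at each level: between clubs versus between players within the same club.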

The plot shows the estimated intercept of overall rating for each team:

  • Fixed effect (Intercept) = 72.86, the baseline average rating across clubs

  • Random effect (club): variance τ00 = 7.09, so club intercepts vary around 72.86 with a standard deviation of about √7.09 ≈ 2.66 rating points

  • Chelsea has the largest intercept, 72.86 + 4.16 = 77.02

  • Brentford has the smallest intercept, 72.86 - 2.95 = 69.91
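Because τ00 is a variance, the typical spread of club intercepts is its square root. The sketch below uses the reported RI estimates. The residual variance σ² = 67.55 comes from the comparison table. From these it computes that standard deviation, the intraclass correlation (ICC) and the two club intercepts quoted above.

```python
import math

# Estimates reported for the random-intercept-only (unconditional means) model, RI.
fixed_intercept = 72.86  # average overall rating across clubs
tau00 = 7.09             # between-club variance of the intercepts
sigma2 = 67.55           # within-club (player-level) residual variance

club_sd = math.sqrt(tau00)      # typical club deviation, in rating points
icc = tau00 / (tau00 + sigma2)  # share of total variance lying between clubs

# Club-specific intercept = fixed intercept + that club's estimated deviation.
deviations = {"Chelsea": 4.16, "Brentford": -2.95}
club_intercepts = {club: fixed_intercept + u for club, u in deviations.items()}

print(f"club SD = {club_sd:.2f}, ICC = {icc:.3f}")
for club, value in club_intercepts.items():
    print(f"{club}: {value:.2f}")
```

The ICC of about 0.095 matches the value reported for RI. Roughly 10% of the variance in overall rating lies between clubs, and the rest lies between players within clubs.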

Adding ‘skill_dribbling’

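When skill_dribbling is added as a fixed effect, the between-club intercept variance falls from 7.09 to 1.30. Much of the difference between clubs' average ratings is therefore accounted for by their players' dribbling skill:

  • Manchester United has the largest intercept (+1.26)

  • Wolverhampton has the smallest intercept (-1.38)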
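In a random-intercept model every club shares the fixed slope, and only the baseline shifts. The sketch below combines the RI_SD fixed effects from the comparison table (intercept 37.48, slope 0.51) with the two club deviations above. The other clubs' deviations are not listed here. As an assumption of this sketch, any club missing from the dictionary falls back to the population-level prediction.

```python
# Fixed effects reported for RI_SD (random intercepts with skill_dribbling).
b0, b1 = 37.48, 0.51
# Club deviations quoted above; clubs not listed fall back to 0 (population level).
club_deviation = {"Manchester United": 1.26, "Wolverhampton": -1.38}

def predict(dribbling, club=None):
    """Predicted overall rating: (b0 + u_club) + b1 * dribbling."""
    u = club_deviation.get(club, 0.0)
    return b0 + u + b1 * dribbling

for club in (None, "Manchester United", "Wolverhampton"):
    print(club or "average club", f"{predict(70, club):.2f}")
```

Because the slope is shared, the two clubs' lines are parallel. Manchester United sits 2.64 points above Wolverhampton at every dribbling value.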

Random slopes and intercepts model

The next model also gives skill_dribbling a random slope. This lets the effect of dribbling on overall rating differ between clubs, not just the baseline. A random slope is worthwhile when the relationship between predictor and outcome genuinely varies across groups. If it does not, the extra variance and correlation parameters add complexity without improving the fit. Here the slope variance is small (τ11 = 0.01) and the intercept-slope correlation is close to -1 (ρ01 = -0.98). That pattern often signals that the random-slope structure is more than the data can support.
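With a random slope, each club j gets its own line, with intercept β0 + u0j and slope β1 + u1j. The spread of those lines changes with skill_dribbling x:

Var(u0 + u1·x) = τ00 + 2x·ρ01·√(τ00·τ11) + x²·τ11

The sketch below plugs in the RS_IM estimates from the comparison table. Those values are rounded as printed (τ11 = 0.01 especially), so the output is only indicative. The club deviations in the last lines are hypothetical. The example suggests why τ00 = 38.39 looks large: it is the between-club variance extrapolated to a dribbling rating of zero. At realistic dribbling values the club lines sit much closer together.

```python
import math

# Random-effect estimates reported for RS_IM (rounded as printed in the table).
tau00 = 38.39  # variance of club intercepts (at skill_dribbling = 0)
tau11 = 0.01   # variance of club slopes for skill_dribbling
rho01 = -0.98  # correlation between club intercepts and slopes
cov01 = rho01 * math.sqrt(tau00 * tau11)  # intercept-slope covariance

def between_club_variance(x):
    """Var(u0 + u1 * x): spread of the club-specific lines at dribbling value x."""
    return tau00 + 2 * x * cov01 + x ** 2 * tau11

def club_line(b0, b1, u0, u1):
    """Club-specific line with intercept b0 + u0 and slope b1 + u1."""
    return lambda x: (b0 + u0) + (b1 + u1) * x

for x in (0, 50, 70):
    print(f"dribbling {x}: between-club variance {between_club_variance(x):.2f}")

# Hypothetical club deviations around the RS_IM fixed effects (37.50, 0.51).
line = club_line(37.50, 0.51, u0=2.0, u1=-0.03)
print(f"hypothetical club at dribbling 70: {line(70):.2f}")
```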

Model comparison

Assessing R2

Columns: LM = simple linear model; RI = random intercepts only (unconditional means); RI_SD = random intercepts with skill_dribbling; RS_IM = random slopes and intercepts; Without_RS = model fitted to all 19,239 players, grouped by league_name and club_name. The response is overall in every model, and every fixed-effect p-value is < 0.001. For the mixed models, the second R² figure is conditional R², not adjusted R².

                              LM              RI              RI_SD           RS_IM           Without_RS
Fixed effects
(Intercept)                   36.47           72.86           37.48           37.50           60.49
  95% CI                      33.05 – 39.89   71.51 – 74.21   34.00 – 40.95   33.11 – 41.88   60.12 – 60.86
skill_dribbling               0.52            -               0.51            0.51            0.09
  95% CI                      0.48 – 0.57     -               0.46 – 0.56     0.45 – 0.57     0.09 – 0.10
Random effects
σ²                            -               67.55           40.31           39.45           25.33
τ00 club_name                 -               7.09            1.30            38.39           3.66
τ00 league_name:club_name     -               -               -               -               12.07
τ11 club_name.skill_dribbling -               -               -               0.01            -
ρ01 club_name                 -               -               -               -0.98           -
ICC                           -               0.10            0.03            0.05            0.38
N club_name                   -               20              20              20              702
N league_name                 -               -               -               -               56
Observations                  566             566             566             566             19239
R² / adjusted R²              0.442 / 0.441   -               -               -               -
Marginal / conditional R²     -               0.000 / 0.095   0.427 / 0.445   0.425 / 0.456   0.069 / 0.426

Assessing AIC

                LM          RI          RI_SD       RS_IM       Without_RS
Model           lm          lmerMod     lmerMod     lmerMod     lmerMod
AIC             3719.230    4023.303    3717.497    3718.818    118810.646
AIC_wt          0.2170648   0.0000000   0.5162730   0.2666621   0.0000000
AICc            3719.272    4023.345    3717.568    3718.968    118810.649
AICc_wt         0.2217871   0.0000000   0.5200159   0.2581969   0.0000000
BIC             3732.245    4036.318    3734.851    3744.850    118849.969
BIC_wt          0.7851841   0.0000000   0.2133771   0.0014388   0.0000000
RMSE            6.432091    8.107824    6.286890    6.181165    4.945197
Sigma           6.443485    8.218994    6.349113    6.280851    5.032757
R2_conditional  NA          0.0950210   0.4451975   0.4555591   0.4255178
R2_marginal     NA          0.0000000   0.4272943   0.4254237   0.0689104
ICC             NA          0.0950210   0.0312607   0.0524479   0.3830001
R2              0.4420716   NA          NA          NA          NA
R2_adjusted     0.4410824   NA          NA          NA          NA

RESULTS

R2 BEST MODEL: Among fixed-effects-only measures, the LM has the highest R² (0.442). That is slightly above the marginal R² of RI_SD (0.427) and RS_IM (0.425). Once the club random effects are included, however, the conditional R² of RI_SD (0.445) and RS_IM (0.456) is higher than the LM's. The two kinds of R² are defined differently: marginal R² counts only the fixed effects, while conditional R² adds the random effects. They are therefore not directly comparable across models of different form, and R² never penalises extra parameters. This makes it a weak basis for choosing between these models.
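The marginal and conditional R² reported for the mixed models appear to follow the Nakagawa and Schielzeth definitions. For a random-intercept model, these divide the fixed-effect variance, either alone or plus τ00, by the total variance. The fixed-effect variance is not printed in the table. As an assumption of this sketch, it is backed out from RI_SD's marginal R² and then used to recover the conditional R².

```python
def nakagawa_r2(var_fixed, tau00, sigma2):
    """Marginal and conditional R2 for a random-intercept linear mixed model.

    var_fixed: variance of the fixed-effect predictions
    tau00:     random-intercept variance
    sigma2:    residual variance
    """
    total = var_fixed + tau00 + sigma2
    marginal = var_fixed / total               # fixed effects only
    conditional = (var_fixed + tau00) / total  # fixed plus random effects
    return marginal, conditional

# RI_SD reports tau00 = 1.30, sigma2 = 40.31 and marginal R2 = 0.4272943.
# var_fixed is not printed, so back it out from the marginal R2 (assumption).
tau00, sigma2, r2_marginal = 1.30, 40.31, 0.4272943
var_fixed = r2_marginal * (tau00 + sigma2) / (1 - r2_marginal)

m, c = nakagawa_r2(var_fixed, tau00, sigma2)
print(f"marginal = {m:.3f}, conditional = {c:.3f}")
```

This gives a conditional R² of about 0.445, the value reported for RI_SD. The second number for each mixed model is therefore conditional R², not an adjusted R².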

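AIC BEST MODEL: RI_SD (random intercepts with skill_dribbling) has the lowest AIC (3717.50). AIC estimates each model's relative expected prediction error while penalising the number of parameters, so lower is better. The margin is small, though. LM (3719.23) and RS_IM (3718.82) are both within 2 units of RI_SD, a gap usually read as similar support. The AIC weights (RI_SD 0.52, RS_IM 0.27, LM 0.22) show that the evidence is split, and BIC, which penalises parameters more heavily, favours the LM (weight 0.79). The Without_RS model was fitted to 19,239 observations from all leagues rather than to the 566 EPL players, so its AIC cannot be compared with the others.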
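AIC and BIC weights rescale each model's difference from the best score: w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2). The sketch below recomputes them for the four models fitted to the same 566 EPL players. Without_RS is left out because it uses different data. The sketch also uses BIC - AIC = k(ln n - 2) to recover each model's number of estimated parameters. The helper name akaike_weights is illustrative, not a library function.

```python
import math

def akaike_weights(scores):
    """Turn AIC (or BIC) values into weights that sum to 1; lower score = higher weight."""
    best = min(scores.values())
    rel = {name: math.exp(-(s - best) / 2) for name, s in scores.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}

# Scores from the AIC table for the four models fitted to the 566 EPL players.
aic = {"LM": 3719.230, "RI": 4023.303, "RI_SD": 3717.497, "RS_IM": 3718.818}
bic = {"LM": 3732.245, "RI": 4036.318, "RI_SD": 3734.851, "RS_IM": 3744.850}

for label, scores in (("AIC", aic), ("BIC", bic)):
    weights = akaike_weights(scores)
    print(label, {name: round(w, 3) for name, w in weights.items()})

# AIC = -2 logLik + 2k and BIC = -2 logLik + k ln(n), so BIC - AIC = k (ln(n) - 2).
n = 566
for name in aic:
    k = (bic[name] - aic[name]) / (math.log(n) - 2)
    print(f"{name}: about {k:.2f} estimated parameters")
```

The weights match the AIC_wt and BIC_wt columns to about three decimals. The recovered parameter counts are:

  • LM: 3 (intercept, slope, σ)

  • RI: 3 (intercept, τ00, σ)

  • RI_SD: 4

  • RS_IM: 6 (adds τ11 and the intercept-slope covariance)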

Conclusion:

Teams like Manchester United and Manchester City rate above the average club after controlling for dribbling skill.

This club-level variation may reflect different playing styles, strategies, or team dynamics that affect players' overall ratings independently of their dribbling skill.

The random intercepts show how far each club's average rating lies from the EPL-wide intercept after accounting for individual skill_dribbling. Both club-level differences and individual skill contribute to predicting overall rating. In the RI_SD model, however, the club level accounts for only a small share of the remaining variance (ICC = 0.03).