Why use multi level modelling?

Multilevel modelling suits data that are clustered into groups. In this example the data are English Premier League (EPL) players grouped by team. Each team is given its own random effect, which absorbs team-level influences such as a coach's tactics and playing style.

EDA

Overall FIFA rating score

To start, we assess our response variable, the overall FIFA rating score, by looking at the distribution of overall ratings.

Skill Dribbling

Next we look at skill dribbling, which we aim to use as our predictor variable.

Correlation of overall and skill_dribbling

After filtering out some data, this graph shows the correlation between skill_dribbling and overall rating.

For this example we only compare players within a single league, the EPL.

Now we can look at a linear model fitted separately for each English Premier League team, faceted by team, showing the relationship between skill_dribbling and overall FIFA rating.

Different models

Simple linear model

## 
## Call:
## lm(formula = overall ~ skill_dribbling, data = epl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.8975  -3.9597  -0.2916   3.6592  20.6764 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     36.46882    1.74192   20.94   <2e-16 ***
## skill_dribbling  0.52463    0.02482   21.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.443 on 564 degrees of freedom
## Multiple R-squared:  0.4421, Adjusted R-squared:  0.4411 
## F-statistic: 446.9 on 1 and 564 DF,  p-value: < 2.2e-16

overall rating = 36.47 + 0.52(dribbling skill)

For every 1-unit increase in dribbling skill, the predicted overall rating increases by 0.52 points.
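As a quick check of the fitted equation, the sketch below plugs a dribbling score into the coefficients from the summary output. The helper name `predict_overall` and the example score of 80 are illustrative assumptions, not part of the original analysis.

```python
# Coefficients copied from the lm() summary output above.
INTERCEPT = 36.46882
SLOPE = 0.52463  # rating points per unit of skill_dribbling


def predict_overall(skill_dribbling: float) -> float:
    """Predicted overall rating from the simple linear model (hypothetical helper)."""
    return INTERCEPT + SLOPE * skill_dribbling


# Example: a player with a dribbling score of 80 (illustrative value).
print(round(predict_overall(80), 2))

# Moving from 80 to 81 changes the prediction by exactly the slope.
print(round(predict_overall(81) - predict_overall(80), 2))
```

The second line printed is the slope itself, which is what the "0.52 points per unit" interpretation means in practice.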

Random intercepts

The unconditional means model (random intercepts with no predictors) is used to assess how much variation sits at each level: between clubs, and between players within a club. Here the random intercept is club.

The plot shows each team's intercept for overall rating.

- Fixed effect (Intercept) = 72.86 (the baseline rating for an average club)
- Random effect (club): variance = 7.09, i.e. a standard deviation of about 2.66, so club intercepts typically deviate from 72.86 by about 2.7 rating points
- Chelsea has the largest intercept: 72.86 + 4.16 = 77.02
- Brentford has the smallest intercept: 72.86 - 2.95 = 69.91
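The ICC reported for this model can be reproduced from the two variance components in the comparison table (τ₀₀ = 7.09 for club and σ² = 67.55 for the residual). The sketch below also takes the square root of the club variance, which puts the spread of club intercepts on the rating scale. Variable names are illustrative.

```python
import math

# Variance components of the random-intercept (unconditional means) model,
# copied from the comparison table.
TAU_00 = 7.09    # between-club variance of the intercepts
SIGMA_2 = 67.55  # residual (within-club) variance

# Intraclass correlation: share of total variance that sits between clubs.
icc = TAU_00 / (TAU_00 + SIGMA_2)

# Standard deviation of the club intercepts, in rating points.
club_sd = math.sqrt(TAU_00)

print(f"ICC = {icc:.3f}")
print(f"club SD = {club_sd:.2f}")
```

An ICC of roughly 0.10 means about 10% of the variation in overall rating is between clubs, with the rest between players within a club.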

Adding ‘skill_dribbling’

When dribbling skill is added to the model, the club-level deviations from the fixed intercept shrink:

- Manchester United has the largest intercept (+1.26)
- Wolverhampton has the smallest intercept (-1.38)
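To see what these club deviations mean for a prediction, the sketch below combines them with the fixed effects of the random-intercept model with skill_dribbling (intercept 37.48 and slope 0.51, the rounded values from the comparison table). The helper `predict_overall` and the dribbling score of 80 are assumptions made for illustration.

```python
# Fixed effects of the random-intercept model with skill_dribbling
# (rounded values from the comparison table).
FIXED_INTERCEPT = 37.48
SLOPE = 0.51

# Club-level intercept deviations quoted in the text.
CLUB_OFFSETS = {"Manchester United": 1.26, "Wolverhampton": -1.38}


def predict_overall(club: str, skill_dribbling: float) -> float:
    """Prediction with a club-specific intercept and a shared slope (hypothetical helper)."""
    return FIXED_INTERCEPT + CLUB_OFFSETS[club] + SLOPE * skill_dribbling


# Two players with the same dribbling score (80, illustrative) at different clubs.
man_utd = predict_overall("Manchester United", 80)
wolves = predict_overall("Wolverhampton", 80)

print(round(man_utd, 2), round(wolves, 2), round(man_utd - wolves, 2))
```

Because the slope is shared, the gap between the two clubs is the same at every dribbling score: it is just the difference between their intercept deviations.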

Random slopes and Intercepts model

For another model we can add dribbling skill as a random slope. A random slope lets the relationship between the predictor and the outcome vary across groups, so each club can have its own dribbling effect as well as its own intercept. Where that relationship really does differ between clubs, this can improve model fit and give more accurate estimates of the effects.
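Written out for player i in club j, a random slopes and intercepts model takes the general form below. The symbols γ₀₀, γ₁₀, u₀ⱼ, u₁ⱼ and ε are standard notation assumed here rather than taken from the original analysis; τ₀₀, τ₁₁, ρ₀₁ and σ² match the labels used in the comparison table.

```latex
\begin{aligned}
\text{overall}_{ij} &= (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j})\,\text{skill\_dribbling}_{ij} + \varepsilon_{ij},\\
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} &\sim \mathcal{N}\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},\
  \begin{pmatrix} \tau_{00} & \rho_{01}\sqrt{\tau_{00}\tau_{11}} \\ \rho_{01}\sqrt{\tau_{00}\tau_{11}} & \tau_{11} \end{pmatrix}
\right),\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

Setting u₁ⱼ = 0 for every club recovers the random-intercepts model with skill_dribbling from the previous section.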

Model comparison

Assessing R2

Dependent variable in every model: overall. Model names follow the AIC table. All fixed-effect estimates have p < 0.001. For LM the R² column is R² / adjusted R²; for the mixed models it is marginal R² / conditional R².

Model	(Intercept)	CI	skill dribbling	CI	σ²	τ₀₀	τ₁₁	ρ₀₁	ICC	N	Observations	R²
LM	36.47	33.05 – 39.89	0.52	0.48 – 0.57	NA	NA	NA	NA	NA	NA	566	0.442 / 0.441
RI	72.86	71.51 – 74.21	NA	NA	67.55	7.09 (club_name)	NA	NA	0.10	20 (club_name)	566	0.000 / 0.095
RI_SD	37.48	34.00 – 40.95	0.51	0.46 – 0.56	40.31	1.30 (club_name)	NA	NA	0.03	20 (club_name)	566	0.427 / 0.445
RS_IM	37.50	33.11 – 41.88	0.51	0.45 – 0.57	39.45	38.39 (club_name)	0.01 (club_name.skill_dribbling)	-0.98 (club_name)	0.05	20 (club_name)	566	0.425 / 0.456
Without_RS	60.49	60.12 – 60.86	0.09	0.09 – 0.10	25.33	12.07 (league_name:club_name); 3.66 (club_name)	NA	NA	0.38	56 (league_name); 702 (club_name)	19239	0.069 / 0.426

Assessing AIC

Name	Model	AIC	AIC_wt	AICc	AICc_wt	BIC	BIC_wt	RMSE	Sigma	R2_conditional	R2_marginal	ICC	R2	R2_adjusted
LM	lm	3719.230	0.2170648	3719.272	0.2217871	3732.245	0.7851841	6.432091	6.443485	NA	NA	NA	0.4420716	0.4410824
RI	lmerMod	4023.303	0.0000000	4023.345	0.0000000	4036.318	0.0000000	8.107824	8.218994	0.0950210	0.0000000	0.0950210	NA	NA
RI_SD	lmerMod	3717.497	0.5162730	3717.568	0.5200159	3734.851	0.2133771	6.286890	6.349113	0.4451975	0.4272943	0.0312607	NA	NA
RS_IM	lmerMod	3718.818	0.2666621	3718.968	0.2581969	3744.850	0.0014388	6.181165	6.280851	0.4555591	0.4254237	0.0524479	NA	NA
Without_RS	lmerMod	118810.646	0.0000000	118810.649	0.0000000	118849.969	0.0000000	4.945197	5.032757	0.4255178	0.0689104	0.3830001	NA	NA

RESULTS

R2 BEST MODEL: The LM has an R² of 0.442, higher than the marginal R² (fixed effects only) of any mixed model (0.427 for RI_SD, 0.425 for RS_IM). The conditional R², which also counts the variance explained by the random effects, is slightly higher for RI_SD (0.445) and RS_IM (0.456). R² for an ordinary linear model and the marginal and conditional R² for mixed models are defined differently, so they are not directly comparable, and R² can be misleading when used to rank models that differ in form.

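AIC BEST MODEL: RI_SD (random intercepts with skill dribbling) has the lowest AIC (3717.50). AIC estimates relative prediction error, so the lowest score is best, which makes RI_SD the preferred model here. The margin is small, though: LM (3719.23) and RS_IM (3718.82) are both within 2 AIC units of RI_SD, and BIC favours LM (3732.25 against 3734.85). Without_RS was fitted to 19,239 observations across all leagues rather than the 566 EPL players, so its AIC cannot be compared with the others.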
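The AIC_wt column can be reproduced from the AIC values with the standard Akaike-weight formula, w_i = exp(-Δ_i / 2) / Σ_k exp(-Δ_k / 2), where Δ_i is each model's AIC minus the lowest AIC. The sketch below does this for the four models fitted to the 566 EPL observations. Leaving out Without_RS is a choice made here because it was fitted to a different data set; its weight is effectively zero, so the other weights are unchanged to the precision shown.

```python
import math

# AIC values from the table for the four models fitted to the same EPL data.
aic = {"LM": 3719.230, "RI": 4023.303, "RI_SD": 3717.497, "RS_IM": 3718.818}

best = min(aic.values())

# Relative likelihood of each model given its AIC difference from the best one.
rel_likelihood = {name: math.exp(-0.5 * (value - best)) for name, value in aic.items()}
total = sum(rel_likelihood.values())

# Akaike weights: relative likelihoods normalised to sum to 1.
weights = {name: rl / total for name, rl in rel_likelihood.items()}

for name, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:6s} dAIC = {aic[name] - best:8.3f}  weight = {weight:.3f}")
```

A weight of about 0.52 for RI_SD says it is the most supported of the candidates, but not by a wide margin over RS_IM and LM.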

Conclusion:

Teams like Manchester United and Manchester City have above-average intercepts, meaning their players are rated higher than the league average after controlling for dribbling skill.

This variation may reflect different playing styles, strategies, or team dynamics that affect ratings independently of individual dribbling skill.

The random effects show how much each team’s performance deviates from the league average while accounting for individual player skills (skill_dribbling), emphasizing the importance of both team-level differences and individual skill in predicting overall performance.

Multi Level Modelling

Nic Krotiris

2024-10-29