Sungji Peter Shin
4.07.2019

Who Gets Higher Rating? Dextropedal (Right-Foot Player) or Sinistropedal (Left-Foot Player)?

An analysis of FIFA19 player dataset (Multilevel Analysis)

The FIFA 19 dataset, available on Kaggle, contains player attributes from the latest edition of the game, such as age, nationality, and preferred foot. FIFA 19 is a soccer simulation video game developed by EA Vancouver as part of Electronic Arts’ FIFA series.

library(nlme)
library(dplyr)
library(data.table)
library(magrittr)
library(haven)
library(lmerTest)
library(ggplot2)
library(texreg)
library(stringr)
library(sjmisc)

Importing and Data Manipulation:

My dependent variable (‘overall_rating’) is the overall rating of each player. My independent variable (‘prefoot’) is a recode of preferred foot: left-footed players are assigned a value of 0 and right-footed players a value of 1.

#importing fifa19 dataset
#stringsAsFactors = FALSE keeps text columns as character on every R version
fifa19 <- read.csv("C:/Users/jw/Desktop/fifa19.csv", header = TRUE, stringsAsFactors = FALSE)

#selecting variables of interest and renaming them
fifa19 <- fifa19 %>% 
  select(ID, Name, Age, Nationality, Overall, Preferred.Foot, Position, Value, Weak.Foot) %>% 
  rename(id = ID, name = Name, age = Age, country = Nationality, overall_rating = Overall, pref_foot = Preferred.Foot, position = Position, market_value = Value, weak_foot = Weak.Foot)

#recoding variables
#market value: strip the euro sign and unit suffix and express the value in millions of euros
#(values given in thousands, suffix "K", would otherwise turn into NA)
fifa19 <- fifa19 %>% 
  mutate(market_value = as.double(str_replace_all(market_value, "[€MK]", "")) *
           if_else(str_detect(market_value, "K$"), 0.001, 1))

#preferred foot: recode by label (Left = 0, Right = 1); blank entries become NA
fifa19 <- fifa19 %>% 
  mutate(prefoot = case_when(pref_foot == "Left" ~ 0,
                             pref_foot == "Right" ~ 1)) %>% 
  select(id, name, age, country, overall_rating, pref_foot, prefoot, everything())

head(fifa19)
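
Recoding a categorical column through its integer codes depends on the order of the factor levels, and on `read.csv` returning factors at all, which has not been the default since R 4.0. The sketch below uses a small hypothetical vector (not the real data) to show where the codes 2 and 3 for Left and Right come from, and why recoding by label is less fragile.

```r
# Hypothetical toy column mimicking Preferred.Foot, including one blank entry
foot <- c("Left", "Right", "", "Right")

# As a factor, levels are sorted and the blank string comes first,
# so the integer codes are: blank = 1, Left = 2, Right = 3
codes <- as.integer(factor(foot))

# As plain character, numeric coercion yields only NA
as_char <- suppressWarnings(as.double(foot))

# Recoding by label gives the same result for factor and character input
prefoot <- ifelse(foot == "Left", 0, ifelse(foot == "Right", 1, NA))

print(data.frame(foot, codes, as_char, prefoot))
```

A cross-tabulation such as `table(fifa19$pref_foot, fifa19$prefoot, useNA = "ifany")` is a quick way to confirm the mapping on the real data.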

Ecological Analysis (Country-level Analysis):

The country-level regression gives a negative but statistically non-significant coefficient (-0.6135, p = 0.609): countries with a larger share of right-footed players have slightly lower mean ratings, but the association cannot be distinguished from zero. Since a country-level association may or may not reflect an individual-level relationship, individual-level analyses (complete-pooling and no-pooling models) are conducted below.

#ecological analysis
countryd <- fifa19 %>% 
  group_by(country) %>% 
  summarise(mean_r = mean(overall_rating, na.rm = TRUE), mean_f = mean(prefoot, na.rm = TRUE))

ecoreg <- lm(mean_r ~ mean_f, data = countryd)
summary(ecoreg)
## 
## Call:
## lm(formula = mean_r ~ mean_f, data = countryd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.8049  -2.6359   0.3569   2.5245  10.1951 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  66.8049     0.9467  70.565   <2e-16 ***
## mean_f       -0.6135     1.1983  -0.512    0.609    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.423 on 162 degrees of freedom
## Multiple R-squared:  0.001615,   Adjusted R-squared:  -0.004548 
## F-statistic: 0.2621 on 1 and 162 DF,  p-value: 0.6094
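
A small constructed example may help show why a country-level slope need not match the individual-level relationship. The data below are hypothetical: within every country right-footed players are rated exactly 2 points lower, yet the regression on country means has a strongly positive slope because countries with more right-footed players happen to have higher ratings overall.

```r
# Hypothetical toy data: 3 countries, 4 players each
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  prefoot = c(0, 0, 0, 1,  0, 0, 1, 1,  0, 1, 1, 1),
  rating  = c(60, 60, 60, 58,  70, 70, 68, 68,  80, 78, 78, 78)
)

# Ecological analysis: regress country mean rating on share of right-footed players
country_means <- aggregate(cbind(rating, prefoot) ~ country, data = toy, FUN = mean)
eco_slope <- coef(lm(rating ~ prefoot, data = country_means))[["prefoot"]]

# Individual-level analysis that holds country fixed
within_slope <- coef(lm(rating ~ prefoot + country, data = toy))[["prefoot"]]

print(c(ecological = eco_slope, within_country = within_slope))
```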

Complete-Pooling Model (Individual-level Analysis):

The result shows a statistically significant negative relationship between the two variables (p < 0.001). On average, the overall rating of a right-footed player is 0.7181 lower than that of a left-footed player. However, a lot can go wrong in this analysis: nationality may be an important determinant of a player’s overall rating; there may be important between-country variation in the relationship; and the overall relationship may be driven by a few countries.

#complete-pooling model
cpooling <- lm(overall_rating ~ prefoot, data = fifa19)
summary(cpooling)
## 
## Call:
## lm(formula = overall_rating ~ prefoot, data = fifa19)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.0834  -4.0834  -0.0834   4.9166  27.9166 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  66.8015     0.1065 627.504  < 2e-16 ***
## prefoot      -0.7181     0.1215  -5.912 3.44e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.908 on 18157 degrees of freedom
##   (48 observations deleted due to missingness)
## Multiple R-squared:  0.001921,   Adjusted R-squared:  0.001866 
## F-statistic: 34.95 on 1 and 18157 DF,  p-value: 3.444e-09
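
With a single 0/1 predictor, the complete-pooling regression is a comparison of two group means: the intercept is the mean rating of left-footed players and the slope is the right-minus-left difference. A minimal sketch on hypothetical data illustrates this reading of the coefficients.

```r
# Hypothetical toy data: 3 left-footed (0) and 4 right-footed (1) players
toy <- data.frame(
  prefoot = c(0, 0, 0, 1, 1, 1, 1),
  rating  = c(70, 66, 68, 65, 67, 66, 66)
)

fit <- lm(rating ~ prefoot, data = toy)

mean_left  <- mean(toy$rating[toy$prefoot == 0])
mean_right <- mean(toy$rating[toy$prefoot == 1])

# The intercept matches mean_left; the slope matches mean_right - mean_left
print(coef(fit))
print(c(mean_left = mean_left, difference = mean_right - mean_left))
```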

No-Pooling Model (Intercept):

This model fits 164 separate regressions, one for each country. The intercepts are roughly normally distributed. Each intercept represents the average overall rating of left-footed players in that country (prefoot=0). The modal intercept is about 68.

#no-pooling model: intercept
dcoef <- fifa19 %>% 
  group_by(country) %>% 
  do(mod = lm(overall_rating ~ prefoot, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram() + xlim(50, 90)

No-Pooling Model (Slope):

Below is a histogram of the slopes from the country regressions, that is, the difference in overall rating between right- and left-footed players within each country. The mode is about -2: in the most common bin (about 26 countries), right-footed players are rated about 2 points lower than left-footed players. But there is also a great deal of between-country variation in the slopes. For instance, in at least one country the overall rating of right-footed players is about 16 points higher than that of left-footed players.

#no-pooling: slope
dcoef <- fifa19 %>% 
  group_by(country) %>% 
  do(mod = lm(overall_rating ~ prefoot, data = .))
coef <- dcoef %>% do(data.frame(footc = coef(.$mod)[2]))
ggplot(coef, aes(x = footc)) + geom_histogram()
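
One likely contributor to the extreme slopes is group size: a country with only a handful of players gives a noisy estimate, and a country whose players all share the same preferred foot gives no slope at all (`lm` returns `NA`). The base-R sketch below uses hypothetical data to show both the per-country fits and the undefined case.

```r
# Hypothetical toy data: country B has right-footed players only
toy <- data.frame(
  country = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
  prefoot = c(0, 0, 1, 1,  1, 1,  0, 1, 1),
  rating  = c(70, 72, 68, 66,  75, 77,  60, 64, 66)
)

# No pooling: one regression per country
fits <- lapply(split(toy, toy$country),
               function(d) lm(rating ~ prefoot, data = d))
slopes <- sapply(fits, function(m) coef(m)[["prefoot"]])

# Group sizes next to the slopes make unstable estimates easier to spot
print(data.frame(n = as.vector(table(toy$country)), slope = slopes))
```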

Random Intercept Model:

The partial-pooling model allows between-country variation while imposing a structure on it: country intercepts are treated as draws from a common normal distribution. In the results, the between-country standard deviation of the intercept (the rating of left-footed players) is 2.60397. On average, the overall rating of left-footed players is 66.98103, and that of right-footed players is 0.43099 lower.

#random intercept
m1_lme <- lme(overall_rating ~ prefoot, data = fifa19, random = ~1|country, method = "ML", na.action = na.exclude)
summary(m1_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: fifa19 
##        AIC    BIC    logLik
##   118485.8 118517 -59238.88
## 
## Random effects:
##  Formula: ~1 | country
##         (Intercept) Residual
## StdDev:     2.60397 6.268209
## 
## Fixed effects: overall_rating ~ prefoot 
##                Value Std.Error    DF   t-value p-value
## (Intercept) 66.98103 0.2684528 17994 249.50761   0e+00
## prefoot     -0.43099 0.1108393 17994  -3.88838   1e-04
##  Correlation: 
##         (Intr)
## prefoot -0.317
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.687480127 -0.656311100 -0.001098119  0.624545352  4.270527371 
## 
## Number of Observations: 18159
## Number of Groups: 164
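
What partial pooling does to the country estimates can be illustrated on simulated data. The sketch below is an assumption-laden simplification: the data are hypothetical and the model has an intercept only (no `prefoot` term), so that each country's partially pooled deviation can be compared directly with its raw mean. Small countries are pulled toward the overall mean more strongly than large ones.

```r
library(nlme)

set.seed(42)
# Hypothetical toy data: 8 countries of unequal size
sizes   <- c(3, 5, 8, 12, 20, 30, 40, 60)
country <- rep(paste0("C", seq_along(sizes)), times = sizes)
effect  <- rep(rnorm(length(sizes), mean = 0, sd = 3), times = sizes)
toy <- data.frame(country = country,
                  rating  = 66 + effect + rnorm(sum(sizes), sd = 6))

# Random intercept model without predictors
fit <- lme(rating ~ 1, random = ~ 1 | country, data = toy, method = "ML")

# No-pooling deviation: raw country mean minus the estimated overall mean
raw_dev <- tapply(toy$rating, toy$country, mean) - fixef(fit)[["(Intercept)"]]

# Partial-pooling deviation: estimated random intercept, matched by country name
re <- ranef(fit)
pooled_dev <- setNames(re[["(Intercept)"]], rownames(re))[names(raw_dev)]

print(data.frame(n = sizes, raw_dev = as.vector(raw_dev),
                 pooled_dev = as.vector(pooled_dev)))
```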

Random Slope Model:

The random slope model allows the rating difference between right- and left-footed players to vary across countries. The average overall rating of left-footed players across countries is 66.96595, with a between-country standard deviation of 2.3531158. Right-footed players are rated 0.40210 lower on average, and this difference has a between-country standard deviation of 0.3475926. The random intercept and slope are positively correlated (0.945), meaning that in countries where left-footed players have high overall ratings, the slope is higher, so the deficit of right-footed players is smaller. The -0.086 listed under the fixed effects is the correlation between the two coefficient estimates, not between the country-level effects.

#random slope
m2_lme <- lme(overall_rating ~ prefoot, data = fifa19, random = ~ prefoot|country, method = "ML", na.action = na.exclude)
summary(m2_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: fifa19 
##      AIC      BIC    logLik
##   118479 118525.8 -59233.49
## 
## Random effects:
##  Formula: ~prefoot | country
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 2.3531158 (Intr)
## prefoot     0.3475926 0.945 
## Residual    6.2661029       
## 
## Fixed effects: overall_rating ~ prefoot 
##                Value Std.Error    DF   t-value p-value
## (Intercept) 66.96595 0.2463663 17994 271.81456   0e+00
## prefoot     -0.40210 0.1168613 17994  -3.44082   6e-04
##  Correlation: 
##         (Intr)
## prefoot -0.086
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.713995192 -0.655840758 -0.002010866  0.621162529  4.280307564 
## 
## Number of Observations: 18159
## Number of Groups: 164

Model Selection:

Comparing AIC values, the random slope model fits the data best: its AIC (118479) is lower than that of the random intercept model (118485.8).

#model selection
AIC(cpooling, m1_lme, m2_lme)
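
Because the random intercept model is nested in the random slope model, the AIC comparison can be complemented by a likelihood-ratio test. The sketch below uses only the log-likelihoods printed in the two summaries; the parameter counts (4 and 6) are my reading of the model structure, and they reproduce the reported AIC values. `anova(m1_lme, m2_lme)` should report the same statistic. The chi-square p-value is approximate, since a variance parameter is tested at the boundary of its range.

```r
# Log-likelihoods copied from the model summaries
ll_intercept <- -59238.88   # random intercept model
ll_slope     <- -59233.49   # random slope model

# Assumed parameter counts: 2 fixed effects + variance parameters
k_intercept <- 4            # intercept SD, residual SD
k_slope     <- 6            # intercept SD, slope SD, their correlation, residual SD

# AIC = -2 * logLik + 2 * k
aic_intercept <- -2 * ll_intercept + 2 * k_intercept
aic_slope     <- -2 * ll_slope + 2 * k_slope

# Likelihood-ratio test of the two extra parameters
lr_stat <- 2 * (ll_slope - ll_intercept)
p_value <- pchisq(lr_stat, df = k_slope - k_intercept, lower.tail = FALSE)

print(c(aic_intercept = aic_intercept, aic_slope = aic_slope,
        lr_stat = lr_stat, p_value = p_value))
```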

Intra-Class Correlation:

Is the variation in overall rating mainly at the individual level or the country level?

The intra-class correlation (ICC) is the share of the total variance that lies between countries. The model output reports standard deviations, so they are squared first:
2.605741^2/(2.605741^2 + 6.266393^2) = 6.789886/46.057567 = 0.147422
About 14.7% of the total variation in overall rating can be attributed to the country level, while about 85.3% can be attributed to the individual level.

#intra-class correlation
m0_lme <- lme(overall_rating ~ 1, random = ~ 1|country, data = fifa19, method = "ML")
summary(m0_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: fifa19 
##        AIC      BIC    logLik
##   118786.2 118809.6 -59390.08
## 
## Random effects:
##  Formula: ~1 | country
##         (Intercept) Residual
## StdDev:    2.605741 6.266393
## 
## Fixed effects: overall_rating ~ 1 
##                Value Std.Error    DF  t-value p-value
## (Intercept) 66.63778  0.254555 18043 261.7815       0
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.705396680 -0.652601649 -0.003870449  0.624049867  4.263379627 
## 
## Number of Observations: 18207
## Number of Groups: 164
#computing confidence intervals
intervals(m0_lme)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                lower     est.    upper
## (Intercept) 66.13884 66.63778 67.13672
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: country 
##                    lower     est.    upper
## sd((Intercept)) 2.262123 2.605741 3.001555
## 
##  Within-group standard error:
##    lower     est.    upper 
## 6.202158 6.266393 6.331293
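
The ICC is a ratio of variances, so the standard deviations printed by `summary()` have to be squared before they are combined. A minimal sketch, where `icc` is a hypothetical helper name and the two inputs are copied from the null-model output above:

```r
# Hypothetical helper: ICC from between-group and within-group standard deviations
icc <- function(sd_between, sd_within) {
  sd_between^2 / (sd_between^2 + sd_within^2)
}

# Standard deviations copied from summary(m0_lme)
icc_est <- icc(sd_between = 2.605741, sd_within = 6.266393)
print(round(icc_est, 4))
```

The same variances can also be read from `VarCorr(m0_lme)`, which lists variances alongside the standard deviations.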

Conclusion:

Across the ecological, complete-pooling, no-pooling, and partial-pooling analyses, the individual-level and multilevel models show a statistically significant difference in overall rating between left- and right-footed players in FIFA19, while the country-level (ecological) regression does not. Dextropedal players have significantly lower ratings than sinistropedal players. The variation in overall rating is attributable more to the individual level than to the country level. For future analysis, it would be interesting to compare these results with those from data on real players.