Sungji Peter Shin
4.07.2019
The FIFA 19 dataset, available on Kaggle, contains attributes of the players in the latest edition of the game, such as age, nationality, and preferred foot. FIFA 19 is a soccer simulation video game developed by EA Vancouver as part of Electronic Arts’ FIFA series.
library(nlme)
library(dplyr)
library(data.table)
library(magrittr)
library(haven)
library(lmerTest)
library(ggplot2)
library(texreg)
library(stringr)
library(sjmisc)
My dependent variable (‘overall_rating’) is the overall rating of each player. My independent variable (‘prefoot’) is a recode of preferred foot: left-footed players are assigned a value of 0 and right-footed players a value of 1.
#importing fifa19 dataset
fifa19 <- read.csv("C:/Users/jw/Desktop/fifa19.csv", header = TRUE)
#selecting variables of interest and renaming them
fifa19 <- fifa19 %>%
select(ID, Name, Age, Nationality, Overall, Preferred.Foot, Position, Value, Weak.Foot) %>%
rename("id"=ID, "name"=Name, "age"=Age, "country"=Nationality, "overall_rating"=Overall, "pref_foot"=Preferred.Foot, "position"=Position, "market_value"=Value, "weak_foot"=Weak.Foot)
#recoding variables
#strip the euro sign and the "M" suffix; values quoted in thousands ("K") are not handled here and become NA
fifa19$market_value <- str_replace_all(fifa19$market_value, "€", "")
fifa19$market_value <- str_replace_all(fifa19$market_value, "M", "")
fifa19 <- fifa19 %>%
  mutate(market_value = as.double(market_value),
         #Left = 0, Right = 1, anything else (e.g. blank) = NA; comparing labels works whether pref_foot is a factor or a character vector
         prefoot = dplyr::case_when(pref_foot == "Left" ~ 0, pref_foot == "Right" ~ 1)) %>%
  select(id, name, age, country, overall_rating, pref_foot, prefoot, everything())
head(fifa19)
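The ‘Value’ column in the Kaggle file appears to mix amounts in millions (‘M’) and thousands (‘K’), so removing only the ‘M’ suffix leaves the ‘K’ entries unparsed. Below is a minimal sketch of a parser that puts everything on one scale (millions of euros). The helper name parse_value is hypothetical and not part of the original analysis, and the three input strings are made up for illustration.

```r
# Hypothetical helper: convert strings such as "€110.5M" or "€600K"
# to numeric values in millions of euros.
parse_value <- function(x) {
  x <- gsub("\u20ac", "", as.character(x), fixed = TRUE)  # drop the euro sign
  # scale factor by suffix: M = millions, K = thousands, no suffix = euros
  mult <- ifelse(grepl("M$", x), 1, ifelse(grepl("K$", x), 1e-3, 1e-6))
  as.numeric(sub("[MK]$", "", x)) * mult
}

# Made-up inputs for illustration
parse_value(c("\u20ac110.5M", "\u20ac600K", "\u20ac0"))
```

Market value is not used in the models that follow, so this only matters if the variable is brought into a later analysis.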
At the country level, the mean overall rating falls by about 0.61 points as the share of right-footed players rises from 0 to 1, but the coefficient is not statistically significant (p = 0.609). Since a country-level association may or may not reflect an individual-level relationship, individual-level analyses (complete-pooling and no-pooling models) are conducted in the following.
#ecological analysis
countryd <- fifa19 %>%
group_by(country) %>%
summarise(mean_r = mean(overall_rating, na.rm = TRUE), mean_f = mean(prefoot, na.rm = TRUE))
ecoreg <- lm(mean_r ~ mean_f, data = countryd)
summary(ecoreg)
##
## Call:
## lm(formula = mean_r ~ mean_f, data = countryd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.8049 -2.6359 0.3569 2.5245 10.1951
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  66.8049     0.9467  70.565   <2e-16 ***
## mean_f       -0.6135     1.1983  -0.512    0.609
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.423 on 162 degrees of freedom
## Multiple R-squared: 0.001615, Adjusted R-squared: -0.004548
## F-statistic: 0.2621 on 1 and 162 DF, p-value: 0.6094
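To see why a country-level slope need not match the individual-level one, here is a small sketch on made-up data (three fictional countries, not the FIFA 19 file). Within every country right-footed players are rated exactly 1 point lower, yet the regression on country means has a positive slope, because countries with more right-footed players were given higher baseline ratings.

```r
# Made-up data: three countries of 10 players each.
# Shares of right-footed players are 0.2, 0.5 and 0.8.
d <- data.frame(
  country = rep(c("A", "B", "C"), each = 10),
  prefoot = c(rep(c(1, 0), c(2, 8)), rep(c(1, 0), c(5, 5)), rep(c(1, 0), c(8, 2)))
)
# Baseline rating rises by country; right foot costs 1 point everywhere
baseline <- unname(c(A = 60, B = 65, C = 70)[d$country])
d$overall_rating <- baseline - 1 * d$prefoot

# Country-level (ecological) regression on the group means
means <- aggregate(cbind(overall_rating, prefoot) ~ country, data = d, FUN = mean)
eco <- lm(overall_rating ~ prefoot, data = means)

# Individual-level regression with a separate intercept per country
fit_within <- lm(overall_rating ~ prefoot + country, data = d)

coef(eco)[["prefoot"]]         # positive: between-country association
coef(fit_within)[["prefoot"]]  # -1: within-country difference
```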
The complete-pooling model shows a significant negative relationship between the two variables (p < 0.001). On average, right-footed players are rated 0.7181 points lower than left-footed players. However, a lot can go wrong in this analysis: nationality may be an important determinant of a player’s overall rating; the relationship may vary considerably between countries; and the pooled estimate can easily be driven by a few countries.
#complete-pooling model
cpooling <- lm(overall_rating ~ prefoot, data = fifa19)
summary(cpooling)
##
## Call:
## lm(formula = overall_rating ~ prefoot, data = fifa19)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.0834 -4.0834 -0.0834 4.9166 27.9166
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  66.8015     0.1065 627.504  < 2e-16 ***
## prefoot      -0.7181     0.1215  -5.912 3.44e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.908 on 18157 degrees of freedom
## (48 observations deleted due to missingness)
## Multiple R-squared: 0.001921, Adjusted R-squared: 0.001866
## F-statistic: 34.95 on 1 and 18157 DF, p-value: 3.444e-09
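With a single 0/1 predictor, the two coefficients have a direct reading: the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. A minimal sketch on made-up ratings (not the FIFA 19 data):

```r
# Made-up ratings for illustration
d <- data.frame(
  prefoot = c(0, 0, 0, 1, 1, 1, 1),
  overall_rating = c(70, 66, 68, 65, 67, 69, 63)
)
fit <- lm(overall_rating ~ prefoot, data = d)

mean_left  <- mean(d$overall_rating[d$prefoot == 0])
mean_right <- mean(d$overall_rating[d$prefoot == 1])

coef(fit)[["(Intercept)"]]  # equals mean_left
coef(fit)[["prefoot"]]      # equals mean_right - mean_left
```

Read this way, the estimates above say that left-footed players average 66.8015 and right-footed players average 0.7181 less.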
The no-pooling approach fits 164 separate regressions, one for each country. The intercepts are roughly normally distributed. Each intercept is the average overall rating of left-footed players (prefoot = 0) in that country. The modal intercept is about 68.
#no-pooling model: intercept
dcoef <- fifa19 %>%
group_by(country) %>%
do(mod = lm(overall_rating ~ prefoot, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram() + xlim(50, 90)
Below is a histogram of the slopes from the same regressions, that is, the difference in overall rating between right- and left-footed players within each country. The mode is about -2: in roughly 26 countries, right-footed players are rated about 2 points lower than left-footed players. There is also a great deal of between-country variation in the slopes. For instance, in one country right-footed players are rated about 16 points higher than left-footed players.
#no-pooling: slope
dcoef <- fifa19 %>%
group_by(country) %>%
do(mod = lm(overall_rating ~ prefoot, data = .))
coef <- dcoef %>% do(data.frame(footc = coef(.$mod)[2]))
ggplot(coef, aes(x = footc)) + geom_histogram()
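The two chunks above fit the same per-country regressions twice, once for the intercepts and once for the slopes. As a sketch of an alternative, the following base R version fits each regression once and keeps both coefficients. The data are made up (three fictional countries); the column names intc and footc follow the ones used above.

```r
# Made-up data: three countries with different intercepts and slopes
d <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  prefoot = rep(c(0, 0, 1, 1), times = 3),
  overall_rating = c(60, 62, 58, 60,   # A: left mean 61, right mean 59
                     70, 72, 71, 75,   # B: left mean 71, right mean 73
                     65, 67, 66, 66)   # C: left mean 66, right mean 66
)

# One regression per country; each column of the result holds that
# country's intercept and slope
by_country <- sapply(split(d, d$country), function(s) {
  coef(lm(overall_rating ~ prefoot, data = s))
})
coefs <- data.frame(country = colnames(by_country),
                    intc = by_country["(Intercept)", ],
                    footc = by_country["prefoot", ])
coefs
```

A country whose players all share the same preferred foot has no within-country contrast, so lm() returns NA for its slope. That is worth keeping in mind when reading the histogram of slopes.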
The partial-pooling model allows between-country variation and imposes a structure on it: the country intercepts are treated as draws from a common distribution. In the results below, the between-country standard deviation of the intercept (the average rating of left-footed players) is 2.60397. On average, the overall rating of left-footed players is 66.98103, and right-footed players are rated 0.43099 lower.
#random intercept
m1_lme <- lme(overall_rating ~ prefoot, data = fifa19, random = ~1|country, method = "ML", na.action = na.exclude)
summary(m1_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: fifa19
## AIC BIC logLik
## 118485.8 118517 -59238.88
##
## Random effects:
## Formula: ~1 | country
##         (Intercept) Residual
## StdDev:     2.60397 6.268209
##
## Fixed effects: overall_rating ~ prefoot
##                Value Std.Error    DF   t-value p-value
## (Intercept) 66.98103 0.2684528 17994 249.50761   0e+00
## prefoot     -0.43099 0.1108393 17994  -3.88838   1e-04
## Correlation:
## (Intr)
## prefoot -0.317
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.687480127 -0.656311100 -0.001098119 0.624545352 4.270527371
##
## Number of Observations: 18159
## Number of Groups: 164
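What partial pooling does to the country estimates can be seen in a small simulation. The sketch below uses simulated data with assumed values (20 groups of 30 observations, group SD 2, residual SD 6), not the FIFA 19 file. It compares raw group means (no pooling) with the group-level estimates from a random-intercept model, which are pulled toward the overall mean.

```r
library(nlme)

# Simulated data (assumed values): 20 groups of 30, group SD 2, residual SD 6
set.seed(1)
g <- factor(rep(seq_len(20), each = 30))
y <- 66 + rnorm(20, sd = 2)[g] + rnorm(600, sd = 6)
d <- data.frame(g = g, y = y)

m <- lme(y ~ 1, random = ~ 1 | g, data = d, method = "ML")

no_pool <- tapply(d$y, d$g, mean)                    # raw group means
part_pool <- coef(m)[names(no_pool), "(Intercept)"]  # shrunken group estimates

# Distance from the estimated overall mean, before and after shrinkage
cbind(no_pooling = abs(no_pool - fixef(m)[[1]]),
      partial_pooling = abs(part_pool - fixef(m)[[1]]))[1:5, ]
```

Every partially pooled estimate sits at least as close to the overall mean as the corresponding raw mean. The shrinkage is stronger for groups with fewer observations, which is relevant here because some countries have only a handful of players.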
The random-slope model allows the left/right difference in overall rating to vary between countries. The average overall rating of left-footed players is 66.96595, with a between-country standard deviation of 2.3531158. Right-footed players are rated 0.40210 lower on average, and this difference has a between-country standard deviation of 0.3475926. The random intercept and random slope have a correlation of 0.945: in countries where left-footed players have high overall ratings, the slope tends to be higher as well, so the deficit of right-footed players is smaller or reversed. The -0.086 in the ‘Correlation’ block is the correlation between the two fixed-effect estimates, not between the country-level effects.
#multi-level model#2
m2_lme <- lme(overall_rating ~ prefoot, data = fifa19, random = ~ prefoot|country, method = "ML", na.action = na.exclude)
summary(m2_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: fifa19
## AIC BIC logLik
## 118479 118525.8 -59233.49
##
## Random effects:
## Formula: ~prefoot | country
## Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr
## (Intercept) 2.3531158 (Intr)
## prefoot     0.3475926 0.945
## Residual    6.2661029
##
## Fixed effects: overall_rating ~ prefoot
##                Value Std.Error    DF   t-value p-value
## (Intercept) 66.96595 0.2463663 17994 271.81456   0e+00
## prefoot     -0.40210 0.1168613 17994  -3.44082   6e-04
## Correlation:
## (Intr)
## prefoot -0.086
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.713995192 -0.655840758 -0.002010866 0.621162529 4.280307564
##
## Number of Observations: 18159
## Number of Groups: 164
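An lme summary prints two different correlations: one between the random intercept and random slope (under ‘Random effects’) and one between the fixed-effect estimates (under ‘Fixed effects’). The sketch below shows how each can be pulled out of a fitted model. Since fifa19.csv is not bundled here, it uses the Orthodont data that ships with nlme as a stand-in; the model and variable names are therefore not those of this analysis.

```r
library(nlme)

# Stand-in data: Orthodont ships with nlme (growth measurements by Subject)
fit <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

# Correlation between the random intercept and the random slope
# (the "Corr" entry under "Random effects")
G <- getVarCov(fit)
re_corr <- G[1, 2] / sqrt(G[1, 1] * G[2, 2])

# Correlation between the two fixed-effect estimates
# (the "Correlation" block under "Fixed effects")
fe_corr <- cov2cor(vcov(fit))["(Intercept)", "age"]

c(random_effects = re_corr, fixed_effects = fe_corr)
```

Only the first of these describes how group-level intercepts and slopes move together; the second reflects the sampling dependence between the two coefficient estimates.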
Of the models compared, the random-slope model has the lowest AIC (118479, against 118485.8 for the random-intercept model) and therefore fits the data best.
#model selection
AIC(cpooling, m1_lme, m2_lme)
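The AIC values of the two mixed models can be reproduced from the printed log-likelihoods, and the same numbers give a likelihood-ratio statistic. The sketch below is plain arithmetic on the figures shown in the summaries. The parameter counts (4 and 6) are my reading of the two models, and the chi-square reference distribution is only approximate when testing variance components, so the p-value should be taken as a rough guide.

```r
# Log-likelihoods as printed in the two summaries
ll_m1 <- -59238.88  # random intercept: 2 fixed effects + 2 variance parameters
ll_m2 <- -59233.49  # random slope:     2 fixed effects + 4 variance parameters

# AIC = -2 * logLik + 2 * (number of parameters)
aic_m1 <- -2 * ll_m1 + 2 * 4
aic_m2 <- -2 * ll_m2 + 2 * 6

# Likelihood-ratio statistic; the two extra parameters are the slope
# variance and the intercept-slope covariance
lr <- 2 * (ll_m2 - ll_m1)
p_value <- pchisq(lr, df = 2, lower.tail = FALSE)

c(aic_m1 = aic_m1, aic_m2 = aic_m2, lr = lr, p_value = p_value)
```

On the fitted objects, anova(m1_lme, m2_lme) from nlme reports the same comparison directly.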
The intra-class correlation (ICC) is the share of the total variance that lies between countries. It is computed from variances, so the standard deviations reported by the intercept-only model are squared:
2.605741^2/(2.605741^2 + 6.266393^2) = 0.1474
About 14.7% of the total variation in overall rating can be attributed to the country level, while about 85.3% can be attributed to the individual level.
#intra-class correlation
m0_lme <- lme(overall_rating ~ 1, random = ~ 1|country, data = fifa19, method = "ML")
summary(m0_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: fifa19
## AIC BIC logLik
## 118786.2 118809.6 -59390.08
##
## Random effects:
## Formula: ~1 | country
##         (Intercept) Residual
## StdDev:    2.605741 6.266393
##
## Fixed effects: overall_rating ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 66.63778 0.254555 18043 261.7815 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.705396680 -0.652601649 -0.003870449 0.624049867 4.263379627
##
## Number of Observations: 18207
## Number of Groups: 164
#computing confidence intervals
intervals(m0_lme)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 66.13884 66.63778 67.13672
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: country
## lower est. upper
## sd((Intercept)) 2.262123 2.605741 3.001555
##
## Within-group standard error:
## lower est. upper
## 6.202158 6.266393 6.331293
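The ICC is a ratio of variances, so the standard deviations printed by nlme have to be squared before taking the ratio. A minimal sketch of the arithmetic, using the two standard deviations from the intercept-only model above (the variable names are my own):

```r
# Standard deviations as printed for the intercept-only model
sd_country  <- 2.605741
sd_residual <- 6.266393

# ICC = between-country variance / total variance
icc <- sd_country^2 / (sd_country^2 + sd_residual^2)
round(icc, 4)  # 0.1474
```

Dividing the standard deviations themselves, without squaring, would give about 0.29 and overstate the country-level share.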
The complete-pooling and partial-pooling models show a statistically significant difference in overall rating between left- and right-footed players in FIFA 19, while the country-level ecological regression shows no significant association. Right-footed players have significantly lower ratings than left-footed players. Most of the variation in overall rating lies at the individual level rather than the country level. For future analysis, it would be interesting to compare these results with those obtained from data on real players.