Email: shravan.bhatt@gmail.com
Company: BNP Paribas India Solutions
In recent times, football has become a truly global sport. Few other sports are played in all, or at least the majority, of the world's countries. Wherever you go, you will find people of all ages, genders, castes and religions setting aside their differences, either to play the game themselves or to gather and watch some of the most talented and athletic people showcase their skills in one of the most exciting games. This makes football a unifying element in a world divided by borders and boundaries.
Players of this game need a range of skills, from general ones such as strength and stamina to more specific ones such as passing, dribbling, tackling, goalkeeping and shooting, depending on their position. A player's “position” is the role they play for their team during the game. A player's “overall rating” is the result of a formula that weights each attribute differently for each particular position[1]. The better a player is at the individual skills and attributes that matter for their position, the higher their rating. The overall rating has no meaning of its own in gameplay; it is entirely driven by the player's specific attribute ratings[1].
FIFA (the International Federation of Association Football) conducts thousands of football tournaments and matches involving millions of players around the world. Every year, before the start of the season, coaches and managers look for the best-fit players who can perform well in specific roles for their teams, in order to win matches and, eventually, the tournament. For this purpose, this paper assesses how each specific attribute affects the overall rating of a player.
Our study covers the top players in the world along with some of their main attributes. The players come from different countries and continents: FIFA has 211 member nations spread across 6 confederations covering Asia, Africa, Europe, North America, South America and Oceania[2]. Players can play for the national team of their own country as well as for clubs from around the world. Each team fields 11 players. One of them is the goalkeeper (GK), who stands near the goal, does all he can to prevent the ball from entering it, and may use his hands as well as his legs. The remaining 10 are outfield players, divided into specific positions: Forwards (FWD) are the front line of the team; they play the attacking game and try to score goals. Defenders (DEF), as the name suggests, play a more defensive role; their job is to prevent the opponent from scoring. Midfielders (MID) play a central, balanced and supporting role, acting as the bridge between the forwards and the defenders.
Apart from the players on the field, a team also has stand-by players. If a player is injured or is not performing well and cannot continue in the middle of the match, he can be replaced by one of a pool of usually 5 or more back-up players called Substitutes (Sub). The starting players and the substitutes make up the team's line-up for a particular match. A team can also have further players who are not allowed to play in the current match but may be part of line-ups for future matches in the tournament; these are called Reserves (Res). Substitutes and reserves are typically less experienced players or players who are not fully fit to play the entire match.
Players in each of these positions possess a different set of skills: for example, forwards focus on attacking position, speed and finishing; midfielders on dribbling, short passing and vision; defenders on long passing and tackling; and goalkeepers on jumping, diving and reflexes. Our aim is to study how these different attributes contribute to the overall rating.
The specific objective of this study is to investigate how the attributes of football players playing in different positions affect their overall rating. We compare players across different positions, with different abilities and skills, from different parts of the world such as Europe, Asia, Africa, the Americas and Oceania. We examine attributes such as age, attacking position, speed, passing, tackling and goalkeeping to see how much weight each carries in a player's overall rating. We believe that the rating of players differs considerably by position, depending on the combination of these attributes. We expect forwards to have the highest overall rating, closely followed by goalkeepers and midfielders, then defenders, and lastly substitutes and reserves. Based on this, we formulate the following hypotheses:
H0: The mean rating of players at all different positions is the same
H1: The mean rating of players at all different positions is not the same.
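H0 compares the mean rating of several groups, which is the setting of a one-way analysis of variance (ANOVA). The following is a minimal R sketch of that test on simulated data; the data frame `toy`, the group sizes and the group means are hypothetical and only illustrate the mechanics, they are not the FIFA ’17 data.

```r
set.seed(17)
positions <- c("GK", "Res", "Sub", "FWD", "MID", "DEF")

# Hypothetical data: 50 simulated ratings per position (not the FIFA '17 data)
toy <- data.frame(
  Club_Position = factor(rep(positions, each = 50), levels = positions),
  Rating = rnorm(300, mean = rep(c(70, 61, 65, 70, 69, 69), each = 50), sd = 6)
)

# One-way ANOVA: H0 states that all six position means are equal
fit_h0 <- aov(Rating ~ Club_Position, data = toy)
anova_table <- summary(fit_h0)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

# Reject H0 at the 5% level when the p-value is below 0.05
p_value < 0.05
```

On the real data, the corresponding call would be `aov(Rating ~ Club_Position, data = fifa)`.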
For this study, we use the FIFA ’17 player database, which contains information on approximately 18,000 players and 55 player attributes. For ease of analysis, we work only with the following subset:
Rating: Overall rating of the player (limits: 0-100)
Age: Age of the player (limits: 0-100)
Club_Position: Position at which the player plays. (FWD, MID, DEF, GK, Sub, Res)
Attacking_Position: Ability of the player to get into a good attacking position (limits: 0-100)
Speed: Running speed of the player (limits: 0-100)
Finishing: The skill of scoring by putting the ball inside the goal. (limits: 0-100)
Ball_Control: Ability of the player to keep the ball in his possession for extended periods of time. (limits: 0-100)
Short_Pass: Ability of the player to pass the ball accurately over a short distance. (limits: 0-100)
Vision: Ability of the player to see potential passes. (limits: 0-100)
Standing_Tackle: Ability of the player to snatch the ball from an opponent's feet while standing. (limits: 0-100)
Long_Pass: Ability of the player to pass the ball accurately to players at long distances. (limits: 0-100)
Dribbling: Ability of the player to run with the ball under close control and take it past opponents. (limits: 0-100)
Jumping: Ability of the player to jump. (limits: 0-100)
GK_Positioning: Ability of the player to position himself properly in front of the goal while playing as a goalkeeper. (limits: 0-100)
GK_Diving: Ability of the player to dive while playing as a goalkeeper. (limits: 0-100)
Work_Rate: The rate at which the player runs with the ball in his possession / the rate at which the player runs without the ball in his possession. (High/High, High/Low, High/Medium, Medium/Medium, Medium/High, Medium/Low, Low/Low, Low/Medium, Low/High)
Weak_foot: The ability of the player to play with his weaker foot. (limits: 1-5)
This data was collected by a team of 9,000 people working for EA Sports, called ‘Data Reviewers’. They include some professional-level scouts and coaches, but consist largely of season-ticket holders who can watch many matches in person and subjectively rate each player's gameplay. The dataset is freely available on Kaggle.com[3].
To examine how the individual attributes drive the overall rating, we propose the following linear model:
\[ \text{Rating} = \beta_0 + \beta_1\,\text{Vision} + \beta_2\,\text{Short\_Pass} + \beta_3\,\text{Long\_Pass} + \beta_4\,\text{Jumping} + \beta_5\,\text{Ball\_Control} + \beta_6\,\text{Standing\_Tackle} + \beta_7\,\text{Dribbling} + \beta_8\,\text{GK\_Positioning} + \beta_9\,\text{GK\_Diving} + \epsilon \]
# Linear regression of overall rating on selected skill attributes
fit <- lm(Rating ~ Vision + Short_Pass + Long_Pass + Jumping + Ball_Control +
            Standing_Tackle + Dribbling + GK_Positioning + GK_Diving,
          data = fifa)
summary(fit)
##
## Call:
## lm(formula = Rating ~ Vision + Short_Pass + Long_Pass + Jumping +
## Ball_Control + Standing_Tackle + Dribbling + GK_Positioning +
## GK_Diving, data = fifa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.1546 -2.7746 -0.1921 2.5696 22.8216
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.549258 0.294468 59.597 < 2e-16 ***
## Vision 0.028104 0.003911 7.185 6.97e-13 ***
## Short_Pass 0.159494 0.007084 22.514 < 2e-16 ***
## Long_Pass -0.062970 0.005192 -12.128 < 2e-16 ***
## Jumping 0.112310 0.002884 38.936 < 2e-16 ***
## Ball_Control 0.412100 0.006971 59.115 < 2e-16 ***
## Standing_Tackle 0.094434 0.002221 42.527 < 2e-16 ***
## Dribbling -0.030610 0.005046 -6.067 1.33e-09 ***
## GK_Positioning 0.242423 0.007479 32.413 < 2e-16 ***
## GK_Diving 0.187520 0.007261 25.826 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.171 on 17578 degrees of freedom
## Multiple R-squared: 0.6533, Adjusted R-squared: 0.6532
## F-statistic: 3681 on 9 and 17578 DF, p-value: < 2.2e-16
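The adjusted R-squared in this output can be checked by hand from the multiple R-squared, the number of observations and the number of predictors. A minimal R sketch, using only values printed above:

```r
# Values taken from the summary(fit) output above
r2 <- 0.6533   # Multiple R-squared (rounded to 4 decimals in the output)
n  <- 17588    # observations: 17578 residual df + 9 slopes + 1 intercept
p  <- 9        # number of predictors

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

This gives about 0.6531; the small gap from the reported 0.6532 comes from R-squared being rounded in the printed output.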
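The model above shows how the chosen attributes relate to a player's overall rating. It has an F-statistic of 3681 on 9 and 17578 degrees of freedom (p < 2.2e-16), a multiple R² of 0.6533 and an adjusted R² of 0.6532, so the nine attributes together account for about 65% of the variance in Rating. All coefficients are significant at the 0.001 level. The negative coefficients on Long_Pass and Dribbling are conditional on the other attributes in the model; since these skill ratings are likely to be correlated with one another, individual coefficients should not be read as each attribute's standalone effect. The fitted equation is: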
\[ \widehat{\text{Rating}} = 17.549258 + 0.028104\,\text{Vision} + 0.159494\,\text{Short\_Pass} - 0.062970\,\text{Long\_Pass} + 0.112310\,\text{Jumping} + 0.412100\,\text{Ball\_Control} + 0.094434\,\text{Standing\_Tackle} - 0.030610\,\text{Dribbling} + 0.242423\,\text{GK\_Positioning} + 0.187520\,\text{GK\_Diving} \]
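The overall F-test shows that these attributes jointly explain a significant share of the variation in Rating, and the equation shows how each attribute contributes. Because Club_Position is not a predictor in this model, the regression does not by itself test H0; that hypothesis is tested by the Club_Position term of the two-way ANOVA (fit2) further below (F = 810.1, p < 2e-16), which leads us to reject the null hypothesis that the mean rating is the same for all positions.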
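To make the fitted equation concrete, the following R sketch evaluates it for one player. The helper `predict_rating()` and the player `median_player` are hypothetical: the player simply takes the median value of each attribute from the descriptive statistics reported in this study, and the coefficients are copied from the regression output.

```r
# Coefficients copied from the summary(fit) output above
coefs <- c(`(Intercept)` = 17.549258, Vision = 0.028104, Short_Pass = 0.159494,
           Long_Pass = -0.062970, Jumping = 0.112310, Ball_Control = 0.412100,
           Standing_Tackle = 0.094434, Dribbling = -0.030610,
           GK_Positioning = 0.242423, GK_Diving = 0.187520)

# Predicted rating = intercept + sum of coefficient * attribute value
predict_rating <- function(attrs) {
  slopes <- coefs[-1]
  stopifnot(all(names(slopes) %in% names(attrs)))  # every attribute is needed
  unname(coefs[["(Intercept)"]] + sum(slopes * attrs[names(slopes)]))
}

# Hypothetical player with the median value of each attribute
median_player <- c(Vision = 54, Short_Pass = 62, Long_Pass = 56, Jumping = 65,
                   Ball_Control = 63, Standing_Tackle = 54, Dribbling = 60,
                   GK_Positioning = 11, GK_Diving = 11)
round(predict_rating(median_player), 2)
```

This gives about 66.68, close to the median observed rating of 66. With the fitted model object itself, `predict(fit, newdata = as.data.frame(as.list(median_player)))` performs the same calculation.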
## Position x
## 1 GK 69.82595
## 2 Res 61.06959
## 3 Sub 65.37173
## 4 FWD 70.36251
## 5 MID 69.34502
## 6 DEF 68.73204
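A table in this layout, with a grouping column and an unnamed value column labelled `x`, is what base R's `aggregate()` returns when it is given a plain vector. A minimal sketch on a hypothetical mini data set (`toy` stands in for the `fifa` data frame), assuming Club_Position is a factor with levels in the order shown above:

```r
# Hypothetical mini data set standing in for the fifa data frame
toy <- data.frame(
  Club_Position = factor(c("GK", "FWD", "FWD", "MID", "DEF", "Sub", "Res"),
                         levels = c("GK", "Res", "Sub", "FWD", "MID", "DEF")),
  Rating = c(70, 72, 68, 69, 68, 65, 61)
)

# Mean rating per position; rows follow the factor level order and the
# value column is named "x" because a bare vector was aggregated
pos_means <- aggregate(toy$Rating, by = list(Position = toy$Club_Position),
                       FUN = mean)
pos_means
```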
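The mean overall ratings by position follow the ordering we expected: forwards have the highest mean rating (70.36), followed by goalkeepers (69.83), midfielders (69.35) and defenders (68.73), while substitutes (65.37) and reserves (61.07) have the lowest. The descriptive statistics of the attributes used in this study, namely Age, Club_Position, Attacking_Position, Speed, Finishing, Ball_Control, Short_Pass, Vision, Standing_Tackle, Long_Pass, Dribbling, Jumping, GK_Positioning, GK_Diving, Work_Rate and Weak_foot, are summarised in the following output.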
## vars n mean sd median trimmed mad min max
## Rating 1 17588 66.17 7.08 66 66.21 7.41 45 94
## Age 2 17588 25.46 4.68 25 25.23 5.93 17 47
## Club_Position* 3 17588 3.55 1.41 3 3.48 1.48 1 6
## Attacking_Position 4 17588 49.59 19.41 54 51.11 17.79 2 94
## Speed 5 17588 65.48 14.10 68 66.70 11.86 11 96
## Finishing 6 17588 45.16 19.37 48 45.78 22.24 2 95
## Ball_Control 7 17588 57.97 16.83 63 60.32 11.86 5 95
## Short_Pass 8 17588 58.12 14.98 62 59.88 10.38 10 92
## Vision 9 17588 52.71 14.59 54 53.27 14.83 10 94
## Standing_Tackle 10 17588 47.44 21.83 54 48.14 25.20 3 92
## Long_Pass 11 17588 52.40 15.62 56 53.46 14.83 7 93
## Dribbling 12 17588 54.80 18.91 60 57.08 13.34 4 97
## Jumping 13 17588 64.92 11.43 65 65.33 10.38 15 95
## GK_Positioning 14 17588 16.61 17.14 11 11.92 4.45 1 91
## GK_Diving 15 17588 16.82 17.80 11 11.94 4.45 1 89
## Work_Rate* 16 17588 6.94 2.79 9 7.34 0.00 1 9
## Weak_foot* 17 17588 2.93 0.66 3 2.91 0.00 1 5
## range skew kurtosis se
## Rating 49 -0.02 -0.03 0.05
## Age 30 0.41 -0.43 0.04
## Club_Position* 5 0.45 -0.86 0.01
## Attacking_Position 92 -0.64 -0.52 0.15
## Speed 85 -0.84 0.69 0.11
## Finishing 93 -0.25 -1.03 0.15
## Ball_Control 90 -1.18 0.76 0.13
## Short_Pass 82 -1.01 0.49 0.11
## Vision 84 -0.36 -0.36 0.11
## Standing_Tackle 89 -0.29 -1.34 0.16
## Long_Pass 86 -0.57 -0.46 0.12
## Dribbling 93 -1.00 0.14 0.14
## Jumping 80 -0.39 0.32 0.09
## GK_Positioning 90 2.43 4.49 0.13
## GK_Diving 88 2.41 4.30 0.13
## Work_Rate* 8 -0.91 -0.85 0.02
## Weak_foot* 4 0.13 0.60 0.00
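The column layout above matches the output of `describe()` from the psych package. The following base-R sketch reproduces several of those columns for a single numeric vector, assuming psych's defaults of a 10% trimmed mean and R's scaled `mad()`; the helper `describe_basic()` and the vector `ratings` are hypothetical. Variables marked with an asterisk (Club_Position*, Work_Rate*, Weak_foot*) are categorical and are summarised by their integer codes, so statistics such as their mean describe the codes rather than the categories.

```r
# Base-R sketch of several columns reported by psych::describe().
# Assumes psych's defaults: a 10% trimmed mean and R's scaled mad().
describe_basic <- function(x) {
  x <- x[!is.na(x)]          # summarise non-missing values only
  n <- length(x)
  c(n = n,
    mean = mean(x),
    sd = sd(x),
    median = median(x),
    trimmed = mean(x, trim = 0.1),  # drops 10% of values from each end
    mad = mad(x),                   # median absolute deviation * 1.4826
    min = min(x),
    max = max(x),
    range = max(x) - min(x),
    se = sd(x) / sqrt(n))           # standard error of the mean
}

# Hypothetical ratings, used only to illustrate the calculation
ratings <- c(62, 64, 66, 66, 68, 70, 71, 75, 80, 94)
s <- describe_basic(ratings)
round(s, 2)
```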
## Work_Rate x
## 1 High / High 70.61580
## 2 High / Low 67.23425
## 3 High / Medium 68.22790
## 4 Low / High 66.96119
## 5 Low / Low 67.43333
## 6 Low / Medium 65.92650
## 7 Medium / High 68.20535
## 8 Medium / Low 67.34911
## 9 Medium / Medium 64.69849
## Weak_foot_rating x
## 1 1 62.65753
## 2 2 64.47823
## 3 3 65.86477
## 4 4 69.85309
## 5 5 70.94416
##
## GK Res Sub FWD MID DEF
## High / High 0 57 272 80 228 110
## High / Low 0 107 349 142 119 13
## High / Medium 0 405 1181 356 484 492
## Low / High 0 74 167 0 45 152
## Low / Low 0 8 12 5 5 0
## Low / Medium 0 83 194 2 34 136
## Medium / High 0 179 592 23 361 379
## Medium / Low 0 134 375 147 178 11
## Medium / Medium 632 2100 4350 376 1198 1241
##
## 1 2 3 4 5
## GK 27 253 320 30 2
## Res 35 784 2048 259 21
## Sub 67 1649 4670 1036 70
## FWD 1 125 653 325 27
## MID 4 311 1651 632 54
## DEF 12 644 1632 223 23
## Weak_foot 1 2 3 4 5
## Club_Position Work_Rate
## GK High / High 0 0 0 0 0
## High / Low 0 0 0 0 0
## High / Medium 0 0 0 0 0
## Low / High 0 0 0 0 0
## Low / Low 0 0 0 0 0
## Low / Medium 0 0 0 0 0
## Medium / High 0 0 0 0 0
## Medium / Low 0 0 0 0 0
## Medium / Medium 27 253 320 30 2
## Res High / High 0 10 41 6 0
## High / Low 1 23 66 14 3
## High / Medium 1 71 254 72 7
## Low / High 0 23 44 6 1
## Low / Low 0 3 3 2 0
## Low / Medium 1 35 47 0 0
## Medium / High 3 46 117 10 3
## Medium / Low 0 31 85 16 2
## Medium / Medium 29 542 1391 133 5
## Sub High / High 0 31 172 63 6
## High / Low 2 57 200 84 6
## High / Medium 2 196 708 263 12
## Low / High 0 68 88 11 0
## Low / Low 0 3 7 2 0
## Low / Medium 1 75 109 9 0
## Medium / High 0 128 401 61 2
## Medium / Low 1 67 219 81 7
## Medium / Medium 61 1024 2766 462 37
## FWD High / High 1 11 35 29 4
## High / Low 0 17 72 42 11
## High / Medium 0 32 202 118 4
## Low / High 0 0 0 0 0
## Low / Low 0 0 4 1 0
## Low / Medium 0 1 0 1 0
## Medium / High 0 3 14 6 0
## Medium / Low 0 19 90 36 2
## Medium / Medium 0 42 236 92 6
## MID High / High 0 18 120 81 9
## High / Low 0 16 62 36 5
## High / Medium 0 52 262 162 8
## Low / High 0 9 32 4 0
## Low / Low 0 0 2 3 0
## Low / Medium 1 7 22 3 1
## Medium / High 0 37 261 60 3
## Medium / Low 0 24 83 66 5
## Medium / Medium 3 148 807 217 23
## DEF High / High 0 22 69 17 2
## High / Low 0 7 5 1 0
## High / Medium 2 108 307 65 10
## Low / High 1 47 96 7 1
## Low / Low 0 0 0 0 0
## Low / Medium 1 51 77 6 1
## Medium / High 0 81 252 42 4
## Medium / Low 1 4 6 0 0
## Medium / Medium 7 324 820 85 5
chisq.test(table(fifa$Work_Rate,fifa$Club_Position))
## Warning in chisq.test(table(fifa$Work_Rate, fifa$Club_Position)): Chi-
## squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: table(fifa$Work_Rate, fifa$Club_Position)
## X-squared = 2272.8, df = 40, p-value < 2.2e-16
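The p-value is far below 0.01, so we reject the null hypothesis that work rate and player position are independent. However, R warns that the chi-squared approximation may be incorrect. This warning is issued when some expected cell counts are below 5, which happens here because some combinations are rare (for example, only 30 players have a Low / Low work rate, and all goalkeepers are recorded as Medium / Medium). With a p-value this small the conclusion is unlikely to change, but a simulated p-value would be a more reliable check.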
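The cause of the warning can be inspected directly. The R sketch below rebuilds the Work_Rate by Club_Position table from the counts printed earlier, counts the cells whose expected value is below 5, and then computes a Monte Carlo p-value with `chisq.test(..., simulate.p.value = TRUE)`, which does not rely on the chi-squared approximation. The number of simulations `B = 2000` is an arbitrary choice for illustration.

```r
# Counts copied from the Work_Rate x Club_Position table above
counts <- matrix(c(
    0,   57,  272,  80,  228,  110,
    0,  107,  349, 142,  119,   13,
    0,  405, 1181, 356,  484,  492,
    0,   74,  167,   0,   45,  152,
    0,    8,   12,   5,    5,    0,
    0,   83,  194,   2,   34,  136,
    0,  179,  592,  23,  361,  379,
    0,  134,  375, 147,  178,   11,
  632, 2100, 4350, 376, 1198, 1241),
  nrow = 9, byrow = TRUE,
  dimnames = list(Work_Rate = c("High / High", "High / Low", "High / Medium",
                                "Low / High", "Low / Low", "Low / Medium",
                                "Medium / High", "Medium / Low", "Medium / Medium"),
                  Club_Position = c("GK", "Res", "Sub", "FWD", "MID", "DEF")))

# Asymptotic test; chisq.test() warns when any expected count is below 5
res <- suppressWarnings(chisq.test(counts))
sum(res$expected < 5)   # number of cells with an expected count below 5
res$parameter           # (9 - 1) * (6 - 1) = 40 degrees of freedom

# Monte Carlo p-value, which does not depend on the approximation
set.seed(1)
res_sim <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)
res_sim$p.value
```

The small cells are all in the Low / Low row. Using the reported X-squared = 2272.8, Cramér's V is sqrt(2272.8 / (17588 × 5)) ≈ 0.16, which suggests that the association, although highly significant, is modest.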
chisq.test(table(fifa$Club_Position,fifa$Weak_foot))
##
## Pearson's Chi-squared test
##
## data: table(fifa$Club_Position, fifa$Weak_foot)
## X-squared = 1016.6, df = 20, p-value < 2.2e-16
Again, the p-value is far below 0.01, so we reject the null hypothesis that weak-foot rating and player position are independent.
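Unlike the work-rate test, this test raised no warning. The R sketch below checks why, using the counts from the Club_Position by Weak_foot table printed earlier: no expected count falls below 5. With 17588 players, even a weak association produces a tiny p-value, so the sketch also computes Cramér's V, a standard effect-size measure that is not part of the original analysis, to judge how strong the association is.

```r
# Counts copied from the Club_Position x Weak_foot table above
counts <- matrix(c(
  27,  253,  320,   30,  2,
  35,  784, 2048,  259, 21,
  67, 1649, 4670, 1036, 70,
   1,  125,  653,  325, 27,
   4,  311, 1651,  632, 54,
  12,  644, 1632,  223, 23),
  nrow = 6, byrow = TRUE,
  dimnames = list(Club_Position = c("GK", "Res", "Sub", "FWD", "MID", "DEF"),
                  Weak_foot = as.character(1:5)))

res <- chisq.test(counts)
min(res$expected)   # smallest expected count
res$parameter       # (6 - 1) * (5 - 1) = 20 degrees of freedom

# Cramer's V rescales the chi-squared statistic to [0, 1]:
# V = sqrt(X^2 / (n * (min(rows, cols) - 1)))
n <- sum(counts)
cramers_v <- sqrt(unname(res$statistic) / (n * (min(dim(counts)) - 1)))
round(cramers_v, 3)
```

Using the reported X-squared = 1016.6 directly gives V = sqrt(1016.6 / (17588 × 4)) ≈ 0.12, a weak association despite the very small p-value.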
fit2 <- aov(Rating ~ Club_Position*Work_Rate, data=fifa)
summary(fit2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Club_Position 5 158335 31667 810.123 < 2e-16 ***
## Work_Rate 8 35890 4486 114.771 < 2e-16 ***
## Club_Position:Work_Rate 30 2318 77 1.977 0.00114 **
## Residuals 17544 685780 39
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
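A common effect-size summary for an ANOVA table is eta-squared: each term's sum of squares divided by the total sum of squares. The R sketch below applies it to the printed (rounded) values from `summary(fit2)`. Two caveats apply. First, `aov()` uses sequential (Type I) sums of squares, so with unbalanced data such as this, the share attributed to each term depends on the order in which terms enter the formula. Second, the interaction has 30 rather than 5 × 8 = 40 degrees of freedom because 10 position and work-rate combinations are empty (goalkeepers appear only as Medium / Medium, and there are no Low / High forwards and no Low / Low defenders).

```r
# Sums of squares copied from the summary(fit2) output above
ss <- c(Club_Position = 158335,
        Work_Rate = 35890,
        `Club_Position:Work_Rate` = 2318,
        Residuals = 685780)

# Eta-squared: share of the total sum of squares attributed to each term
eta_sq <- ss[names(ss) != "Residuals"] / sum(ss)
round(eta_sq, 3)
```

This gives roughly 0.179 for Club_Position, 0.041 for Work_Rate and 0.003 for the interaction: position accounts for the largest share of the variation in Rating, although most of the variation is not explained by these two factors.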
# Second linear model: age, passing, vision, ball control and attacking attributes
fit1 <- lm(Rating ~ Age + Short_Pass + Vision + Long_Pass + Ball_Control +
             Finishing + Attacking_Position, data = fifa)
summary(fit1)
##
## Call:
## lm(formula = Rating ~ Age + Short_Pass + Vision + Long_Pass +
## Ball_Control + Finishing + Attacking_Position, data = fifa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.0366 -3.5878 -0.2602 3.2546 28.0084
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.088991 0.265615 139.634 < 2e-16 ***
## Age 0.579878 0.008840 65.595 < 2e-16 ***
## Short_Pass 0.112184 0.008868 12.650 < 2e-16 ***
## Vision 0.126840 0.004751 26.696 < 2e-16 ***
## Long_Pass -0.024811 0.006444 -3.850 0.000118 ***
## Ball_Control 0.123684 0.007370 16.781 < 2e-16 ***
## Finishing -0.005758 0.004603 -1.251 0.210980
## Attacking_Position -0.090792 0.005408 -16.789 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.313 on 17580 degrees of freedom
## Multiple R-squared: 0.4376, Adjusted R-squared: 0.4373
## F-statistic: 1954 on 7 and 17580 DF, p-value: < 2.2e-16
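This model fits considerably worse than the first one: it explains only 43.76% of the variance in Rating (adjusted R² = 0.4373, compared with 0.6532), its residual standard error is larger (5.313 compared with 4.171), and Finishing is not significant (p = 0.211). Age has the largest t value, with each additional year associated with about 0.58 more rating points when the other attributes are held fixed.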
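The non-significance of Finishing can also be seen from its 95% confidence interval. For an `lm` object, `confint(fit1)` computes the estimate plus or minus a t quantile times the standard error, using the residual degrees of freedom; the R sketch below repeats that calculation by hand from the printed estimates and standard errors.

```r
# Estimates and standard errors copied from the summary(fit1) output above
est <- c(Age = 0.579878, Finishing = -0.005758)
se  <- c(Age = 0.008840, Finishing = 0.004603)
df_resid <- 17580

# 95% interval based on the t distribution: estimate +/- t * SE
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 4)
```

The interval for Finishing (about -0.015 to 0.003) contains zero, while the interval for Age (about 0.563 to 0.597) does not.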