Email: shravan.bhatt@gmail.com

Company: BNP Paribas India Solutions

1. Introduction:

In recent times football has become a truly global sport. Few other sports are played in all, or at least a majority, of the countries in the world. Wherever you go, you will find people of all ages, genders, castes and religions setting aside their differences to play the game themselves or to gather and watch some of the most talented and athletic people showcase their skills, which makes football a unifying element in a world divided by borders and boundaries.

Players of this game need a range of skills, from general ones such as strength and stamina to more specific ones such as passing, dribbling, tackling, goalkeeping and shooting, depending on their position. The “position” of a player is the role they play for their team during the game. The “overall rating” of a player is the result of a formula that weights each attribute for each particular position[1]. A higher rating therefore indicates that the player is better at the individual skills and attributes that matter for that position. The overall rating has no meaning on its own in gameplay; it is entirely driven by a player’s specific attribute ratings[1].
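
As a minimal sketch of what such a position-weighted formula could look like, the R snippet below computes an overall rating as a weighted sum of attribute scores. The attribute values and the weights are hypothetical and chosen only for illustration; the actual weights used in the game are not given in this paper.

```r
# Hypothetical attribute scores for one player (0-100 scale).
player <- c(Finishing = 85, Speed = 90, Ball_Control = 80, Standing_Tackle = 30)

# Hypothetical position-specific weights; each set sums to 1.
weights <- list(
  FWD = c(Finishing = 0.4, Speed = 0.3, Ball_Control = 0.2, Standing_Tackle = 0.1),
  DEF = c(Finishing = 0.1, Speed = 0.2, Ball_Control = 0.2, Standing_Tackle = 0.5)
)

# Overall rating = weighted sum of the attributes for the given position.
overall_rating <- function(attributes, position) {
  w <- weights[[position]]
  sum(attributes[names(w)] * w)
}

rating_fwd <- overall_rating(player, "FWD")
rating_def <- overall_rating(player, "DEF")
```

With these made-up numbers the same player rates higher as a forward than as a defender, which illustrates why the rating only makes sense relative to a position.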

FIFA, the International Federation of Association Football, conducts thousands of football tournaments and matches involving millions of players around the world. Every year before the start of the season, football team coaches and managers look for the players best suited to specific roles in their teams in order to win matches and, eventually, the tournament. For this purpose, this paper assesses how each specific attribute affects the overall rating of a player.

2. Overview of the Study:

Our study covers the top players from around the world along with some of their main attributes. The players come from different countries and continents. FIFA has 211 member nations spread across 6 confederations: Asia, Africa, Europe, North America, South America and Oceania[2]. Players can play for teams representing their own country or for any club around the world. Each team fields 11 players, one of whom is the goalkeeper (GK), who stands near the goal and may use his hands as well as his legs to prevent the ball from entering it. The remaining 10 are outfield players, divided into positions with specific roles: Forwards (FWD) are the frontline of the team, play the attacking game and try to score goals; Defenders (DEF), as the name suggests, play a more defensive role and try to prevent the opponent from scoring; Midfielders (MID) play a central, balanced and assistive role, acting as the bridge between the forwards and the defenders. Apart from the players on the field, there are stand-by players as well. If a player is injured or is not performing well and cannot continue in the middle of a match, he can be replaced from a pool of usually 5 or more back-up players called Substitutes (Sub). The substitutes and the starting players make up the line-up of the team for a particular match. A team can also include players who are not allowed to play in the current match but may be part of the line-up for future matches in the tournament; these players are called Reserves (Res). The substitutes and reserves are typically players who are less experienced or not fully fit to play the entire duration of a match.
Players in each of these positions possess a different set of skills. For example, forwards focus on their attacking position, speed and finishing; midfielders on dribbling, short passes and vision; defenders on long passes and tackles; and goalkeepers on jumping, diving and reflexes. Our aim is to study how these different attributes contribute to the overall rating.

3. An empirical study of the attributes of football players:

3.1 Overview:

The specific objective of this study was to investigate how the different attributes of football players, playing at different positions, affect their overall rating. We compare players across different positions, with different abilities and skills, from different parts of the world such as Europe, Asia, Africa, the Americas and Oceania. We compare attributes such as age, attacking position, speed, passing, tackling and goalkeeping to see how they weight the overall rating of a player. We believe that ratings differ considerably by position, depending on the combination of these attributes. We expect the forwards to have the highest overall rating, closely followed by the goalkeepers and midfielders, then the defenders, and lastly the substitutes and the reserves. Based on this, we state the hypotheses:

H0: The mean rating of players is the same at all positions.

H1: The mean rating of players is not the same at all positions.
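
Stated formally, and writing \(\mu_p\) for the mean rating of players at position \(p\) (this notation is introduced here only for illustration), the two hypotheses read:

\[ H_0: \mu_{GK} = \mu_{FWD} = \mu_{MID} = \mu_{DEF} = \mu_{Sub} = \mu_{Res} \]

\[ H_1: \mu_i \neq \mu_j \text{ for at least one pair of positions } i \neq j \]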

3.2 Data:

For this study, we use the FIFA ’17 database of players, which has information on approximately 18,000 players and 55 different player attributes. For ease of analysis, we work with only a few of these:

  • Rating: Overall rating of the player (limits: 0-100)

  • Age: Age of the player (limits: 0-100)

  • Club_Position: Position at which the player plays. (FWD, MID, DEF, GK, Sub, Res)

  • Attacking_Position: Ability of the player to get into a good attacking position (limits: 0-100)

  • Speed: Running speed of the player (limits: 0-100)

  • Finishing: The skill of scoring by putting the ball inside the goal. (limits: 0-100)

  • Ball_Control: Ability of the player to keep the ball in his possession for extended periods of time. (limits: 0-100)

  • Short_Pass: Ability of the player to pass the ball accurately at a short distance. (limits: 0-100)

  • Vision: Ability of the player to see potential passes. (limits: 0-100)

  • Standing_Tackle: Ability of the player to snatch the ball from an opponent’s feet while standing. (limits: 0-100)

  • Long_Pass: Ability of the player to pass the ball accurately to players at long distances. (limits: 0-100)

  • Dribbling: Ability of the player to run with the ball in his possession. (limits: 0-100)

  • Jumping: Ability of the player to jump. (limits: 0-100)

  • GK_Positioning: Ability of the player to position himself properly in front of the goal while playing as a goal keeper. (limits: 0-100)

  • GK_Diving: Ability of the player to dive while playing as a goal keeper. (limits: 0-100)

  • Work_Rate: The rate at which the player runs with the ball in his possession / the rate at which the player runs without the ball in his possession. (High/High, High/Low, High/Medium, Medium/Medium, Medium/High, Medium/Low, Low/Low, Low/Medium, Low/High)

  • Weak_foot: The ability of the player to play with his weaker foot. (limits: 1-5)
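
The limits listed above can be checked mechanically before any modelling. The following is a minimal sketch in R, assuming a data frame with the same column names as the list; the three rows of `fifa_toy` are made up and only stand in for the real data.

```r
# Toy stand-in for the player data; these three rows are made up.
fifa_toy <- data.frame(
  Rating        = c(88, 71, 64),
  Age           = c(29, 24, 19),
  Club_Position = c("FWD", "GK", "Sub"),
  Ball_Control  = c(90, 25, 61),
  Weak_foot     = c(4, 2, 3)
)

# Allowed range for each numeric variable, taken from the list above.
limits <- list(
  Rating       = c(0, 100),
  Age          = c(0, 100),
  Ball_Control = c(0, 100),
  Weak_foot    = c(1, 5)
)

# TRUE for a column when every value lies inside its allowed range.
in_range <- sapply(names(limits), function(v) {
  all(fifa_toy[[v]] >= limits[[v]][1] & fifa_toy[[v]] <= limits[[v]][2])
})

# Club_Position must be one of the six grouped positions.
valid_positions <- c("FWD", "MID", "DEF", "GK", "Sub", "Res")
positions_ok <- all(fifa_toy$Club_Position %in% valid_positions)
```

The same check could be extended to the remaining 0-100 attributes by adding their names to `limits`.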

This data has been collected by a team of 9,000 people working at EA Sports called ‘Data Reviewers’. The team includes some professional-level scouts and coaches but consists largely of season-ticket holders, who can watch many matches in person and subjectively record data on each player’s gameplay. The dataset is freely available on Kaggle.com[3].

3.3 Analysis:

To examine how the chosen attributes drive the overall rating, we propose the following model:

\[ \mathrm{Rating} = \beta_0 + \beta_1 \mathrm{Vision} + \beta_2 \mathrm{Short\_Pass} + \beta_3 \mathrm{Long\_Pass} + \beta_4 \mathrm{Jumping} + \beta_5 \mathrm{Ball\_Control} + \beta_6 \mathrm{Standing\_Tackle} + \beta_7 \mathrm{Dribbling} + \beta_8 \mathrm{GK\_Positioning} + \beta_9 \mathrm{GK\_Diving} + \epsilon \]

# Linear regression of Rating on the selected attributes
fit <- lm(Rating ~ Vision + Short_Pass + Long_Pass + Jumping + Ball_Control +
            Standing_Tackle + Dribbling + GK_Positioning + GK_Diving,
          data = fifa)

summary(fit)
## 
## Call:
## lm(formula = Rating ~ Vision + Short_Pass + Long_Pass + Jumping + 
##     Ball_Control + Standing_Tackle + Dribbling + GK_Positioning + 
##     GK_Diving, data = fifa)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.1546  -2.7746  -0.1921   2.5696  22.8216 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     17.549258   0.294468  59.597  < 2e-16 ***
## Vision           0.028104   0.003911   7.185 6.97e-13 ***
## Short_Pass       0.159494   0.007084  22.514  < 2e-16 ***
## Long_Pass       -0.062970   0.005192 -12.128  < 2e-16 ***
## Jumping          0.112310   0.002884  38.936  < 2e-16 ***
## Ball_Control     0.412100   0.006971  59.115  < 2e-16 ***
## Standing_Tackle  0.094434   0.002221  42.527  < 2e-16 ***
## Dribbling       -0.030610   0.005046  -6.067 1.33e-09 ***
## GK_Positioning   0.242423   0.007479  32.413  < 2e-16 ***
## GK_Diving        0.187520   0.007261  25.826  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.171 on 17578 degrees of freedom
## Multiple R-squared:  0.6533, Adjusted R-squared:  0.6532 
## F-statistic:  3681 on 9 and 17578 DF,  p-value: < 2.2e-16

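From the above model, we can see how the chosen attributes affect the overall rating of the player. The model has an F-statistic of 3681 (p-value < 2.2e-16, well below 0.01), a Multiple R-squared of 0.6533 and an Adjusted R-squared of 0.6532, which means it explains about 65.33% of the variance in Rating. Ball_Control has the largest coefficient (0.4121), while Long_Pass and Dribbling have small negative coefficients once the other attributes are held constant. We get the equation: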
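
The summary statistics quoted above are tied together by standard identities. The sketch below recomputes the Adjusted R-squared and the F-statistic from the reported Multiple R-squared and degrees of freedom; because the reported R-squared is rounded to four decimals, the results agree with the summary only approximately.

```r
# Values reported in the regression summary above.
r_squared <- 0.6533    # Multiple R-squared (rounded)
p         <- 9         # number of predictors
df_resid  <- 17578     # residual degrees of freedom
n         <- df_resid + p + 1   # number of observations

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

# Overall F-statistic: explained variance per predictor divided by
# unexplained variance per residual degree of freedom.
f_statistic <- (r_squared / p) / ((1 - r_squared) / df_resid)
```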

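\[ \mathrm{Rating} = 17.549258 + 0.028104\,\mathrm{Vision} + 0.159494\,\mathrm{Short\_Pass} - 0.062970\,\mathrm{Long\_Pass} + 0.112310\,\mathrm{Jumping} + 0.412100\,\mathrm{Ball\_Control} + 0.094434\,\mathrm{Standing\_Tackle} - 0.030610\,\mathrm{Dribbling} + 0.242423\,\mathrm{GK\_Positioning} + 0.187520\,\mathrm{GK\_Diving} \]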
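
To show how the fitted equation is used, the sketch below plugs one set of attribute values into it. The coefficients are the estimates reported above; the player's attribute values are hypothetical.

```r
# Coefficient estimates from the fitted model above.
coefs <- c(
  Intercept = 17.549258, Vision = 0.028104, Short_Pass = 0.159494,
  Long_Pass = -0.062970, Jumping = 0.112310, Ball_Control = 0.412100,
  Standing_Tackle = 0.094434, Dribbling = -0.030610,
  GK_Positioning = 0.242423, GK_Diving = 0.187520
)

# Hypothetical outfield player (attribute values are made up).
player <- c(
  Vision = 70, Short_Pass = 75, Long_Pass = 70, Jumping = 65,
  Ball_Control = 80, Standing_Tackle = 40, Dribbling = 78,
  GK_Positioning = 10, GK_Diving = 10
)

# Predicted rating = intercept + sum of coefficient * attribute value.
predicted <- coefs[["Intercept"]] + sum(coefs[names(player)] * player)
```

With the fitted object itself, `predict(fit, newdata = ...)` performs the same calculation for a data frame of new players.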

Result:

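The regression shows how the various attributes affect the overall rating of a player: every coefficient is significant at the 0.001 level. The null hypothesis that the average rating is the same at all positions is tested directly by the ANOVA in Appendix 2, where Club_Position is highly significant (F = 810.123, p < 2e-16), so we reject H0. The average ratings below show the pattern: forwards have the highest mean rating (70.36), followed by goalkeepers (69.83), midfielders (69.35) and defenders (68.73), with substitutes (65.37) and reserves (61.07) lowest, in line with our expectation.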
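
A direct test of H0 is a one-way ANOVA of Rating on Club_Position. The sketch below runs it on simulated data: the group means are loosely based on the per-position averages reported in this study, while the common standard deviation of 7 and the 200 players per group are assumptions made only for illustration.

```r
set.seed(1)

# Simulated ratings per position (assumed sd = 7, 200 players per group).
group_means <- c(GK = 69.8, Res = 61.1, Sub = 65.4,
                 FWD = 70.4, MID = 69.3, DEF = 68.7)
sim <- data.frame(
  Club_Position = factor(rep(names(group_means), each = 200)),
  Rating        = rnorm(6 * 200, mean = rep(group_means, each = 200), sd = 7)
)

# One-way ANOVA: H0 says all six position means are equal.
fit_pos     <- aov(Rating ~ Club_Position, data = sim)
anova_table <- summary(fit_pos)[[1]]
p_value     <- anova_table[["Pr(>F)"]][1]
```

On the real data the equivalent call would be `aov(Rating ~ Club_Position, data = fifa)`; a small `p_value` leads to rejecting H0.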

Average Rating per position:

##   Position        x
## 1       GK 69.82595
## 2      Res 61.06959
## 3      Sub 65.37173
## 4      FWD 70.36251
## 5      MID 69.34502
## 6      DEF 68.73204
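
A table of this shape, with a grouping column and a value column named `x`, is what base R's `aggregate()` returns by default. The call used for the table above is not shown, so the following is only a sketch on made-up data.

```r
# Toy data; the ratings are made up for illustration.
toy <- data.frame(
  Club_Position = c("GK", "GK", "FWD", "FWD", "MID", "MID"),
  Rating        = c(70, 72, 80, 76, 68, 70)
)

# Mean rating per position; the value column is named `x` by default.
avg_by_pos <- aggregate(toy$Rating,
                        by = list(Position = toy$Club_Position),
                        FUN = mean)
```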

Boxplot of Rating and Position

Conclusion

Hence, this analysis shows how attributes of a football player such as Ball_Control, Short_Pass, Vision, Standing_Tackle, Long_Pass, Dribbling, Jumping, GK_Positioning and GK_Diving affect their overall rating, and that the mean rating differs by Club_Position. The appendices explore further variables such as Age, Attacking_Position, Finishing, Work_Rate and Weak_foot.

Appendix 1

Description of data:

##                    vars     n  mean    sd median trimmed   mad min max
## Rating                1 17588 66.17  7.08     66   66.21  7.41  45  94
## Age                   2 17588 25.46  4.68     25   25.23  5.93  17  47
## Club_Position*        3 17588  3.55  1.41      3    3.48  1.48   1   6
## Attacking_Position    4 17588 49.59 19.41     54   51.11 17.79   2  94
## Speed                 5 17588 65.48 14.10     68   66.70 11.86  11  96
## Finishing             6 17588 45.16 19.37     48   45.78 22.24   2  95
## Ball_Control          7 17588 57.97 16.83     63   60.32 11.86   5  95
## Short_Pass            8 17588 58.12 14.98     62   59.88 10.38  10  92
## Vision                9 17588 52.71 14.59     54   53.27 14.83  10  94
## Standing_Tackle      10 17588 47.44 21.83     54   48.14 25.20   3  92
## Long_Pass            11 17588 52.40 15.62     56   53.46 14.83   7  93
## Dribbling            12 17588 54.80 18.91     60   57.08 13.34   4  97
## Jumping              13 17588 64.92 11.43     65   65.33 10.38  15  95
## GK_Positioning       14 17588 16.61 17.14     11   11.92  4.45   1  91
## GK_Diving            15 17588 16.82 17.80     11   11.94  4.45   1  89
## Work_Rate*           16 17588  6.94  2.79      9    7.34  0.00   1   9
## Weak_foot*           17 17588  2.93  0.66      3    2.91  0.00   1   5
##                    range  skew kurtosis   se
## Rating                49 -0.02    -0.03 0.05
## Age                   30  0.41    -0.43 0.04
## Club_Position*         5  0.45    -0.86 0.01
## Attacking_Position    92 -0.64    -0.52 0.15
## Speed                 85 -0.84     0.69 0.11
## Finishing             93 -0.25    -1.03 0.15
## Ball_Control          90 -1.18     0.76 0.13
## Short_Pass            82 -1.01     0.49 0.11
## Vision                84 -0.36    -0.36 0.11
## Standing_Tackle       89 -0.29    -1.34 0.16
## Long_Pass             86 -0.57    -0.46 0.12
## Dribbling             93 -1.00     0.14 0.14
## Jumping               80 -0.39     0.32 0.09
## GK_Positioning        90  2.43     4.49 0.13
## GK_Diving             88  2.41     4.30 0.13
## Work_Rate*             8 -0.91    -0.85 0.02
## Weak_foot*             4  0.13     0.60 0.00

Average rating of players by work rate

##         Work_Rate        x
## 1     High / High 70.61580
## 2      High / Low 67.23425
## 3   High / Medium 68.22790
## 4      Low / High 66.96119
## 5       Low / Low 67.43333
## 6    Low / Medium 65.92650
## 7   Medium / High 68.20535
## 8    Medium / Low 67.34911
## 9 Medium / Medium 64.69849

Average rating of players by weaker foot rating

##   Weak_foot_rating        x
## 1                1 62.65753
## 2                2 64.47823
## 3                3 65.86477
## 4                4 69.85309
## 5                5 70.94416

Two way table showing work rate per position

##                  
##                     GK  Res  Sub  FWD  MID  DEF
##   High / High        0   57  272   80  228  110
##   High / Low         0  107  349  142  119   13
##   High / Medium      0  405 1181  356  484  492
##   Low / High         0   74  167    0   45  152
##   Low / Low          0    8   12    5    5    0
##   Low / Medium       0   83  194    2   34  136
##   Medium / High      0  179  592   23  361  379
##   Medium / Low       0  134  375  147  178   11
##   Medium / Medium  632 2100 4350  376 1198 1241

Two way table showing weaker foot abilities per position

##      
##          1    2    3    4    5
##   GK    27  253  320   30    2
##   Res   35  784 2048  259   21
##   Sub   67 1649 4670 1036   70
##   FWD    1  125  653  325   27
##   MID    4  311 1651  632   54
##   DEF   12  644 1632  223   23

Three-way table showing weaker foot abilities by position and work rate

##                               Weak_foot    1    2    3    4    5
## Club_Position Work_Rate                                         
## GK            High / High                  0    0    0    0    0
##               High / Low                   0    0    0    0    0
##               High / Medium                0    0    0    0    0
##               Low / High                   0    0    0    0    0
##               Low / Low                    0    0    0    0    0
##               Low / Medium                 0    0    0    0    0
##               Medium / High                0    0    0    0    0
##               Medium / Low                 0    0    0    0    0
##               Medium / Medium             27  253  320   30    2
## Res           High / High                  0   10   41    6    0
##               High / Low                   1   23   66   14    3
##               High / Medium                1   71  254   72    7
##               Low / High                   0   23   44    6    1
##               Low / Low                    0    3    3    2    0
##               Low / Medium                 1   35   47    0    0
##               Medium / High                3   46  117   10    3
##               Medium / Low                 0   31   85   16    2
##               Medium / Medium             29  542 1391  133    5
## Sub           High / High                  0   31  172   63    6
##               High / Low                   2   57  200   84    6
##               High / Medium                2  196  708  263   12
##               Low / High                   0   68   88   11    0
##               Low / Low                    0    3    7    2    0
##               Low / Medium                 1   75  109    9    0
##               Medium / High                0  128  401   61    2
##               Medium / Low                 1   67  219   81    7
##               Medium / Medium             61 1024 2766  462   37
## FWD           High / High                  1   11   35   29    4
##               High / Low                   0   17   72   42   11
##               High / Medium                0   32  202  118    4
##               Low / High                   0    0    0    0    0
##               Low / Low                    0    0    4    1    0
##               Low / Medium                 0    1    0    1    0
##               Medium / High                0    3   14    6    0
##               Medium / Low                 0   19   90   36    2
##               Medium / Medium              0   42  236   92    6
## MID           High / High                  0   18  120   81    9
##               High / Low                   0   16   62   36    5
##               High / Medium                0   52  262  162    8
##               Low / High                   0    9   32    4    0
##               Low / Low                    0    0    2    3    0
##               Low / Medium                 1    7   22    3    1
##               Medium / High                0   37  261   60    3
##               Medium / Low                 0   24   83   66    5
##               Medium / Medium              3  148  807  217   23
## DEF           High / High                  0   22   69   17    2
##               High / Low                   0    7    5    1    0
##               High / Medium                2  108  307   65   10
##               Low / High                   1   47   96    7    1
##               Low / Low                    0    0    0    0    0
##               Low / Medium                 1   51   77    6    1
##               Medium / High                0   81  252   42    4
##               Medium / Low                 1    4    6    0    0
##               Medium / Medium              7  324  820   85    5

Boxplot of rating and work rate

Boxplot of rating and weak foot

Appendix 2

H2: Work rate and player position are dependent on each other

chisq.test(table(fifa$Work_Rate,fifa$Club_Position))
## Warning in chisq.test(table(fifa$Work_Rate, fifa$Club_Position)): Chi-
## squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  table(fifa$Work_Rate, fifa$Club_Position)
## X-squared = 2272.8, df = 40, p-value < 2.2e-16

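From this test, the p-value is extremely small (< 0.01), hence we reject the null hypothesis that work rate and player position are independent. The warning arises because some cells have very small expected counts (for example, every goalkeeper has a Medium / Medium work rate and only a handful of players are Low / Low), so the chi-squared approximation should be treated with some caution.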
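
To see where the warning comes from, the sketch below rebuilds the work rate by position table from the counts printed in Appendix 1 and inspects the expected cell counts. The object names are introduced here for illustration, and `suppressWarnings()` is used only to keep the output quiet.

```r
# Work rate (rows) by position (columns), copied from Appendix 1.
counts <- matrix(
  c(  0,   57,  272,   80,  228,  110,
      0,  107,  349,  142,  119,   13,
      0,  405, 1181,  356,  484,  492,
      0,   74,  167,    0,   45,  152,
      0,    8,   12,    5,    5,    0,
      0,   83,  194,    2,   34,  136,
      0,  179,  592,   23,  361,  379,
      0,  134,  375,  147,  178,   11,
    632, 2100, 4350,  376, 1198, 1241),
  nrow = 9, byrow = TRUE,
  dimnames = list(
    Work_Rate = c("High / High", "High / Low", "High / Medium",
                  "Low / High", "Low / Low", "Low / Medium",
                  "Medium / High", "Medium / Low", "Medium / Medium"),
    Club_Position = c("GK", "Res", "Sub", "FWD", "MID", "DEF")
  )
)

# Same test as above, run on the table of counts.
test <- suppressWarnings(chisq.test(counts))

# Cells whose expected count under independence is below 5.
low_expected <- sum(test$expected < 5)
min_expected <- min(test$expected)
```

A common remedy, not applied in this study, is to merge sparse categories or to use `chisq.test(..., simulate.p.value = TRUE)`.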

H3: Weak foot and player position are dependent on each other

chisq.test(table(fifa$Club_Position,fifa$Weak_foot))
## 
##  Pearson's Chi-squared test
## 
## data:  table(fifa$Club_Position, fifa$Weak_foot)
## X-squared = 1016.6, df = 20, p-value < 2.2e-16

From this test, the p-value is extremely small (< 0.01), hence we reject the null hypothesis that weaker foot performance and player position are independent.

ANOVA Test

fit2 <- aov(Rating ~ Club_Position*Work_Rate, data=fifa)
summary(fit2)
##                            Df Sum Sq Mean Sq F value  Pr(>F)    
## Club_Position               5 158335   31667 810.123 < 2e-16 ***
## Work_Rate                   8  35890    4486 114.771 < 2e-16 ***
## Club_Position:Work_Rate    30   2318      77   1.977 0.00114 ** 
## Residuals               17544 685780      39                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
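
All three terms in the table are significant at the 0.01 level, with Club_Position by far the strongest effect. As a sketch of how the F values follow from the other columns, the snippet below recomputes them from the sums of squares and degrees of freedom printed above; small differences are due to rounding in the printed table.

```r
# Sums of squares and degrees of freedom from the ANOVA table above.
ss <- c(Club_Position = 158335, Work_Rate = 35890, Interaction = 2318)
df <- c(Club_Position = 5,      Work_Rate = 8,     Interaction = 30)
ss_resid <- 685780
df_resid <- 17544

# Mean square = sum of squares / degrees of freedom.
ms       <- ss / df
ms_resid <- ss_resid / df_resid

# F value = effect mean square / residual mean square.
f_values <- ms / ms_resid

# Upper-tail p-values from the F distribution.
p_values <- pf(f_values, df, df_resid, lower.tail = FALSE)
```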

Regression model test

fit1 <- lm(Rating ~ Age+Short_Pass+Vision+Long_Pass+Ball_Control+Finishing+Attacking_Position,data=fifa)
summary(fit1)
## 
## Call:
## lm(formula = Rating ~ Age + Short_Pass + Vision + Long_Pass + 
##     Ball_Control + Finishing + Attacking_Position, data = fifa)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.0366  -3.5878  -0.2602   3.2546  28.0084 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        37.088991   0.265615 139.634  < 2e-16 ***
## Age                 0.579878   0.008840  65.595  < 2e-16 ***
## Short_Pass          0.112184   0.008868  12.650  < 2e-16 ***
## Vision              0.126840   0.004751  26.696  < 2e-16 ***
## Long_Pass          -0.024811   0.006444  -3.850 0.000118 ***
## Ball_Control        0.123684   0.007370  16.781  < 2e-16 ***
## Finishing          -0.005758   0.004603  -1.251 0.210980    
## Attacking_Position -0.090792   0.005408 -16.789  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.313 on 17580 degrees of freedom
## Multiple R-squared:  0.4376, Adjusted R-squared:  0.4373 
## F-statistic:  1954 on 7 and 17580 DF,  p-value: < 2.2e-16

This is a lower-quality model: it explains only 43.76% of the variance in Rating, and Finishing is not a significant predictor (p = 0.211).