I saw an R-bloggers post a while ago that looked at player stats in FIFA to determine what position each player should be playing, given the stats and positions of everyone else. I thought I would do the same with last year's FPL data from TOGGA.

I amassed all the data from 763 players and removed players who did not play at least 10 games. I then calculated the average player statistics per game. To refresh your memory, the player statistics were:

 [1] "G_Goals"                 "A_Assists"               "CC_Key.Passes"           "SCR_Successful.Crosses" 
 [5] "SOT_Shots.on.Target"     "STO_Successful.Dribbles" "AER_Aerials.Won"         "CLR_Effective.Clearance"
 [9] "CS_Clean.Sheets"         "INT_Interceptions"       "PS_Penalty.Saves"        "SV_Saves"               
[13] "TW_Tackles.Won"          "DIS_Dispossed"           "GC_Goals.Conceded"       "OG_Own.Goals"           
[17] "YC_Yellow.Cards"         "RC_Red.Cards"           
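The filtering and per-game averaging described above can be sketched roughly as follows. This is a minimal Python illustration, not the code used for the post: the player names, the `games` field and all the numbers are made up, and only two of the eighteen categories are shown.

```python
# Hypothetical season totals; names and numbers are invented for illustration.
season_totals = [
    {"player": "Forward A", "games": 30, "G_Goals": 15, "CS_Clean.Sheets": 0},
    {"player": "Defender B", "games": 20, "G_Goals": 1, "CS_Clean.Sheets": 8},
    {"player": "Sub C", "games": 4, "G_Goals": 2, "CS_Clean.Sheets": 0},
]

MIN_GAMES = 10  # threshold mentioned in the post


def per_game_averages(rows, min_games=MIN_GAMES):
    """Drop players with fewer than min_games, then divide each total by games played."""
    result = []
    for row in rows:
        if row["games"] < min_games:
            continue  # too few games for a meaningful average
        stats = {
            key: value / row["games"]
            for key, value in row.items()
            if key not in ("player", "games")
        }
        result.append({"player": row["player"], **stats})
    return result


averages = per_game_averages(season_totals)
for row in averages:
    print(row)
```

Here "Sub C" is dropped for playing only 4 games, and the remaining totals become per-game rates.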

Obviously, points within these categories are more likely to be obtained by certain field positions than others, e.g. “CS_Clean.Sheets” is a defensive statistic that forwards get no points for. We can use the categories together to determine which ones are reflective of which positions, but more interestingly, whether certain players cross the position boundaries, e.g. an attacking defender. You could probably observe such patterns just by watching games, highlights, etc., but let's look at the data.

Given I had the year's total stats, e.g. goals for the season, I calculated the average per game. I then scaled each category to have unit variance and conducted a principal component analysis. The analysis showed clear clustering of players by position given their statistics, which is not unexpected.
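To make the scaling and PCA steps concrete, here is a small sketch in Python using only the standard library. It is an illustration under stated assumptions, not the analysis from the post: the four players and their per-game numbers are hypothetical, the columns are centred as well as scaled (the usual PCA convention), and the first principal component is found by power iteration on the covariance matrix rather than by a full decomposition.

```python
import math
import statistics

# Hypothetical per-game averages.
# Columns: goals, clean sheets, tackles won.
per_game = [
    [0.50, 0.00, 0.5],  # forward-like
    [0.40, 0.05, 0.7],  # forward-like
    [0.05, 0.40, 2.0],  # defender-like
    [0.02, 0.35, 2.4],  # defender-like
]


def scale_columns(matrix):
    """Centre each column and divide by its sample standard deviation."""
    columns = list(zip(*matrix))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in matrix]


def covariance(matrix):
    """Sample covariance matrix of an already centred data matrix."""
    n, p = len(matrix), len(matrix[0])
    return [
        [sum(row[i] * row[j] for row in matrix) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=500):
    """Approximate the leading eigenvector of cov by power iteration."""
    # Non-uniform start, so the guess is unlikely to be orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(len(cov))]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


scaled = scale_columns(per_game)
pc1 = first_component(covariance(scaled))

# Each player's position along the first principal component.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in scaled]
print(scores)
```

In this toy data, the forward-like and defender-like players end up on opposite sides of zero along the first component, which is the kind of positional clustering described above. With real data, a library PCA routine would be the more natural choice.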

I was interested to see what drove the extremities of the data cloud. This turned out to be simply the highest average scores.