This post is part of a short series examining clustering of player and team types in soccer using data from the top 5 European leagues.

As in most sports, soccer players are often grouped into classic generalized positions (goalkeeper, defender, midfielder, and forward) with some flexibility. If we defined player roles under this rigid positional classification, we would likely use the following (1):

As the game has evolved, the previously rigid structures of formations have become more fluid, with a greater emphasis on strategically placing players in positions or specific roles that cater to their strengths. For example, players such as Trent Alexander-Arnold, who is listed as a right defensive back, often come up the pitch to be involved in attacking plays out wide or in the midfield. There are also roles such as the false 9, a striker who comes deeper to receive the ball in order to drag defenders out of position and create more opportunities for their teammates to attack. Given these changes, position describes little more than the general area a player occupies during the match, and even this is subject to change depending on strategy.

Instead of focusing just on what position a player fills, modern soccer analysis should focus on how certain players can fit into a team’s desired style of play through their “player type,” which is the focus of this analysis.

I highly recommend reading this article by the Athletic as well, which is a very robust version of this analysis: https://theathletic.com/3473297/2022/08/10/player-roles-the-athletic/



Methodology


This analysis uses K-means clustering of FBref data for the top 5 European leagues (English Premier League, Spanish La Liga, French Ligue 1, German Bundesliga, and Italian Serie A) from 2018 to 2023 to determine what type a player falls into. K-means clustering is an unsupervised learning method that groups similar observations in a dataset: each observation is assigned to the cluster whose center (the mean of its members) is nearest. In this case, players are grouped by the similarity of their statistical profiles. The analysis was split into separate exercises for outfield players and goalkeepers, since their roles are drastically different within the context of the game.

To learn more about K-Means clustering, I recommend reading the following article: https://365datascience.com/tutorials/python-tutorials/k-means-clustering/
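
To make the mechanics concrete, here is a minimal sketch of K-means (Lloyd's algorithm) using only the Python standard library. It is not the code behind this analysis, which would more typically rely on a library implementation. The player names, the two features, and their values are made up for illustration, and standardizing the features first is an assumption rather than a documented step of this post.

```python
import math
import random

def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kmeans(rows, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(c) / len(c) for c in zip(*members)]
    return labels, centroids

# Made-up rows: [crosses per pass attempt, share of tackles in defensive third]
players = {
    "winger_a": [0.12, 0.20],
    "winger_b": [0.10, 0.25],
    "winger_c": [0.11, 0.22],
    "centre_back_a": [0.01, 0.80],
    "centre_back_b": [0.02, 0.75],
    "centre_back_c": [0.01, 0.78],
}
features = standardize(list(players.values()))
labels, centroids = kmeans(features, k=2)
clusters = dict(zip(players, labels))
print(clusters)
```

The cluster numbers themselves carry no meaning; what matters is which players end up sharing a label. In this toy example the three wingers land in one cluster and the three centre-backs in the other.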



Key Variables

The following variables were used to cluster outfield players (2):


  • Non-penalty xG per shot
  • Percentage of passes - progressive
  • Percentage of passes - short
  • Percentage of passes - medium
  • Percentage of passes - long
  • Percentage of passes - switches
  • Percentage of passes - entering final third
  • Percentage of passes - into penalty area
  • Number of passes into the penalty area
  • Crosses per pass attempt
  • Crosses into penalty area
  • Expected assists per pass
  • Successful dribbles per 90 minutes
  • Attempted dribbles per 90 minutes
  • Ball carry distance per 90 minutes
  • Percentage of carry distance - progressive
  • Miscontrols per pass received
  • Dispossessed per pass received
  • Percentage of tackles - attacking third
  • Percentage of tackles - middle third
  • Percentage of tackles - defensive third
  • Percentage of successful tackles
  • Yellow cards per 90 minutes
  • Fouls per 90 minutes
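
Most of these variables are rates (per 90 minutes, per pass, or as a share of a total) rather than raw counts, which keeps players with very different minutes and touch volumes comparable. The sketch below shows one way such rates could be derived from season totals. The dictionary keys are hypothetical names chosen for this example, not FBref's actual column names, and the numbers are invented.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def build_features(raw):
    """Turn raw season totals into rate-based clustering features."""
    nineties = raw["minutes"] / 90  # number of full 90-minute matches played
    return {
        "pct_passes_progressive": safe_ratio(raw["progressive_passes"], raw["passes_attempted"]),
        "crosses_per_pass": safe_ratio(raw["crosses"], raw["passes_attempted"]),
        "dribbles_attempted_per_90": safe_ratio(raw["dribbles_attempted"], nineties),
        "miscontrols_per_pass_received": safe_ratio(raw["miscontrols"], raw["passes_received"]),
        "pct_tackles_def_third": safe_ratio(raw["tackles_def_third"], raw["tackles"]),
    }

# Invented season totals for a single hypothetical player.
example = {
    "minutes": 1800,
    "passes_attempted": 1000,
    "progressive_passes": 80,
    "crosses": 50,
    "dribbles_attempted": 60,
    "miscontrols": 30,
    "passes_received": 600,
    "tackles": 40,
    "tackles_def_third": 10,
}
features = build_features(example)
print(features)
```

The zero-denominator guard matters in practice for players with very few tackles or passes received; filtering such players out with a minimum-minutes threshold is another common choice, though this post does not say whether one was applied.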



Results

Outfield Players


For this analysis, I divided outfield players into 10 unique clusters. The K-means analysis resulted in the following player types:

  • Finishers/Target men: Players that enter the attacking third and penalty area and have the highest xG per shot on target. These players are not necessarily speedy forwards with a high proportion of dribbles or take-ons, but instead high shot-takers who occupy dangerous positions.

  • Pressing forwards: Aggressive forwards that make direct runs into attacking areas, and seek to disrupt passes between defenders. These are players with high work-rates, and generate value through positioning and defending rather than goals or assists.

  • Direct box threats: Players who directly pass and carry the ball into the penalty box, rack up assists, and are key for breaking down defenses. These players will dribble into dangerous areas to create high-value chances for themselves or others.

  • General defenders: Defenders that play progressive passes but don’t carry the ball or press forward. These players primarily focus on stopping attacks and maintaining defensive shape.

  • Ball-playing defenders: Defenders that progress the ball up the field with a variety of passes or dribbles. These players are involved in build-up from the back, serve as passing outlets for midfielders, or play line-breaking passes.

  • Attacking wingbacks: Defenders that come wide and play a high number of crosses into the penalty area. These players are key for creating attacking width on the pitch that stretches the defense, and need high work-rates to cover a lot of ground between attack and defense.

  • Defensive midfielders: Pivot midfielders that make passes through the lines into attacking areas and frequently switch the direction of attacks. They also take on defensive duties by preventing counter attacks or covering for defenders.

  • Stoppers: Defenders that make a high percentage of tackles in the defensive third, clear the ball away from the box, and win balls in the air. They are highly physical players.

  • Chance creators: Players with the majority of attacking passes and progression of the ball into areas for shooting. These players are the focal points of attacks and distribute the ball to players in the box.

  • Holding distributors: Players that contribute to the press by defending in the midfield and that distribute the ball into the attacking third. These are generally either pivot midfielders or defenders who step into the midfield during attacking or build-up phases.
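
K-means only returns numbered clusters, so descriptive names like the ones above have to come from interpreting each cluster's center. The post does not describe how that was done here; one common approach, sketched below under the assumption that features were standardized before clustering, is to list the features on which a centroid sits furthest above the overall average. The feature names and centroid values are made up for illustration.

```python
def top_features(centroid, feature_names, n=3):
    """Return the n features where a standardized centroid is furthest above average."""
    ranked = sorted(zip(feature_names, centroid), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# Hypothetical feature names and one invented centroid in z-score units.
feature_names = ["crosses_per_pass", "pct_passes_long", "pct_tackles_def_third", "npxg_per_shot"]
centroid = [1.8, 0.4, -0.2, -0.9]

top = top_features(centroid, feature_names, n=2)
print(top)
```

A centroid dominated by crossing, as in this invented case, would point toward a label like "attacking wingback"; the final naming remains a judgment call informed by watching the players involved.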

While this analysis aims to define player roles independently of position, it is still useful to examine the positional makeup of each player type to see which positions most often fill each role. This breakdown can be seen in Figure 1.

Figure 1
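
A breakdown like this can be computed by cross-tabulating each player's assigned type against their listed position. The sketch below uses only the standard library; the (player type, position) pairs are invented and the position codes are placeholders, so the resulting shares do not reflect the actual results.

```python
from collections import Counter, defaultdict

def position_makeup(assignments):
    """For each player type, return the share of its players at each listed position."""
    counts = defaultdict(Counter)
    for player_type, position in assignments:
        counts[player_type][position] += 1
    return {
        player_type: {pos: n / sum(c.values()) for pos, n in c.items()}
        for player_type, c in counts.items()
    }

# Made-up (player type, listed position) pairs, for illustration only.
assignments = [
    ("Attacking wingbacks", "DF"),
    ("Attacking wingbacks", "DF"),
    ("Attacking wingbacks", "MF"),
    ("Attacking wingbacks", "DF"),
    ("Chance creators", "MF"),
    ("Chance creators", "FW"),
]
makeup = position_makeup(assignments)
print(makeup)
```

Normalizing within each player type, as done here, answers "which positions make up this type"; normalizing within each position instead would answer "which types does this position tend to produce".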

Goalkeepers

Goalkeepers were divided into three classes:

  • Shot stopper: Excels at defending the goal and saving shots. These players are not as involved in build-up play or handling of the ball, but are the most reliable at preventing goals.

  • Ball distributor: Distributes the ball across the pitch with long progressive passes.

  • Sweeper keeper: Makes tackles in the defensive third to prevent the opposing team from progressing the ball up the field. They often pass the ball to midfielders and defenders, and are comfortable on the ball.

Figure 2

Potential Applications


Defining player types is crucial to how clubs and spectators understand the game as it unfolds on the pitch, because it describes how a player impacts the game both in and out of possession. For clubs, recruiting quality players that fill certain roles is key to success. Given the funds required for key transfers, clubs need to ensure that the player they recruit fills a needed role in their team, and can adjust their preferences based on a number of other performance-related data points.

See the appendix for a full breakdown of players and their determined player type.




Appendix:


Figure 3

Figure 4

##### Goalkeeper classes



##### Outfield classes



Figure 5




References


  1. https://www.merriam-webster.com/
  2. https://fbref.com/en/
  3. https://jaseziv.github.io/worldfootballR/index.html