In this task, we focus on finding a replacement for Kevin De Bruyne at
Manchester City. His position is “MF,FW”.
We have a database of all the players from the Big 5 leagues, plus the
Eredivisie and the Primeira Liga, for the 2024/25 season.
library(tidyverse)
library(fmsb)
library(lsa)
library(knitr)
library(DT)
In our case, we will work with players whose primary position is
MF, as this is the position of our target player.
## We consider all players
## We keep the players whose main position is: MF
##
## Number of players: 1245
What competitions are we working with?
## [1] "Bundesliga" "Eredivisie" "La Liga" "Ligue 1"
## [5] "Premier League" "Primeira Liga" "Serie A"
What positions do our players have?
## [1] "MF,FW" "MF" "MF,DF"
Since we are looking for an attacking midfielder, we exclude players
listed as MF,DF, as they tend to have a more defensive profile than
our target player.
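The two selection steps (primary position MF, then excluding MF,DF) can be sketched in base R as follows. The column name `Pos` and the rule that the primary position is the code before the first comma are assumptions about the dataset, and the players are toy data.

```r
# Sketch: keep players whose primary position is MF, then drop MF,DF profiles.
# The column name `Pos` is an assumption about the dataset.
select_attacking_mf <- function(players) {
  primary <- sub(",.*$", "", players$Pos)          # "MF,FW" -> "MF"
  keep <- primary == "MF" & players$Pos != "MF,DF"
  players[keep, , drop = FALSE]
}

# Toy data, for illustration only
players <- data.frame(
  Player = c("P1", "P2", "P3", "P4"),
  Pos    = c("MF,FW", "MF", "MF,DF", "FW,MF")
)
select_attacking_mf(players)$Player   # "P1" "P2"
```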
We define the filter_player function, which allows us, for example, to
select players who have played a minimum number of minutes. If we were
analysing the current top scorers, we could even add a condition to keep
only players who average more than 1 goal per game. In this way, we keep
a smaller group of players on which to carry out the study.
This function has the following parameters:
- data: data set read from a CSV.
- metrics: list of metrics to be considered in the analysis.
- pct_min_minutes: minimum percentage of minutes a player must have played to be included in the study (the sample).
- age_max: maximum age of a player to be included in the sample.

In our example, we keep midfielders who have played at least 50% of their team's total minutes and who are 30 years old or younger.
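A minimal sketch of filter_player, assuming hypothetical column names `Player`, `Squad`, `Age` and `Min_pct` (percentage of the team's minutes played); the real dataset may use different names. The defaults mirror the example above, with `age_max` treated as inclusive.

```r
# Sketch of filter_player(). Column names `Player`, `Squad`, `Age` and
# `Min_pct` (share of the team's minutes played, in %) are assumptions.
filter_player <- function(data, metrics, pct_min_minutes = 50, age_max = 30) {
  keep <- !is.na(data$Min_pct) & data$Min_pct >= pct_min_minutes &
          !is.na(data$Age) & data$Age <= age_max
  # Return the identification columns plus the requested metrics only
  data[keep, c("Player", "Squad", "Age", metrics), drop = FALSE]
}

# Toy data, for illustration only
toy <- data.frame(
  Player  = c("A", "B", "C"),
  Squad   = c("X", "Y", "Z"),
  Age     = c(24, 31, 22),
  Min_pct = c(80, 90, 30),
  Gls90   = c(0.2, 0.4, 0.1)
)
filter_player(toy, metrics = "Gls90")   # only player "A" passes both filters
```

Player B is dropped by the age limit and player C by the minutes threshold.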
## Duplicated players:
Rename some metrics:
## Player Squad Age Goals by 90' Passes Completion %
## 1 Adrian Beck Heidenheim 27 0.23 80.1
## 2 Alexis Claude-Maurice Augsburg 26 0.38 82.9
## 3 András Schäfer Union Berlin 25 0.06 73.5
## 4 Angelo Stiller Stuttgart 23 0.03 87.4
## 5 Armin Gigovic Holstein Kiel 22 0.24 77.7
## 6 Arthur Vermeeren RB Leipzig 19 0.00 86.1
## Passes Completed by 90' Progressive Passes by 90' Final Third Passes by 90'
## 1 18.16 2.16 1.66
## 2 19.39 2.18 1.36
## 3 15.92 2.12 1.15
## 4 69.34 9.03 8.47
## 5 16.90 1.61 1.42
## 6 29.32 2.75 2.54
## Long Passes completed by 90' Assists by 90' Shot-Creating Actions by 90'
## 1 1.72 0.06 3.16
## 2 1.11 0.09 3.29
## 3 0.50 0.06 1.70
## 4 5.31 0.26 3.64
## 5 0.87 0.05 1.85
## 6 1.07 0.11 1.60
## Touches by 90'
## 1 30.06
## 2 35.50
## 3 30.73
## 4 86.91
## 5 30.16
## 6 39.61
Once we have selected our study sample, we calculate a value that summarises the performance of each player. This will be our rating.
Since the metrics are measured on different scales, the first step is to normalise the variables so that they all lie in the same range. We can then assign weights and calculate the final score.
We use a MinMax transformation to normalise the values of the
performance variables. To do so, we define a normalize
function that rescales each value x of a column as (x - min) / (max - min),
so that the smallest value becomes 0 and the largest becomes 1.
We apply this function to each of the columns from column 4 onwards
(the performance metrics) of the dataframe df_midfielders_clean.
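A minimal sketch of the normalize function and of its application from column 4 onwards. The helper name `normalize_metrics` and the choice to map a constant column to 0 (instead of dividing by zero) are assumptions.

```r
# Min-max normalisation: rescales a numeric vector to [0, 1].
normalize <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[2] == rng[1]) return(x - rng[1])   # constant column: all values map to 0
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical helper: apply normalize() to every column from `first_metric`
# onwards, leaving Player, Squad and Age untouched.
normalize_metrics <- function(df, first_metric = 4) {
  cols <- first_metric:ncol(df)
  df[cols] <- lapply(df[cols], normalize)
  df
}

toy <- data.frame(Player = c("A", "B", "C"), Squad = c("X", "Y", "Z"),
                  Age = c(20, 25, 30), Gls90 = c(0, 0.2, 0.4))
normalize_metrics(toy)$Gls90   # 0.0 0.5 1.0
```

Because min and max are taken within the sample, a player's normalised value depends on who else is in it: changing the filters changes the scores.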
## Player Squad Age Goals.by.90.
## Length:325 Length:325 Min. : 0.00 Min. :0.00000
## Class :character Class :character 1st Qu.:22.00 1st Qu.:0.05128
## Mode :character Mode :character Median :24.00 Median :0.12821
## Mean :24.51 Mean :0.17349
## 3rd Qu.:27.00 3rd Qu.:0.25641
## Max. :30.00 Max. :1.00000
## Passes.Completion.. Passes.Completed.by.90. Progressive.Passes.by.90.
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.8271 1st Qu.:0.2209 1st Qu.:0.2498
## Median :0.8754 Median :0.2921 Median :0.3273
## Mean :0.8616 Mean :0.3010 Mean :0.3475
## 3rd Qu.:0.9108 3rd Qu.:0.3557 3rd Qu.:0.4166
## Max. :1.0000 Max. :1.0000 Max. :1.0000
## Final.Third.Passes.by.90. Long.Passes.completed.by.90. Assists.by.90.
## Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.1533 1st Qu.:0.1324 1st Qu.:0.07143
## Median :0.2130 Median :0.2159 Median :0.16071
## Mean :0.2303 Mean :0.2356 Mean :0.19813
## 3rd Qu.:0.2840 3rd Qu.:0.2975 3rd Qu.:0.28571
## Max. :1.0000 Max. :1.0000 Max. :1.00000
## Shot.Creating.Actions.by.90. Touches.by.90.
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.2846 1st Qu.:0.3111
## Median :0.3738 Median :0.3783
## Mean :0.4132 Mean :0.3849
## 3rd Qu.:0.5169 3rd Qu.:0.4363
## Max. :1.0000 Max. :1.0000
We see that the numerical variables (the performance metrics) all have a minimum value of 0 and a maximum value of 1; they are normalised.
Once the variables have been normalised, we calculate the final score for each player.
We define the function calc_scoring, which takes the following
parameters:
- data: data set transformed using MinMax.
- weights: list of weights associated with each variable.
- ind_metric: index of the column where the performance metrics start.
- columns_return: the columns we want to return in the final result.
- n: number of players to return (ranked by calculated score).

We test it on our example. To do so, we assign a weight to each of the performance metrics, the only requirement being that the weights sum to 1.
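A minimal sketch of calc_scoring, under stated assumptions. The score is taken as the weighted sum of the normalised metrics. Since those lie in [0, 1] and the weights sum to 1, that sum is at most 1, whereas the printed Final Score values reach 8.175; the sketch therefore assumes the result is multiplied by 10 (a 0-10 scale). That factor is an inference, not the author's confirmed formula.

```r
# Sketch of calc_scoring(): weighted sum of MinMax-normalised metrics.
# The x10 scale factor is an assumption inferred from the printed scores.
calc_scoring <- function(data, weights, ind_metric, columns_return, n = 10) {
  w <- unlist(weights)                          # accept a list or a numeric vector
  metric_cols <- ind_metric:ncol(data)
  stopifnot(length(w) == length(metric_cols),   # one weight per metric
            isTRUE(all.equal(sum(w), 1)))       # weights must sum to 1
  m <- as.matrix(data[metric_cols])
  score <- 10 * as.vector(m %*% w)              # weighted sum, rescaled to 0-10
  out <- data[columns_return]
  out[["Final Score"]] <- round(score, 3)
  out <- out[order(out[["Final Score"]], decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  head(out, n)
}
```

With the nine metrics above, a call would look like `calc_scoring(df_midfielders_clean, weights, ind_metric = 4, columns_return = c("Player", "Squad", "Age"), n = 10)`, applied to the normalised dataframe.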
## [1] "Player" "Squad"
## [3] "Age" "Goals.by.90."
## [5] "Passes.Completion.." "Passes.Completed.by.90."
## [7] "Progressive.Passes.by.90." "Final.Third.Passes.by.90."
## [9] "Long.Passes.completed.by.90." "Assists.by.90."
## [11] "Shot.Creating.Actions.by.90." "Touches.by.90."
## Weights sum: 1
## Player Squad Age Final Score
## 1 Joshua Kimmich Bayern Munich 29 8.175
## 2 Joey Veerman PSV Eindhoven 25 7.374
## 3 Pedri Barcelona 21 6.562
## 4 Bruno Fernandes Manchester Utd 29 6.360
## 5 Orkun Kökçü Benfica 23 6.276
## 6 Angelo Stiller Stuttgart 23 6.145
## 7 Pierre Højbjerg Marseille 28 5.847
## 8 Florian Wirtz Leverkusen 21 5.724
## 9 Vitinha PSG 24 5.707
## 10 Martin Ødegaard Arsenal 25 5.453
Based on the weighted metrics, Joshua Kimmich is the top candidate, combining passing, tempo control and experience, and he represents a realistic signing option for Manchester City.
Joey Veerman emerges as a clear market opportunity: he fits the profile, performs strongly, and could be signed at a reasonable cost.
While Pedri and Bruno Fernandes also score highly, their clubs make them virtually unattainable.
Orkun Kökçü and Angelo Stiller could also be good options.
Pierre Højbjerg could bring experience.
Florian Wirtz is the young bet: he already attracts interest from many big European clubs, so the cost could be very high, although that is not necessarily a problem for Manchester City.
Like Pedri and Bruno Fernandes, Vitinha remains virtually unattainable.
Martin Ødegaard could be difficult to prise away from Arsenal.
From our analysis, Joshua Kimmich stands out as the most complete midfielder in terms of the weighted metrics. But how closely does he resemble Kevin De Bruyne in style and role? And which other players resemble him most?
For this analysis, we need Kevin De Bruyne’s performance data. Since he is over 30, he is not in the filtered sample, so we load his data from the full list of midfielders:
## We consider all players
## We keep the players whose main position is: MF
We develop the function similarity_tool, which returns the N players
most similar to the player given in the player argument. Its arguments are:
- sample: sample of players.
- player: player for whom we look for similar players.
- distance: type of distance (Euclidean or cosine).
- n: number of similar players to return.

Using the dataset already filtered in the previous step, we use this function to find the players whose performance was most similar to that of the Manchester City midfielder.
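A minimal sketch of similarity_tool, under assumptions: the sample has a `Player` column; every other numeric column except `Age` is a metric (ideally MinMax-normalised so no metric dominates); cosine similarity is reported as a percentage; and the Euclidean distance d is turned into a score with 100 / (1 + d). That last conversion is a hypothetical choice and will not reproduce the exact Similarity values printed below. The lsa package loaded earlier also provides a `cosine()` function; base R is used here to keep the block self-contained.

```r
# Sketch of similarity_tool(): rank players by closeness to a target player.
similarity_tool <- function(sample, player, distance = c("euclidean", "cosine"), n = 10) {
  distance <- match.arg(distance)
  stopifnot(player %in% sample$Player)
  is_num <- vapply(sample, is.numeric, logical(1))
  metric_cols <- setdiff(names(sample)[is_num], "Age")   # assumption: Age is not a metric
  m <- as.matrix(sample[metric_cols])
  target <- m[match(player, sample$Player), ]
  if (distance == "cosine") {
    # cosine similarity in %: 100 * <x, y> / (|x| * |y|)
    sim <- 100 * as.vector(m %*% target) / (sqrt(rowSums(m^2)) * sqrt(sum(target^2)))
  } else {
    # Euclidean distance mapped to a 0-100 score; 100 / (1 + d) is an assumption
    d <- sqrt(rowSums(sweep(m, 2, target)^2))
    sim <- 100 / (1 + d)
  }
  out <- data.frame(Player = sample$Player, Similarity = sim,
                    sample[metric_cols], check.names = FALSE)
  out <- out[out$Player != player, , drop = FALSE]        # drop the target itself
  out <- out[order(out$Similarity, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  head(out, n)
}
```

A call such as `similarity_tool(sample, "Kevin De Bruyne", distance = "cosine", n = 10)` assumes his row has been added to the sample.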
## Player Similarity Goals/90 Passes% PassesCompleted/90
## 1 Alex Baena 75.91482 0.24 66.6 26.78
## 2 Francisco Trincão 72.51973 0.28 78.5 24.91
## 3 Julian Brandt 70.18377 0.20 78.8 31.77
## 4 Mohammed Ihattaren 67.53767 0.23 71.6 25.21
## 5 Nicolás Paz 66.67454 0.20 80.6 28.34
## 6 Ludovic Blas 66.56148 0.24 73.9 23.90
## 7 Fabio Vieira 65.35014 0.20 79.2 29.69
## 8 James Maddison 63.35323 0.45 81.3 32.94
## 9 Sven Mijnans 62.78953 0.24 76.3 31.46
## 10 Gaëtan Perrin 62.34229 0.33 70.6 21.41
## PassesProgressive/90 PassesUT/90 LongPassesCompleted/90 Ast/90 SCA/90
## 1 5.91 2.97 4.31 0.31 5.86
## 2 4.53 2.82 2.06 0.43 4.94
## 3 4.97 3.13 2.23 0.39 3.99
## 4 4.17 2.75 2.83 0.23 5.08
## 5 4.66 3.31 1.26 0.27 4.82
## 6 4.00 2.62 2.79 0.31 4.20
## 7 4.77 3.19 3.85 0.20 4.74
## 8 5.29 3.97 2.42 0.35 4.73
## 9 4.61 2.57 3.39 0.20 4.33
## 10 4.18 2.56 3.32 0.37 4.18
## Touches/90
## 1 49.03
## 2 44.82
## 3 48.50
## 4 43.67
## 5 48.09
## 6 43.52
## 7 45.00
## 8 47.23
## 9 51.93
## 10 39.03
Based on the similarity algorithm with the Euclidean distance, Alex Baena emerges as the player most similar to Kevin De Bruyne in terms of style and key performance metrics.
If we use the cosine distance instead:
## Player Similarity Goals/90 Passes% PassesCompleted/90
## 1 Alex Baena 96.514 0.24 66.6 26.78
## 2 Fabio Vieira 95.629 0.20 79.2 29.69
## 3 Francisco Trincão 95.230 0.28 78.5 24.91
## 4 Alan 95.108 0.25 80.4 32.10
## 5 Sven Mijnans 95.003 0.24 76.3 31.46
## 6 Mohammed Ihattaren 94.674 0.23 71.6 25.21
## 7 Florian Wirtz 94.600 0.38 78.3 44.19
## 8 Julian Brandt 94.589 0.20 78.8 31.77
## 9 Morgan Gibbs-White 94.566 0.22 78.3 29.21
## 10 Nicolás Paz 94.304 0.20 80.6 28.34
## PassesProgressive/90 PassesUT/90 LongPassesCompleted/90 Ast/90 SCA/90
## 1 5.91 2.97 4.31 0.31 5.86
## 2 4.77 3.19 3.85 0.20 4.74
## 3 4.53 2.82 2.06 0.43 4.94
## 4 5.29 3.52 2.39 0.21 4.19
## 5 4.61 2.57 3.39 0.20 4.33
## 6 4.17 2.75 2.83 0.23 5.08
## 7 5.68 3.10 2.23 0.46 5.66
## 8 4.97 3.13 2.23 0.39 3.99
## 9 4.82 3.76 2.15 0.26 3.69
## 10 4.66 3.31 1.26 0.27 4.82
## Touches/90
## 1 49.03
## 2 45.00
## 3 44.82
## 4 47.00
## 5 51.93
## 6 43.67
## 7 67.61
## 8 48.50
## 9 47.82
## 10 48.09
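One plausible reason the cosine ranking differs from the Euclidean one (Alan, Florian Wirtz and Morgan Gibbs-White enter the top 10) is that cosine similarity compares the direction of the metric vectors, i.e. the balance between metrics, and ignores their overall magnitude. Wirtz, for example, records far more touches per 90 (67.61) than Baena (49.03) while possibly sharing a similar profile shape. The toy vectors below are hypothetical and only illustrate the effect.

```r
# Hypothetical metric profiles (not real player data)
target    <- c(passes = 1, sca = 1, touches = 1)
high_vol  <- 2 * target                              # same proportions, twice the volume
close_mix <- c(passes = 1, sca = 0.8, touches = 1)   # similar volume, different mix

euclid  <- function(a, b) sqrt(sum((a - b)^2))
cos_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

euclid(target, high_vol)     # 1.732: far apart in absolute terms
euclid(target, close_mix)    # 0.2:   very close
cos_sim(target, high_vol)    # 1:     identical direction
cos_sim(target, close_mix)   # ~0.995: slightly different direction
```

The Euclidean distance prefers `close_mix`, while cosine similarity prefers `high_vol`: a high-volume player with De Bruyne's proportions can rank higher under cosine.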
As a final step, we highlight the most interesting candidates and plot them on a radar chart alongside Kevin De Bruyne as potential replacements:
Alex Baena – the player most similar to De Bruyne in style and key metrics, confirming him as a strong tactical fit.
Joshua Kimmich – the top performer overall, combining elite passing, tempo control, and experience, making him the optimal replacement from a performance standpoint.
Joey Veerman – a market opportunity, offering a promising profile at a reasonable cost, and a realistic option for acquisition.
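We calculate the 5th (p5) and 95th (p95) percentiles of each analysis metric across the sample and store them in the dataframe min_max_df.
These percentiles become the minimum and maximum of each radar axis. As the output below shows, player values above p95 are capped at p95 and values below p5 are raised to p5 (for example, Alex Baena's 66.6% pass completion is plotted at 68.725), so that no player falls outside the chart.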
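A minimal sketch of this step, assuming the metrics are plain numeric columns. The helper names `build_min_max` and `clip_players` are hypothetical; `quantile()` is used with its default method (type 7).

```r
# Sketch: p5/p95 per metric, used as radar-axis minima/maxima.
build_min_max <- function(sample, metrics) {
  p95 <- sapply(sample[metrics], quantile, probs = 0.95, names = FALSE, na.rm = TRUE)
  p5  <- sapply(sample[metrics], quantile, probs = 0.05, names = FALSE, na.rm = TRUE)
  as.data.frame(rbind(p95 = p95, p5 = p5))      # row 1 = max, row 2 = min
}

# Cap each player's value to [p5, p95] so every point stays inside its axis.
clip_players <- function(players, min_max_df) {
  for (m in names(min_max_df)) {
    players[[m]] <- pmin(pmax(players[[m]], min_max_df["p5", m]),
                         min_max_df["p95", m])
  }
  players
}
```

If the candidates' rows are named after the players, `rbind(min_max_df, clip_players(candidates[metrics], min_max_df))` yields a table laid out like the one printed below (p95, p5, then one row per player), where `candidates` is a hypothetical dataframe of the selected players.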
## Gls.90 Passes. PassesCompleted.90 PassesProgressive.90 FinalThirdPasses.90
## p95 0.3875 89.600 48.7950 6.725 5.565
## p5 0.0000 68.725 13.9075 1.680 1.060
## LongPassesCompleted.90 Ast.90 SCA.90 Touches.90
## p95 5.015 0.29 4.815 65.645
## p5 0.665 0.00 1.200 27.460
## Gls.90 Passes. PassesCompleted.90 PassesProgressive.90
## p95 0.3875 89.600 48.7950 6.725
## p5 0.0000 68.725 13.9075 1.680
## Joshua Kimmich 0.0900 89.500 48.7950 6.725
## Joey Veerman 0.0500 80.900 48.7950 6.725
## Alex Baena 0.2400 68.725 26.7800 5.910
## Kevin De Bruyne 0.2100 75.900 32.4300 5.610
## FinalThirdPasses.90 LongPassesCompleted.90 Ast.90 SCA.90
## p95 5.565 5.015 0.29 4.815
## p5 1.060 0.665 0.00 1.200
## Joshua Kimmich 5.565 5.015 0.22 4.815
## Joey Veerman 5.565 5.015 0.29 4.815
## Alex Baena 2.970 4.310 0.29 4.815
## Kevin De Bruyne 3.140 3.210 0.29 4.815
## Touches.90
## p95 65.645
## p5 27.460
## Joshua Kimmich 65.645
## Joey Veerman 65.645
## Alex Baena 49.030
## Kevin De Bruyne 48.860
We use the fmsb library to create the radar chart. We
define the function create_radarchart, a wrapper around the
radarchart function of the fmsb library that lets us customise
its different arguments (colours, axis names, etc.).
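A hypothetical sketch of create_radarchart. It calls `fmsb::radarchart()`, which, with its `maxmin` argument left at the default `TRUE`, reads the first row of the dataframe as the axis maxima and the second as the minima; this is why min_max_df stores p95 above p5. The wrapper's own argument names and the styling values are illustrative assumptions.

```r
# Hypothetical wrapper around fmsb::radarchart(); the wrapper's argument
# names (data, color, vlabels, title, vlcex) are illustrative assumptions.
create_radarchart <- function(data, color = "#1C6BA0", vlabels = colnames(data),
                              title = "", vlcex = 0.8) {
  # data: row 1 = axis maxima (p95), row 2 = minima (p5), then one row per player
  fmsb::radarchart(
    data,
    axistype = 1,                                          # centre axis labels only
    pcol = color,                                          # one line colour per player
    pfcol = grDevices::adjustcolor(color, alpha.f = 0.3),  # translucent fill
    plwd = 2, plty = 1,                                    # line width and type
    cglcol = "grey", cglty = 1, cglwd = 0.8,               # grid colour, type, width
    axislabcol = "grey",
    vlabels = vlabels, vlcex = vlcex,                      # axis names and their size
    title = title
  )
}
```

For four players (Kimmich, Veerman, Baena and De Bruyne), `color` would be a vector of four colours, one per row after p95 and p5.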
Based on our scouting analysis, Manchester City has three viable approaches to replace Kevin De Bruyne:
If the primary objective is to maintain elite-level midfield performance, Kimmich should be pursued. He guarantees immediate quality, control, and leadership in the midfield, though at a higher transfer cost and with potential negotiation complexity.
For a replacement closely aligned with De Bruyne’s tactical role and style, Baena represents a high-probability fit. He ensures continuity in the team’s possession game and could integrate seamlessly into the existing structure.
If the club prioritises cost-effective acquisitions with growth potential, Veerman is the most attractive option. While slightly less experienced or complete than Kimmich or Baena, he balances quality and affordability, reducing financial risk.