Introduction

In this task, we focus on finding a replacement for Kevin De Bruyne at Manchester City. His listed position is “MF,FW”.
We have a database of all players in the Big 5 European leagues, plus the Eredivisie and the Primeira Liga, for the 2024/25 season.

Packages used

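library(tidyverse)  # data wrangling (dplyr, tidyr, readr, ...)
library(fmsb)       # radar charts (radarchart())
library(lsa)        # cosine similarity (cosine())
library(knitr)      # static tables (kable())
library(DT)         # interactive tables (datatable())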

Data Exploration

In our case, we will work with players whose primary position is MF, as this is the position of our target player.

## We consider all players 
## We keep the players whose main position is: MF
## 
## Number of players: 1245

What competitions are we working with?

## [1] "Bundesliga"     "Eredivisie"     "La Liga"        "Ligue 1"       
## [5] "Premier League" "Primeira Liga"  "Serie A"

What positions do our players have?

## [1] "MF,FW" "MF"    "MF,DF"

Since we are looking for an attacking midfielder, we exclude players listed as MF,DF, as they are likely to have a more defensive profile than our target player.
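A minimal base-R sketch of these two position filters. It assumes the position is stored in a column called Pos as comma-separated codes such as "MF,FW" (the column name is an assumption), and the toy rows are placeholders rather than real data:

```r
# Toy rows standing in for the real dataset (hypothetical Pos column)
players <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  Pos    = c("MF,FW", "MF", "MF,DF", "FW,MF"),
  stringsAsFactors = FALSE
)

# Primary position = the code before the first comma ("MF,FW" -> "MF")
primary_pos <- sub(",.*$", "", players$Pos)

# Keep primary midfielders, then drop the more defensive MF,DF profiles
attacking_mf <- players[primary_pos == "MF" & players$Pos != "MF,DF", ]
attacking_mf$Player
```

Player D is dropped because FW comes first in "FW,MF", so MF is only a secondary position.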

Data filtering

We define a filter_player function that lets us, for example, keep only players who have played a minimum number of minutes. If we were analysing the current top scorers, we could also add a condition to keep only players who average more than one goal per game. Either way, the aim is to narrow the dataset down to a smaller group of players for the study.

This function will have the following parameters:

In our example, we keep midfielders who have played at least 50% of their team's total minutes and who are aged 30 or under.
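The parameter list is not shown in the text, so the sketch below uses hypothetical parameters (min_share, max_age) and hypothetical column names (Min for minutes played, Team_Min for the team's total minutes). It keeps players aged 30 or under, consistent with the maximum Age of 30 in the summary further down:

```r
# Hypothetical signature: the original parameter list is not shown
filter_player <- function(data, min_share = 0.5, max_age = 30) {
  share_played <- data$Min / data$Team_Min   # fraction of team minutes played
  keep <- share_played >= min_share & data$Age <= max_age
  data[which(keep), , drop = FALSE]          # which() drops NA comparisons
}

# Toy example: only Player A passes both conditions
toy <- data.frame(
  Player   = c("Player A", "Player B", "Player C"),
  Age      = c(22, 31, 25),
  Min      = c(2000, 2500, 900),
  Team_Min = c(3060, 3060, 3060),
  stringsAsFactors = FALSE
)
filter_player(toy)$Player
```

Loosening the thresholds, e.g. filter_player(toy, min_share = 0.25, max_age = 35), keeps all three toy players.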

Are there duplicated records?

## Duplicated players:
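One way to run this check in base R is duplicated() on the identifying columns; the sketch assumes that Player and Squad together identify a record. An empty result, as above, means no duplicates were found. Checking Player alone would also flag anyone listed once per club after a mid-season transfer, if the source records them that way.

```r
# Toy records (hypothetical); the third row repeats the first
records <- data.frame(
  Player = c("Player A", "Player B", "Player A"),
  Squad  = c("Club X", "Club Y", "Club X"),
  stringsAsFactors = FALSE
)

# TRUE for every row that repeats an earlier Player + Squad combination
is_dup <- duplicated(records[, c("Player", "Squad")])
records[is_dup, ]
```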

Rename some metrics:

##                  Player         Squad Age Goals by 90' Passes Completion %
## 1           Adrian Beck    Heidenheim  27         0.23                80.1
## 2 Alexis Claude-Maurice      Augsburg  26         0.38                82.9
## 3        András Schäfer  Union Berlin  25         0.06                73.5
## 4        Angelo Stiller     Stuttgart  23         0.03                87.4
## 5         Armin Gigovic Holstein Kiel  22         0.24                77.7
## 6      Arthur Vermeeren    RB Leipzig  19         0.00                86.1
##   Passes Completed by 90' Progressive Passes by 90' Final Third Passes by 90'
## 1                   18.16                      2.16                      1.66
## 2                   19.39                      2.18                      1.36
## 3                   15.92                      2.12                      1.15
## 4                   69.34                      9.03                      8.47
## 5                   16.90                      1.61                      1.42
## 6                   29.32                      2.75                      2.54
##   Long Passes completed by 90' Assists by 90' Shot-Creating Actions by 90'
## 1                         1.72           0.06                         3.16
## 2                         1.11           0.09                         3.29
## 3                         0.50           0.06                         1.70
## 4                         5.31           0.26                         3.64
## 5                         0.87           0.05                         1.85
## 6                         1.07           0.11                         1.60
##   Touches by 90'
## 1          30.06
## 2          35.50
## 3          30.73
## 4          86.91
## 5          30.16
## 6          39.61
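The new names contain spaces and symbols. When a data frame is rebuilt with data.frame() or read with read.csv() under the default check.names = TRUE, R passes the names through make.names(), which turns every invalid character into a dot; this is most likely why the summary in the next section shows Goals.by.90. instead of Goals by 90'. A minimal illustration (the raw names gls and cmp_pct are hypothetical):

```r
# Hypothetical raw names, renamed to readable labels
metrics <- data.frame(gls = 0.23, cmp_pct = 80.1)
names(metrics) <- c("Goals by 90'", "Passes Completion %")

# What data.frame()/read.csv() do to these names when check.names = TRUE
make.names(names(metrics))
```

To keep the readable names, pass check.names = FALSE or refer to columns with backticks, e.g. metrics$`Goals by 90'`.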

Scoring calculation

Once we have selected our study sample, we calculate a value that summarises the performance of these players. This will be our rating.

Because the metrics are measured on different scales, the first step is to normalise them so that all variables lie in the same range. We can then assign weights and calculate the final score.
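In formula form, with $x_{ij}$ the value of metric $j$ for player $i$ and $w_j$ the weight of metric $j$, the two steps are:

```latex
x'_{ij} = \frac{x_{ij} - \min_k x_{kj}}{\max_k x_{kj} - \min_k x_{kj}} \in [0, 1],
\qquad
S_i = \sum_j w_j \, x'_{ij}, \qquad \sum_j w_j = 1
```

Because every $x'_{ij}$ lies in $[0, 1]$ and the weights sum to 1, $S_i$ also lies in $[0, 1]$. The final scores reported later (between 5.45 and 8.18 for the top ten) therefore appear to be rescaled, most likely multiplied by 10; this is an inference from the output, not something the text states.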

Data transformation

We use MinMax scaling to normalise the values of the performance variables, and define a normalize function that implements this transformation.

We apply this function to each of the columns (from column 4 onwards) of the dataframe df_midfielders_clean.

##     Player             Squad                Age         Goals.by.90.    
##  Length:325         Length:325         Min.   : 0.00   Min.   :0.00000  
##  Class :character   Class :character   1st Qu.:22.00   1st Qu.:0.05128  
##  Mode  :character   Mode  :character   Median :24.00   Median :0.12821  
##                                        Mean   :24.51   Mean   :0.17349  
##                                        3rd Qu.:27.00   3rd Qu.:0.25641  
##                                        Max.   :30.00   Max.   :1.00000  
##  Passes.Completion.. Passes.Completed.by.90. Progressive.Passes.by.90.
##  Min.   :0.0000      Min.   :0.0000          Min.   :0.0000           
##  1st Qu.:0.8271      1st Qu.:0.2209          1st Qu.:0.2498           
##  Median :0.8754      Median :0.2921          Median :0.3273           
##  Mean   :0.8616      Mean   :0.3010          Mean   :0.3475           
##  3rd Qu.:0.9108      3rd Qu.:0.3557          3rd Qu.:0.4166           
##  Max.   :1.0000      Max.   :1.0000          Max.   :1.0000           
##  Final.Third.Passes.by.90. Long.Passes.completed.by.90. Assists.by.90.   
##  Min.   :0.0000            Min.   :0.0000               Min.   :0.00000  
##  1st Qu.:0.1533            1st Qu.:0.1324               1st Qu.:0.07143  
##  Median :0.2130            Median :0.2159               Median :0.16071  
##  Mean   :0.2303            Mean   :0.2356               Mean   :0.19813  
##  3rd Qu.:0.2840            3rd Qu.:0.2975               3rd Qu.:0.28571  
##  Max.   :1.0000            Max.   :1.0000               Max.   :1.00000  
##  Shot.Creating.Actions.by.90. Touches.by.90.  
##  Min.   :0.0000               Min.   :0.0000  
##  1st Qu.:0.2846               1st Qu.:0.3111  
##  Median :0.3738               Median :0.3783  
##  Mean   :0.4132               Mean   :0.3849  
##  3rd Qu.:0.5169               3rd Qu.:0.4363  
##  Max.   :1.0000               Max.   :1.0000

The performance metrics (Goals.by.90. onwards) now all have a minimum of 0 and a maximum of 1, so they are normalised. Age, in column 3, keeps its original scale.
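A minimal sketch of the normalize function and of applying it from column 4 onwards. The guard for constant columns is an addition that may not be in the original: in that case it returns 0 for every non-missing value instead of dividing by zero.

```r
# Min-max scaling: maps the smallest value to 0 and the largest to 1
normalize <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[2] == rng[1]) return(replace(x, !is.na(x), 0))  # constant column
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy stand-in for df_midfielders_clean: metrics start at column 4
toy <- data.frame(
  Player  = c("Player A", "Player B", "Player C"),
  Squad   = c("Club X", "Club Y", "Club Z"),
  Age     = c(21, 25, 29),
  Goals   = c(0, 0.25, 0.5),
  Touches = c(30, 60, 90)
)
metric_cols <- 4:ncol(toy)
toy[metric_cols] <- lapply(toy[metric_cols], normalize)
toy
```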

Scoring calculation

Once the variables have been normalised, we calculate the final scoring for each player.

We define the function calc_scoring, which takes the following parameters:

  • data: Data set transformed with MinMax scaling.
  • weights: List of weights, one per performance metric.
  • ind_metric: Index of the column where the performance metrics start.
  • columns_return: Columns to include in the final result.
  • n: Number of top-scoring players to return.
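A minimal sketch of calc_scoring with these parameters. It assumes weights is named after the metric columns (or given in column order when unnamed), and it multiplies the weighted sum by 10, an assumption inferred from the 0 to 10 range of the scores below:

```r
calc_scoring <- function(data, weights, ind_metric, columns_return, n = 10) {
  metrics <- as.matrix(data[, ind_metric:ncol(data), drop = FALSE])
  w <- unlist(weights)
  if (!is.null(names(w))) w <- w[colnames(metrics)]   # align weights by name
  stopifnot(length(w) == ncol(metrics), !anyNA(w),
            isTRUE(all.equal(sum(w), 1)))

  # Weighted sum of the normalised metrics, rescaled to 0-10 (assumption)
  score <- 10 * as.vector(metrics %*% w)

  result <- data[, columns_return, drop = FALSE]
  result[["Final Score"]] <- round(score, 3)
  result <- result[order(result[["Final Score"]], decreasing = TRUE), , drop = FALSE]
  head(result, n)
}

# Toy usage with two already-normalised metrics
toy <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  Squad  = c("Club X", "Club Y", "Club Z"),
  Age    = c(20, 25, 30),
  m1     = c(0, 0.5, 1),
  m2     = c(1, 0.5, 0.2),
  stringsAsFactors = FALSE
)
calc_scoring(toy, weights = list(m1 = 0.7, m2 = 0.3), ind_metric = 4,
             columns_return = c("Player", "Squad", "Age"), n = 2)
```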

We now test it on our example by assigning a weight to each performance metric; the only requirement is that the weights sum to 1.

##  [1] "Player"                       "Squad"                       
##  [3] "Age"                          "Goals.by.90."                
##  [5] "Passes.Completion.."          "Passes.Completed.by.90."     
##  [7] "Progressive.Passes.by.90."    "Final.Third.Passes.by.90."   
##  [9] "Long.Passes.completed.by.90." "Assists.by.90."              
## [11] "Shot.Creating.Actions.by.90." "Touches.by.90."
## Weights sum: 1
##             Player          Squad Age Final Score
## 1   Joshua Kimmich  Bayern Munich  29       8.175
## 2     Joey Veerman  PSV Eindhoven  25       7.374
## 3            Pedri      Barcelona  21       6.562
## 4  Bruno Fernandes Manchester Utd  29       6.360
## 5      Orkun Kökçü        Benfica  23       6.276
## 6   Angelo Stiller      Stuttgart  23       6.145
## 7  Pierre Højbjerg      Marseille  28       5.847
## 8    Florian Wirtz     Leverkusen  21       5.724
## 9          Vitinha            PSG  24       5.707
## 10 Martin Ødegaard        Arsenal  25       5.453
  • Based on the weighted metrics, Joshua Kimmich is the top candidate, combining passing, tempo control and experience, and represents a realistic signing option for Manchester City.

  • Joey Veerman emerges as a clear market opportunity: he fits the profile, performs strongly, and could be signed at a reasonable cost.

  • Pedri and Bruno Fernandes also score highly, but their clubs make them virtually unattainable.

  • Orkun Kökçü and Angelo Stiller could also be good options.

  • Pierre Højbjerg could bring experience.

  • Florian Wirtz is the young bet: he already attracts interest from many big European clubs, so the cost could be very high, though that is not necessarily a problem for Manchester City.

  • Vitinha, like Pedri and Bruno Fernandes, is likely to be unattainable.

  • Martin Ødegaard would be difficult to prise away from Arsenal.

Similarity algorithm

From our analysis, Joshua Kimmich stands out as the most complete midfielder in terms of the weighted metrics. But how closely does he resemble Kevin De Bruyne in style and role? And which other players resemble De Bruyne most?

Because De Bruyne is over 30, our age filter removed him from the sample, so for this analysis we first need to retrieve his performance data separately.

## We consider all players 
## We keep the players whose main position is: MF

We develop the function similarity_tool that calculates the N players most similar to the player indicated in the player argument. The rest of the arguments will be:

Looking for players most similar to Kevin De Bruyne

Using the dataset filtered in the previous step, we apply this function to find the players whose performance is most similar to that of the Manchester City midfielder.

##                Player Similarity Goals/90 Passes% PassesCompleted/90
## 1          Alex Baena   75.91482     0.24    66.6              26.78
## 2   Francisco Trincão   72.51973     0.28    78.5              24.91
## 3       Julian Brandt   70.18377     0.20    78.8              31.77
## 4  Mohammed Ihattaren   67.53767     0.23    71.6              25.21
## 5         Nicolás Paz   66.67454     0.20    80.6              28.34
## 6        Ludovic Blas   66.56148     0.24    73.9              23.90
## 7        Fabio Vieira   65.35014     0.20    79.2              29.69
## 8      James Maddison   63.35323     0.45    81.3              32.94
## 9        Sven Mijnans   62.78953     0.24    76.3              31.46
## 10      Gaëtan Perrin   62.34229     0.33    70.6              21.41
##    PassesProgressive/90 PassesUT/90 LongPassesCompleted/90 Ast/90 SCA/90
## 1                  5.91        2.97                   4.31   0.31   5.86
## 2                  4.53        2.82                   2.06   0.43   4.94
## 3                  4.97        3.13                   2.23   0.39   3.99
## 4                  4.17        2.75                   2.83   0.23   5.08
## 5                  4.66        3.31                   1.26   0.27   4.82
## 6                  4.00        2.62                   2.79   0.31   4.20
## 7                  4.77        3.19                   3.85   0.20   4.74
## 8                  5.29        3.97                   2.42   0.35   4.73
## 9                  4.61        2.57                   3.39   0.20   4.33
## 10                 4.18        2.56                   3.32   0.37   4.18
##    Touches/90
## 1       49.03
## 2       44.82
## 3       48.50
## 4       43.67
## 5       48.09
## 6       43.52
## 7       45.00
## 8       47.23
## 9       51.93
## 10      39.03

Based on the similarity algorithm, Alex Baena emerges as the player most similar to Kevin De Bruyne in terms of style and key performance metrics.
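The text does not say which measure produces the Similarity column above. One common option, assumed in this sketch, is Euclidean distance on the min-max normalised metrics, converted into a 0 to 100 score as 100 × (1 - d / d_max), where d_max is the largest distance to the target in the sample. The function name similarity_euclid and its arguments are hypothetical, and the target player's row must be present in data:

```r
# Hypothetical helper: Euclidean-distance similarity on normalised metrics
similarity_euclid <- function(data, player, metric_cols, n = 10) {
  m <- as.matrix(data[, metric_cols, drop = FALSE])
  target_row <- which(data$Player == player)
  stopifnot(length(target_row) == 1)

  # Distance from every player to the target across all metrics
  d <- sqrt(rowSums(sweep(m, 2, m[target_row, ])^2))

  # 100 = identical profile, 0 = the farthest player in the sample
  sim <- 100 * (1 - d / max(d))

  out <- data.frame(Player = data$Player, Similarity = sim,
                    stringsAsFactors = FALSE)
  out <- out[-target_row, , drop = FALSE]
  out <- out[order(out$Similarity, decreasing = TRUE), , drop = FALSE]
  head(out, n)
}

# Toy usage: "Target" stands in for the reference player
toy <- data.frame(
  Player = c("Target", "Player A", "Player B", "Player C"),
  m1 = c(0, 0.1, 0.5, 1),
  m2 = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)
similarity_euclid(toy, "Target", metric_cols = c("m1", "m2"))
```

Because the score is relative to the farthest player, it depends on who else is in the sample: the same two players can get a different similarity if the filtered dataset changes.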

If we use cosine similarity instead, the ranking becomes:

##                Player Similarity Goals/90 Passes% PassesCompleted/90
## 1          Alex Baena     96.514     0.24    66.6              26.78
## 2        Fabio Vieira     95.629     0.20    79.2              29.69
## 3   Francisco Trincão     95.230     0.28    78.5              24.91
## 4                Alan     95.108     0.25    80.4              32.10
## 5        Sven Mijnans     95.003     0.24    76.3              31.46
## 6  Mohammed Ihattaren     94.674     0.23    71.6              25.21
## 7       Florian Wirtz     94.600     0.38    78.3              44.19
## 8       Julian Brandt     94.589     0.20    78.8              31.77
## 9  Morgan Gibbs-White     94.566     0.22    78.3              29.21
## 10        Nicolás Paz     94.304     0.20    80.6              28.34
##    PassesProgressive/90 PassesUT/90 LongPassesCompleted/90 Ast/90 SCA/90
## 1                  5.91        2.97                   4.31   0.31   5.86
## 2                  4.77        3.19                   3.85   0.20   4.74
## 3                  4.53        2.82                   2.06   0.43   4.94
## 4                  5.29        3.52                   2.39   0.21   4.19
## 5                  4.61        2.57                   3.39   0.20   4.33
## 6                  4.17        2.75                   2.83   0.23   5.08
## 7                  5.68        3.10                   2.23   0.46   5.66
## 8                  4.97        3.13                   2.23   0.39   3.99
## 9                  4.82        3.76                   2.15   0.26   3.69
## 10                 4.66        3.31                   1.26   0.27   4.82
##    Touches/90
## 1       49.03
## 2       45.00
## 3       44.82
## 4       47.00
## 5       51.93
## 6       43.67
## 7       67.61
## 8       48.50
## 9       47.82
## 10      48.09
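A minimal sketch of the cosine version in base R (the lsa package loaded at the start also provides a cosine() function). Cosine similarity compares the direction of two metric profiles and ignores their overall size, so a player who does proportionally the same things at a different volume can still score close to 100. With non-negative normalised metrics, every pair has a cosine between 0 and 1, which helps explain why the top scores above all fall between 94 and 97. The helper names are hypothetical:

```r
# Cosine similarity between two numeric vectors (undefined for all-zero vectors)
cosine_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Hypothetical helper: rank players by cosine similarity to a target (0-100)
similarity_cosine <- function(data, player, metric_cols, n = 10) {
  m <- as.matrix(data[, metric_cols, drop = FALSE])
  target_row <- which(data$Player == player)
  stopifnot(length(target_row) == 1)

  sim <- 100 * apply(m, 1, cosine_sim, b = m[target_row, ])

  out <- data.frame(Player = data$Player, Similarity = sim,
                    stringsAsFactors = FALSE)
  out <- out[-target_row, , drop = FALSE]
  out <- out[order(out$Similarity, decreasing = TRUE), , drop = FALSE]
  head(out, n)
}

toy <- data.frame(
  Player = c("Target", "Player A", "Player B", "Player C"),
  m1 = c(1, 1, 2, 0),
  m2 = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)
similarity_cosine(toy, "Target", metric_cols = c("m1", "m2"))
```

Player B has twice the target's value on m1 yet scores 100, which shows the scale invariance described above.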

Radar chart

As a final step, we highlight the most interesting potential replacements for Kevin De Bruyne and plot them on a radar chart:

Boundary construction

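We calculate the 5th (p5) and 95th (p95) percentiles of each analysis metric. Using these percentiles rather than the raw minimum and maximum keeps a few extreme values from compressing the chart axes.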

We create the dataframe min_max_df that will contain the p5 and p95 for each of the study metrics.

##     Gls.90 Passes. PassesCompleted.90 PassesProgressive.90 FinalThirdPasses.90
## p95 0.3875  89.600            48.7950                6.725               5.565
## p5  0.0000  68.725            13.9075                1.680               1.060
##     LongPassesCompleted.90 Ast.90 SCA.90 Touches.90
## p95                  5.015   0.29  4.815     65.645
## p5                   0.665   0.00  1.200     27.460
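A sketch of building min_max_df with base R's quantile(), using its default type 7 interpolation (the original may have used another method). The p95 row comes first and p5 second because fmsb::radarchart expects the maximum row first and the minimum row second; the function name min_max_boundaries is hypothetical:

```r
# p95 / p5 boundaries for each metric column (max row first, min row second)
min_max_boundaries <- function(metrics) {
  q <- sapply(metrics, quantile, probs = c(0.95, 0.05),
              na.rm = TRUE, names = FALSE)
  out <- as.data.frame(q)
  rownames(out) <- c("p95", "p5")
  out
}

toy <- data.frame(a = 1:101, b = seq(0, 1000, by = 10))
min_max_boundaries(toy)
```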

Preparation of the dataframe

##                 Gls.90 Passes. PassesCompleted.90 PassesProgressive.90
## p95             0.3875  89.600            48.7950                6.725
## p5              0.0000  68.725            13.9075                1.680
## Joshua Kimmich  0.0900  89.500            48.7950                6.725
## Joey Veerman    0.0500  80.900            48.7950                6.725
## Alex Baena      0.2400  68.725            26.7800                5.910
## Kevin De Bruyne 0.2100  75.900            32.4300                5.610
##                 FinalThirdPasses.90 LongPassesCompleted.90 Ast.90 SCA.90
## p95                           5.565                  5.015   0.29  4.815
## p5                            1.060                  0.665   0.00  1.200
## Joshua Kimmich                5.565                  5.015   0.22  4.815
## Joey Veerman                  5.565                  5.015   0.29  4.815
## Alex Baena                    2.970                  4.310   0.29  4.815
## Kevin De Bruyne               3.140                  3.210   0.29  4.815
##                 Touches.90
## p95                 65.645
## p5                  27.460
## Joshua Kimmich      65.645
## Joey Veerman        65.645
## Alex Baena          49.030
## Kevin De Bruyne     48.860
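The table shows that player values are capped to the [p5, p95] range: Joshua Kimmich's PassesCompleted.90 equals the p95 value, and Alex Baena's Passes. (66.6 in the similarity table) has been raised to the p5 value of 68.725. A sketch of that step, with the boundary rows on top as fmsb expects; the function name clip_to_bounds is hypothetical and the toy values are placeholders:

```r
# Cap each player's values to [p5, p95] and put the boundary rows on top,
# the layout fmsb::radarchart expects (max row, min row, then data rows)
clip_to_bounds <- function(players, bounds) {
  clipped <- players[, colnames(bounds), drop = FALSE]
  for (col in colnames(bounds)) {
    lo <- bounds["p5", col]
    hi <- bounds["p95", col]
    clipped[[col]] <- pmin(pmax(clipped[[col]], lo), hi)
  }
  rbind(bounds, clipped)
}

bounds <- data.frame(Gls = c(0.4, 0), Pass = c(90, 70),
                     row.names = c("p95", "p5"))
players <- data.frame(Gls = c(0.5, 0.2), Pass = c(66, 80),
                      row.names = c("Player A", "Player B"))
clip_to_bounds(players, bounds)
```

Capping keeps an outlier from drawing past the outer ring of the chart, at the cost of hiding how far beyond p95 (or below p5) a player actually is.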

Radar representation

We use the fmsb package to draw the radar chart, defining a create_radarchart function that wraps fmsb's radarchart function and exposes its main styling arguments (colours, axis names, etc.).
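A minimal sketch of such a wrapper, assuming the fmsb package is installed. The argument names of create_radarchart are hypothetical; the arguments passed to fmsb::radarchart (axistype, pcol, pfcol, plwd, plty, cglcol, cglty, cglwd, axislabcol, vlabels, vlcex, title) are part of that function's interface. The toy frame uses placeholder values:

```r
# Wrapper around fmsb::radarchart(); the first two rows of `data` must be
# the maximum (p95) and minimum (p5) boundaries, then one row per player
create_radarchart <- function(data, colors, vlabels = colnames(data),
                              title = "", vlcex = 0.8) {
  fmsb::radarchart(
    data,
    axistype   = 1,                                              # label the central axis
    pcol       = colors,                                         # outline colour per player
    pfcol      = grDevices::adjustcolor(colors, alpha.f = 0.2),  # translucent fill
    plwd       = 2,
    plty       = 1,
    cglcol     = "grey",                                         # grid colour
    cglty      = 1,
    cglwd      = 0.8,
    axislabcol = "grey30",
    vlabels    = vlabels,
    vlcex      = vlcex,
    title      = title
  )
}

# Toy frame: boundary rows first, then two players (radarchart needs >= 3 axes)
toy_radar <- data.frame(
  Gls.90  = c(0.4, 0, 0.2, 0.3),
  Passes. = c(90, 70, 75, 85),
  Ast.90  = c(0.3, 0, 0.1, 0.25),
  row.names = c("p95", "p5", "Player A", "Player B")
)

if (requireNamespace("fmsb", quietly = TRUE)) {
  create_radarchart(toy_radar, colors = c("#6CABDD", "#E4572E"),
                    title = "Toy example")
}
```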

Based on our scouting analysis, Manchester City has three viable approaches to replace Kevin De Bruyne:

  • Performance Priority – Joshua Kimmich

If the primary objective is to maintain elite-level midfield performance, Kimmich should be pursued. He guarantees immediate quality, control, and leadership in the midfield, though at a higher transfer cost and with potential negotiation complexity.

  • Stylistic Fit – Alex Baena

For a replacement closely aligned with De Bruyne’s tactical role and style, Baena represents a high-probability fit. He ensures continuity in the team’s possession game and could integrate seamlessly into the existing structure.

  • Market Opportunity – Joey Veerman

If the club prioritises cost-effective acquisitions with growth potential, Veerman is the most attractive option. While slightly less experienced or complete than Kimmich or Baena, he balances quality and affordability, reducing financial risk.