Introduction

In this task, we look for a replacement for Kevin De Bruyne at Manchester City. His listed position is “MF,FW”.

Packages

library(tidyverse)
library(fmsb)
library(lsa)
library(knitr)
library(DT)

Data Loading

Function

select_players <- function(file, encoding, position, competition, primary, player){
  # 1. Read the file (semicolon-separated CSV)
  data <- read.csv(file, sep = ";", encoding = encoding)
  # 2. Optionally filter by competition (NA = keep all competitions)
  if (all(is.na(competition))) {
    cat("We consider all players", "\n")
    data_comp <- data
  } else {
    cat("We filter by competition", "\n")
    data_comp <- data %>%
      filter(Competition %in% competition)
  }
  # 3. Filter by position: primary (listed first) or any listed position
  if (primary){
    cat("We keep the players whose main position is:", position, "\n")
    data_players <- data_comp %>% 
      filter(substr(Pos, 1, 2) == position)
  } else {
    cat("We keep the players whose position is:", position, "\n")
    data_players <- data_comp %>% 
      filter(grepl(position, Pos))
  }
  # 4. Optionally keep only the requested player(s)
  if (!all(is.na(player))){
    data_players <- data_players %>% filter(Player %in% player)
  }
  return(data_players)
}
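The primary argument decides whether the position code must come first in Pos or may appear anywhere in it. A minimal sketch on a hypothetical vector of position strings (not taken from the dataset) shows how the two filters used in select_players differ:

```r
# Hypothetical FBref-style position strings (illustrative only)
pos <- c("MF,FW", "FW,MF", "DF,MF", "MF")

# primary = TRUE: the first two characters must be the position code
is_primary <- substr(pos, 1, 2) == "MF"

# primary = FALSE: the code may appear anywhere in the string
is_any <- grepl("MF", pos)

data.frame(pos, is_primary, is_any)
```

With primary = TRUE, a player listed as "FW,MF" is excluded because his main position is FW, whereas the grepl() filter would keep all four.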

In our case, we will work with players whose primary position is MF, as this is the position of our target player.

df_midfielders <- select_players(
  file = "FBREF_BigPlayers_2425.csv", 
  encoding = "UTF-8", 
  position = "MF", 
  competition = NA, 
  primary = TRUE,
  player = NA
)

## We consider all players 
## We keep the players whose main position is: MF

cat("\nNumber of players:", nrow(df_midfielders))

## 
## Number of players: 1245

What competitions are we working with?

unique(df_midfielders$Competition)

## [1] "Bundesliga"     "Eredivisie"     "La Liga"        "Ligue 1"       
## [5] "Premier League" "Primeira Liga"  "Serie A"

What positions do our players have?

unique(df_midfielders$Pos)

## [1] "MF,FW" "MF"    "MF,DF"

As we are looking for an attacking midfielder, we remove players listed as MF,DF, since they tend to have a more defensive profile than our target player.

df_midfielders <- df_midfielders %>%
  filter(Pos != 'MF,DF')

Data filtering

We define the filter_players function, which lets us select, for example, the players who have played a minimum number of minutes. If we were analysing the current top scorers, we could even add a filter to keep only the players averaging more than one goal per game. Either way, the result is a smaller, more relevant group of players on which to carry out the study.

This function will have the following parameters:

data: Data set read from a CSV.
metrics: List of metrics to be considered in the analysis.
pct_min_minutes: Minimum percentage of the team’s total minutes a player must have played to be included in the sample.
age_max: Maximum age of the player to be considered in the sample.

filter_players <- function(
  data, metrics, pct_min_minutes, age_max){
  # We filter data and select metrics that define our sample data
  data_filter <- data %>%
    filter(Min > round((pct_min_minutes*90*MP_Squad) / 100), 
           Age <= age_max) %>%
    select(all_of(c("Player", "Squad", "Age", metrics)))
  rownames(data_filter) <- 1:nrow(data_filter)
  return (data_filter)
}
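The minutes threshold in filter_players is pct_min_minutes percent of the team’s total available minutes, that is, 90 minutes times the number of matches the squad has played (MP_Squad). A quick worked check, assuming full seasons of 34 and 38 matches (the real MP_Squad values come from the CSV):

```r
# Threshold used in filter_players: round(pct * 90 * MP_Squad / 100)
pct_min_minutes <- 50
mp_squad <- c(34, 38)  # assumed full-season match counts (18- and 20-team leagues)

threshold <- round((pct_min_minutes * 90 * mp_squad) / 100)
threshold  # 1530 1710
```

Because the filter uses a strict comparison (Min > threshold), a player must exceed these totals to enter the sample.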

In our example, we keep midfielders who have played more than 50% of their team’s total minutes and who are 30 years old or younger.

list_metrics <- c("Gls.90", "Passes.",
                  "PassesCompleted.90", "PassesProgressive.90", 
                  "FinalThirdPasses.90", "LongPassesCompleted.90",
                  "Ast.90", "SCA.90", "Touches.90")

df_midfielders_filter <- filter_players(
  data = df_midfielders, 
  metrics = list_metrics, 
  pct_min_minutes = 50, 
  age_max = 30
)

Are there duplicated records?

cat("Duplicated players:", 
      df_midfielders_filter[
        duplicated(df_midfielders_filter$Player),]$Player)

## Duplicated players:
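No duplicates appear here, but a player who changed clubs between two of these leagues during the season could appear in two rows. The similarity_tool function defined later handles this by averaging each metric per player; a base R sketch of that averaging, using hypothetical rows:

```r
# Hypothetical rows: "Player X" appears twice (e.g. two clubs in one season)
df <- data.frame(Player = c("Player X", "Player X", "Player Y"),
                 SCA.90 = c(3.0, 5.0, 2.0),
                 Touches.90 = c(40, 60, 50))

any(duplicated(df$Player))  # TRUE

# Average every metric per player, as the group_by/summarise step does
df_avg <- aggregate(. ~ Player, data = df, FUN = mean)
df_avg
```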

Rename some metrics:

# Rename metrics
df_midfielders_rename <- df_midfielders_filter %>%
  rename(`Goals by 90'` = `Gls.90`, 
         `Passes Completion %` = `Passes.`, 
         `Passes Completed by 90'` = `PassesCompleted.90`,
         `Final Third Passes by 90'` = `FinalThirdPasses.90`, 
         `Progressive Passes by 90'` = `PassesProgressive.90`, 
         `Long Passes completed by 90'` = `LongPassesCompleted.90`,
         `Assists by 90'` = `Ast.90`, 
         `Shot-Creating Actions by 90'` = `SCA.90`,
         `Touches by 90'` = `Touches.90`)

head(df_midfielders_rename)

##                  Player         Squad Age Goals by 90' Passes Completion %
## 1           Adrian Beck    Heidenheim  27         0.23                80.1
## 2 Alexis Claude-Maurice      Augsburg  26         0.38                82.9
## 3        András Schäfer  Union Berlin  25         0.06                73.5
## 4        Angelo Stiller     Stuttgart  23         0.03                87.4
## 5         Armin Gigovic Holstein Kiel  22         0.24                77.7
## 6      Arthur Vermeeren    RB Leipzig  19         0.00                86.1
##   Passes Completed by 90' Progressive Passes by 90' Final Third Passes by 90'
## 1                   18.16                      2.16                      1.66
## 2                   19.39                      2.18                      1.36
## 3                   15.92                      2.12                      1.15
## 4                   69.34                      9.03                      8.47
## 5                   16.90                      1.61                      1.42
## 6                   29.32                      2.75                      2.54
##   Long Passes completed by 90' Assists by 90' Shot-Creating Actions by 90'
## 1                         1.72           0.06                         3.16
## 2                         1.11           0.09                         3.29
## 3                         0.50           0.06                         1.70
## 4                         5.31           0.26                         3.64
## 5                         0.87           0.05                         1.85
## 6                         1.07           0.11                         1.60
##   Touches by 90'
## 1          30.06
## 2          35.50
## 3          30.73
## 4          86.91
## 5          30.16
## 6          39.61

Scoring calculation

Once we have selected our study sample, we calculate a value that summarises the performance of these players. This will be our rating.

To do this, since the metrics are measured on different scales, the first step is to normalise them so that they all lie in the same range. We can then assign weights and calculate the final score.

Data transformation

We use min-max scaling to normalise the values of the performance variables. To do so, we define a normalize function that implements this transformation:

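normalize <- function(x, na.rm = TRUE){
  # Min-max scaling: the minimum maps to 0 and the maximum to 1
  x_min <- min(x, na.rm = na.rm)
  x_max <- max(x, na.rm = na.rm)
  return((x - x_min) / (x_max - x_min))
}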

We apply this function to each column of the data frame df_midfielders_rename from column 4 onwards (the performance metrics).

df_midfielders_norm <- data.frame(df_midfielders_rename)
for (i in 4:length(df_midfielders_rename)){
  df_midfielders_norm[,i] <- normalize(df_midfielders_rename[,i])
}

# For negative metrics (e.g.: Fls) we reverse the order
# df_midfielders_norm[,'Fouls per 90'] <-
# 1-df_midfielders_norm[,c('Fouls per 90')]
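Min-max scaling maps each metric to x' = (x - min(x)) / (max(x) - min(x)), so the lowest value becomes 0 and the highest becomes 1. For a metric where lower is better (the commented-out fouls example), 1 - x' reverses the scale. A small sketch with made-up values:

```r
# Same min-max definition as normalize(), written inline so the block runs on its own
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

passes_pct <- c(70, 75, 80, 90)   # hypothetical "higher is better" metric
fouls_90   <- c(0.5, 1.5, 2.5)    # hypothetical "lower is better" metric

min_max(passes_pct)      # 0.00 0.25 0.50 1.00
1 - min_max(fouls_90)    # 1.0 0.5 0.0
```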

summary(df_midfielders_norm)

##     Player             Squad                Age         Goals.by.90.    
##  Length:325         Length:325         Min.   : 0.00   Min.   :0.00000  
##  Class :character   Class :character   1st Qu.:22.00   1st Qu.:0.05128  
##  Mode  :character   Mode  :character   Median :24.00   Median :0.12821  
##                                        Mean   :24.51   Mean   :0.17349  
##                                        3rd Qu.:27.00   3rd Qu.:0.25641  
##                                        Max.   :30.00   Max.   :1.00000  
##  Passes.Completion.. Passes.Completed.by.90. Progressive.Passes.by.90.
##  Min.   :0.0000      Min.   :0.0000          Min.   :0.0000           
##  1st Qu.:0.8271      1st Qu.:0.2209          1st Qu.:0.2498           
##  Median :0.8754      Median :0.2921          Median :0.3273           
##  Mean   :0.8616      Mean   :0.3010          Mean   :0.3475           
##  3rd Qu.:0.9108      3rd Qu.:0.3557          3rd Qu.:0.4166           
##  Max.   :1.0000      Max.   :1.0000          Max.   :1.0000           
##  Final.Third.Passes.by.90. Long.Passes.completed.by.90. Assists.by.90.   
##  Min.   :0.0000            Min.   :0.0000               Min.   :0.00000  
##  1st Qu.:0.1533            1st Qu.:0.1324               1st Qu.:0.07143  
##  Median :0.2130            Median :0.2159               Median :0.16071  
##  Mean   :0.2303            Mean   :0.2356               Mean   :0.19813  
##  3rd Qu.:0.2840            3rd Qu.:0.2975               3rd Qu.:0.28571  
##  Max.   :1.0000            Max.   :1.0000               Max.   :1.00000  
##  Shot.Creating.Actions.by.90. Touches.by.90.  
##  Min.   :0.0000               Min.   :0.0000  
##  1st Qu.:0.2846               1st Qu.:0.3111  
##  Median :0.3738               Median :0.3783  
##  Mean   :0.4132               Mean   :0.3849  
##  3rd Qu.:0.5169               3rd Qu.:0.4363  
##  Max.   :1.0000               Max.   :1.0000

We see that the numerical variables (the performance metrics) all have a minimum value of 0 and a maximum value of 1; they are normalised.

Scoring calculation

Once the variables have been normalised, we calculate the final scoring for each player.

We define the function calc_scoring that receives as parameters:

data: Data set transformed using min-max scaling.
weights: Vector of weights, one per performance metric, in column order.
ind_metric: Index of the column where the performance metrics start.
columns_return: Columns to include in the final result.
n: Number of players to return, ranked by their calculated score.

calc_scoring <- function(
  data, weights, ind_metric, columns_return, n){
  # We transform each variable: transformed value * associated weight
  for (i in ind_metric:ncol(data)){
    data[, i] <- data[,i]*weights[i-(ind_metric-1)]
  }
  cat("Weights sum:", sum(weights))
  # We calculate the scoring (sum of each complete record)
  data$`Final Score` <- rowSums(
    data[, c(ind_metric:ncol(data))])
  data$`Final Score` <- round(10*data$`Final Score`, 3)
  # Order df by score
  data <- data[order(-data$`Final Score`), 
               c(columns_return, "Final Score")]
  rownames(data) <- 1:nrow(data)
  return(data[1:n,])
}
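calc_scoring multiplies each normalised metric by its weight, adds the products and multiplies the total by 10; with weights summing to 1, a player holding the maximum value in every metric would therefore score 10. A worked example with a hypothetical player and only three metrics:

```r
# Hypothetical normalised values for one player and three metrics
x <- c(0.5, 0.8, 0.4)
w <- c(0.2, 0.3, 0.5)        # weights summing to 1

# Same arithmetic as calc_scoring: weighted sum, scaled to 0-10, rounded
score <- round(10 * sum(x * w), 3)
score  # 5.4
```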

We now test the function on our sample. To do so, we assign a weight to each performance metric; the only requirement is that the weights sum to 1.

colnames(df_midfielders_norm) # metrics

##  [1] "Player"                       "Squad"                       
##  [3] "Age"                          "Goals.by.90."                
##  [5] "Passes.Completion.."          "Passes.Completed.by.90."     
##  [7] "Progressive.Passes.by.90."    "Final.Third.Passes.by.90."   
##  [9] "Long.Passes.completed.by.90." "Assists.by.90."              
## [11] "Shot.Creating.Actions.by.90." "Touches.by.90."

df_score_midfielders <- calc_scoring(
  data = df_midfielders_norm, 
  weights = c(0.05, 0.05, 0.1, 0.15, 0.2, 0.05, 0.15, 
              0.2, 0.05), 
  ind_metric = 4, # id performance metric 
  columns_return = c("Player", "Squad", "Age"), 
  n = 10
)

## Weights sum: 1
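calc_scoring only prints the sum of the weights and does not enforce it. A sketch of a stricter check, assuming the weights are supplied in the same order as the metric columns (4 to 12): naming the vector makes the metric-to-weight mapping explicit, and all.equal() tolerates floating-point rounding in the sum. The names used here are our own labels, not column names from the data:

```r
# Weights from the example above, labelled with the metric they apply to
weights <- c(goals = 0.05, pass_pct = 0.05, passes_completed = 0.10,
             progressive = 0.15, final_third = 0.20, long_passes = 0.05,
             assists = 0.15, sca = 0.20, touches = 0.05)

n_metrics <- 9  # columns 4 to 12 of df_midfielders_norm

# Fail early if a weight is missing or the total drifts away from 1
stopifnot(length(weights) == n_metrics,
          isTRUE(all.equal(sum(weights), 1)))
```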

df_score_midfielders

##             Player          Squad Age Final Score
## 1   Joshua Kimmich  Bayern Munich  29       8.175
## 2     Joey Veerman  PSV Eindhoven  25       7.374
## 3            Pedri      Barcelona  21       6.562
## 4  Bruno Fernandes Manchester Utd  29       6.360
## 5      Orkun Kökçü        Benfica  23       6.276
## 6   Angelo Stiller      Stuttgart  23       6.145
## 7  Pierre Højbjerg      Marseille  28       5.847
## 8    Florian Wirtz     Leverkusen  21       5.724
## 9          Vitinha            PSG  24       5.707
## 10 Martin Ødegaard        Arsenal  25       5.453

Based on the weighted metrics, Joshua Kimmich is the top candidate, combining passing, tempo control and experience, and he represents a realistic signing option for Manchester City.
Joey Veerman emerges as a clear market opportunity: he fits the profile, performs strongly, and could be signed at a reasonable cost.
While Pedri and Bruno Fernandes also score highly, their clubs make them virtually unattainable.
Orkun Kökçü and Angelo Stiller could also be good options.
Pierre Højbjerg could bring experience.
Florian Wirtz is the young bet: he already attracts interest from many of Europe’s biggest clubs, so his cost could be very high, although that is not necessarily a problem for Manchester City.
Vitinha, like Pedri and Bruno Fernandes, is likely unattainable.
Martin Ødegaard could be difficult to prise away from Arsenal.

Similarity algorithm

From our analysis, Joshua Kimmich stands out as the most complete midfielder in terms of the weighted metrics. But how closely does he resemble Kevin De Bruyne in style and role? And which other players are closest to De Bruyne?

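For this analysis, we need Kevin De Bruyne’s performance data. Since he is over 30, the age filter (age_max = 30) excluded him from the sample, so we load his record separately and append it to the filtered data.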

data_kdb <- select_players(
  file = "FBREF_BigPlayers_2425.csv", 
  encoding = "UTF-8", 
  position = "MF", 
  competition = NA, 
  primary = TRUE,
  player = c("Kevin De Bruyne")
)

## We consider all players 
## We keep the players whose main position is: MF

# Filter by metrics
data_player <- data_kdb[,c("Player", list_metrics)]

# Append dfs
data_final <- rbind(
  df_midfielders_filter %>% select(-c("Squad", "Age")),
  data_player
)

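We develop the function similarity_tool, which returns the n players most similar to the one given in the player argument. Its arguments are:

sample: Sample of players (the Player column plus the metric columns).
player: Player for whom we look for similar profiles.
metrics: Names of the metric columns in sample.
metrics_rename: Display names for those metrics in the output.
distance: Type of distance ("cosine" or "euclidean"; any value other than "cosine" uses Euclidean distance).
n: Number of similar players to return.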

similarity_tool <- function(
  sample, player, metrics, metrics_rename, distance, n
  ){
  
  # We define seed for reproducibility
  set.seed(123)

  # Standardise each metric (z-scores) over the sample passed as argument
  data_final_norm <- scale(sample %>% select(-Player))
  rownames(data_final_norm) <- sample$Player
  
  # Distances
  if (distance == 'cosine'){
    ## Cosine distance
    ## Transpose data
    players_df <- t(data_final_norm)
    ## Cosine similarity (lsa package)
    sim_cosine <- cosine(players_df)
    
    ## Access to the column player
    player_sim <- sim_cosine[, player]
    
    ## Convert distances to percentages
    ## Normalize data - MinMax [0,1] scale
    df_sim <- as.data.frame(player_sim)
    colnames(df_sim) <- "Similarity"
    df_sim$Similarity <- normalize(df_sim$Similarity)
    ## Multiply by 100 to obtain a value inside the interval [0,100]
    df_sim$Similarity <- round(100*df_sim$Similarity, 3)

    ## Order by similarity and prepare output
    df_sim$Player <- sample$Player
    final_df <- df_sim[order(-df_sim$Similarity),]
    # Drop player
    final_df <- final_df %>% filter(Player != player)
    rownames(final_df) <- 1:nrow(final_df)
    final_df <- final_df[1:n, c("Player", "Similarity")]
  }
  
  else {
    players_df <- data_final_norm
    ## Euclidean distance: dist(method='euclidean')
    mat_dist <- as.matrix(dist(x = players_df, method = "euclidean"))
    ## We keep player column and his distances between all players
    player_sim <- mat_dist[, player]
    df_sim <- as.data.frame(player_sim)
    colnames(df_sim) <- "Distance"
    df_sim$Player <- sample$Player
    ## Drop player whose distance between himself is 0
    df_sim <- df_sim[df_sim$Player != player,]
    
    ## Convert distances into percentages
    d95 <- quantile(df_sim$Distance, 0.95) ## p95
    df_sim$Similarity <- (1 - (df_sim$Distance / d95))*100
    
    ## Order dataframe and prepare output data
    final_df <- df_sim[order(-df_sim$Similarity),]
    rownames(final_df) <- 1:nrow(final_df)
    final_df <- final_df[1:n, c("Player", "Similarity")]
  }
  
  # Join with the original data to recover the raw value of each metric;
  # players with several rows are averaged into one
  data_clean <- sample %>%
    select(Player, all_of(metrics)) %>%
    group_by(Player) %>%
    summarise_all("mean") %>% # average all columns
    rename_at(vars(all_of(metrics)), ~ metrics_rename) # rename
  
  final_df <- merge(
    x = final_df, y = data_clean, 
    by = "Player", all.x = TRUE)
  final_df <- final_df[(order(-final_df$Similarity)), ]
  rownames(final_df) <- 1:n

  return(final_df)
}
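The Euclidean branch converts distances into percentages with 100 * (1 - d / p95), where p95 is the 95th percentile of the target’s distances to everyone else. A toy sketch with four hypothetical players and two made-up metrics; for readability it skips the z-score step that similarity_tool applies with scale():

```r
# Four hypothetical players described by two metrics (made-up values)
m <- matrix(c(1, 2,
              2, 2,
              4, 6,
              0, 0),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("Target", "A", "B", "C"), c("m1", "m2")))

# Euclidean distance from the target to every other player
d <- as.matrix(dist(m, method = "euclidean"))["Target", ]
d <- d[names(d) != "Target"]        # A = 1, B = 5, C = sqrt(5)

# Same conversion as similarity_tool: 100 * (1 - d / p95 of the distances)
d95 <- quantile(d, 0.95)
similarity <- (1 - d / d95) * 100
round(sort(similarity, decreasing = TRUE), 1)
```

Player B lies beyond the 95th-percentile distance, so his similarity is negative; the percentages are only meaningful relative to the rest of the sample.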

Looking for players most similar to Kevin De Bruyne

Using the sample filtered above, with De Bruyne appended, we apply this function to find the players whose performances were most similar to the Manchester City midfielder’s.

metrics_rename <- c("Goals/90", 
                    "Passes%", "PassesCompleted/90",
                    "PassesProgressive/90", 
                    "PassesUT/90", "LongPassesCompleted/90", 
                    "Ast/90", "SCA/90", "Touches/90")

sim_DeBruyne_EUCL <- similarity_tool(
  sample = data_final, 
  player = "Kevin De Bruyne",
  metrics = list_metrics,
  metrics_rename = metrics_rename,
  distance = "euclidean",
  n = 10
)

sim_DeBruyne_EUCL

##                Player Similarity Goals/90 Passes% PassesCompleted/90
## 1          Alex Baena   75.91482     0.24    66.6              26.78
## 2   Francisco Trincão   72.51973     0.28    78.5              24.91
## 3       Julian Brandt   70.18377     0.20    78.8              31.77
## 4  Mohammed Ihattaren   67.53767     0.23    71.6              25.21
## 5         Nicolás Paz   66.67454     0.20    80.6              28.34
## 6        Ludovic Blas   66.56148     0.24    73.9              23.90
## 7        Fabio Vieira   65.35014     0.20    79.2              29.69
## 8      James Maddison   63.35323     0.45    81.3              32.94
## 9        Sven Mijnans   62.78953     0.24    76.3              31.46
## 10      Gaëtan Perrin   62.34229     0.33    70.6              21.41
##    PassesProgressive/90 PassesUT/90 LongPassesCompleted/90 Ast/90 SCA/90
## 1                  5.91        2.97                   4.31   0.31   5.86
## 2                  4.53        2.82                   2.06   0.43   4.94
## 3                  4.97        3.13                   2.23   0.39   3.99
## 4                  4.17        2.75                   2.83   0.23   5.08
## 5                  4.66        3.31                   1.26   0.27   4.82
## 6                  4.00        2.62                   2.79   0.31   4.20
## 7                  4.77        3.19                   3.85   0.20   4.74
## 8                  5.29        3.97                   2.42   0.35   4.73
## 9                  4.61        2.57                   3.39   0.20   4.33
## 10                 4.18        2.56                   3.32   0.37   4.18
##    Touches/90
## 1       49.03
## 2       44.82
## 3       48.50
## 4       43.67
## 5       48.09
## 6       43.52
## 7       45.00
## 8       47.23
## 9       51.93
## 10      39.03

Based on Euclidean distance, Alex Baena emerges as the player most similar to Kevin De Bruyne across the selected performance metrics.

If we use cosine similarity instead:

sim_DeBruyne_COS <- similarity_tool(
  sample = data_final, 
  player = "Kevin De Bruyne",
  metrics = list_metrics,
  metrics_rename = metrics_rename,
  distance = "cosine",
  n = 10
)

sim_DeBruyne_COS

##                Player Similarity Goals/90 Passes% PassesCompleted/90
## 1          Alex Baena     96.514     0.24    66.6              26.78
## 2        Fabio Vieira     95.629     0.20    79.2              29.69
## 3   Francisco Trincão     95.230     0.28    78.5              24.91
## 4                Alan     95.108     0.25    80.4              32.10
## 5        Sven Mijnans     95.003     0.24    76.3              31.46
## 6  Mohammed Ihattaren     94.674     0.23    71.6              25.21
## 7       Florian Wirtz     94.600     0.38    78.3              44.19
## 8       Julian Brandt     94.589     0.20    78.8              31.77
## 9  Morgan Gibbs-White     94.566     0.22    78.3              29.21
## 10        Nicolás Paz     94.304     0.20    80.6              28.34
##    PassesProgressive/90 PassesUT/90 LongPassesCompleted/90 Ast/90 SCA/90
## 1                  5.91        2.97                   4.31   0.31   5.86
## 2                  4.77        3.19                   3.85   0.20   4.74
## 3                  4.53        2.82                   2.06   0.43   4.94
## 4                  5.29        3.52                   2.39   0.21   4.19
## 5                  4.61        2.57                   3.39   0.20   4.33
## 6                  4.17        2.75                   2.83   0.23   5.08
## 7                  5.68        3.10                   2.23   0.46   5.66
## 8                  4.97        3.13                   2.23   0.39   3.99
## 9                  4.82        3.76                   2.15   0.26   3.69
## 10                 4.66        3.31                   1.26   0.27   4.82
##    Touches/90
## 1       49.03
## 2       45.00
## 3       44.82
## 4       47.00
## 5       51.93
## 6       43.67
## 7       67.61
## 8       48.50
## 9       47.82
## 10      48.09
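Cosine similarity compares the direction of two metric profiles and ignores their magnitude, whereas Euclidean distance penalises any difference in level. A toy example with made-up values, where one profile is exactly twice the other:

```r
# Two hypothetical metric profiles: b is exactly twice a
a <- c(1, 2, 2)
b <- c(2, 4, 4)

# Cosine similarity: dot product divided by the product of the vector lengths
cos_ab <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Euclidean distance between the same two profiles
euc_ab <- sqrt(sum((a - b)^2))

c(cosine = cos_ab, euclidean = euc_ab)  # cosine = 1, euclidean = 3
```

In similarity_tool the cosine values are also min-max rescaled across the whole sample, so the percentages are relative to the least similar player, which may help explain why the whole top ten sits above 94.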

Radar chart

As a final step, we highlight the most interesting candidates and plot them on a radar chart alongside Kevin De Bruyne as potential replacements:

Alex Baena: the player most similar to De Bruyne in style and key metrics, confirming him as a strong tactical fit.
Joshua Kimmich: the top performer overall, combining elite passing, tempo control and experience, making him the optimal replacement from a performance standpoint.
Joey Veerman: a market opportunity, offering a promising profile at a reasonable cost and a realistic option for acquisition.

Boundary construction

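We calculate the 5th (p5) and 95th (p95) percentiles of each analysis metric and store them in the data frame min_max_df. These percentiles will be the minimum and maximum of each radar axis, which keeps a few extreme values from compressing the rest of the chart.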

min_max_df <- rbind(
  apply(data_final[, list_metrics], 2, 
        function(x) quantile(x, probs=.95)), 
  apply(data_final[, list_metrics], 2, 
        function(x) quantile(x, probs=.05)))
rownames(min_max_df) <- c("p95", "p5")

min_max_df

##     Gls.90 Passes. PassesCompleted.90 PassesProgressive.90 FinalThirdPasses.90
## p95 0.3875  89.600            48.7950                6.725               5.565
## p5  0.0000  68.725            13.9075                1.680               1.060
##     LongPassesCompleted.90 Ast.90 SCA.90 Touches.90
## p95                  5.015   0.29  4.815     65.645
## p5                   0.665   0.00  1.200     27.460

Preparation of the dataframe

players_radar <- c("Kevin De Bruyne", "Joshua Kimmich", "Alex Baena", "Joey Veerman")

# Filter data by players
df_midfielders_radar <- data_final[
  data_final$Player %in% players_radar, ]

# Clip each value to the interval [p5, p95] so outliers do not distort the axes
index_metric <- 2
for (p in players_radar){
  df_p <- df_midfielders_radar[df_midfielders_radar$Player == p,]
  for (col in colnames(df_midfielders_radar)[index_metric:ncol(df_midfielders_radar)]){
    value_col <- df_p[, col]
    if (value_col < min_max_df["p5", col]){
      df_midfielders_radar[
        df_midfielders_radar$Player == p, col] <- min_max_df["p5", col]
    } else if (value_col > min_max_df["p95", col]){
      df_midfielders_radar[
        df_midfielders_radar$Player == p, col] <- min_max_df["p95", col]
    }
  }
}
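The nested loop above clips every value to [p5, p95]. The same clipping can be written with base R’s vectorised pmin() and pmax(), one column at a time, which avoids the per-player loop. A sketch on made-up bounds and players, laid out like min_max_df (p95 row first, p5 row second):

```r
# Hypothetical bounds laid out like min_max_df: row p95 on top, row p5 below
bounds <- rbind(p95 = c(goals = 0.4, touches = 65),
                p5  = c(goals = 0.0, touches = 27))

# Hypothetical player rows, some outside the bounds
players <- data.frame(goals = c(0.10, 0.55, 0.20),
                      touches = c(20, 50, 80))

# Clip each metric column to [p5, p95] with vectorised pmin/pmax
for (col in colnames(bounds)) {
  players[[col]] <- pmin(pmax(players[[col]], bounds["p5", col]),
                         bounds["p95", col])
}
players
```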

# Combine the p95/p5 rows with the players' (clipped) values
df_midfielders_radar <- as.data.frame(df_midfielders_radar)
rownames(df_midfielders_radar) <- df_midfielders_radar$Player # player names as row labels
df_final_plot <- rbind(
  min_max_df, df_midfielders_radar[, list_metrics])

df_final_plot

##                 Gls.90 Passes. PassesCompleted.90 PassesProgressive.90
## p95             0.3875  89.600            48.7950                6.725
## p5              0.0000  68.725            13.9075                1.680
## Joshua Kimmich  0.0900  89.500            48.7950                6.725
## Joey Veerman    0.0500  80.900            48.7950                6.725
## Alex Baena      0.2400  68.725            26.7800                5.910
## Kevin De Bruyne 0.2100  75.900            32.4300                5.610
##                 FinalThirdPasses.90 LongPassesCompleted.90 Ast.90 SCA.90
## p95                           5.565                  5.015   0.29  4.815
## p5                            1.060                  0.665   0.00  1.200
## Joshua Kimmich                5.565                  5.015   0.22  4.815
## Joey Veerman                  5.565                  5.015   0.29  4.815
## Alex Baena                    2.970                  4.310   0.29  4.815
## Kevin De Bruyne               3.140                  3.210   0.29  4.815
##                 Touches.90
## p95                 65.645
## p5                  27.460
## Joshua Kimmich      65.645
## Joey Veerman        65.645
## Alex Baena          49.030
## Kevin De Bruyne     48.860

Radar representation

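We use the fmsb library to create the radar chart. We define the function create_radarchart, a wrapper around fmsb’s radarchart function that lets us set the different arguments (colours, axis names, etc.) in one place. By default, radarchart reads the first two rows of the data as the maximum and minimum of each axis, which is why df_final_plot starts with the p95 and p5 rows: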

create_radarchart <- function(data, color = "#00AFBB", 
                              vlabels = colnames(data), vlcex = 0.7,
                              caxislabels = NULL, title = NULL){
  radarchart(
    data, axistype = 1,
    # Polygon
    pcol = color, pfcol = scales::alpha(color, 0.5), 
    plwd = 2, plty = 1,
    # Grid
    cglcol = "grey", cglty = 1, cglwd = 0.8,
    # Axis
    axislabcol = "white", 
    # Labels
    vlcex = vlcex, vlabels = vlabels,
    caxislabels = caxislabels, title = title
  )
}

# Metric name in the radar plot
metrics_name_plot <- c("Goals/90", 
                       "Passes%", "PassesCompleted/90", "PassesProgressive/90", 
                       "PassesUT/90", "LongPassesCompleted/90", 
                       "Ast/90", "SCA/90", "Touches/90")

# Colors for each player
colors_radar <- c("#00AFBB", "#f7d62d", "#8DBF8D", "#FFA500")

# Plot
op <- par(mar = c(1, 2, 2, 2))
create_radarchart(
  data = df_final_plot, 
  color = colors_radar,
  vlabels = metrics_name_plot
)
legend("bottomleft", # position of legend
       legend = rownames(df_final_plot[-c(1,2),]), # name players
       horiz = FALSE, # position of legend
       bty = 'n', pch = 20, 
       col = colors_radar, 
       text.col = "black", cex = 0.65, pt.cex = 1.75)
title(
  main = "De Bruyne's replacement \nMidfielders, 24/25", 
  cex.main = 1, col.main = "#5D6D7E")
par(op) # restore the previous plot margins

Based on our scouting analysis, Manchester City has three viable approaches to replace Kevin De Bruyne:

Performance Priority – Joshua Kimmich

If the primary objective is to maintain elite-level midfield performance, Kimmich should be pursued. He guarantees immediate quality, control, and leadership in the midfield, though at a higher transfer cost and with potential negotiation complexity.

Stylistic Fit – Alex Baena

For a replacement closely aligned with De Bruyne’s tactical role and style, Baena represents a high-probability fit. He would ensure continuity in the team’s possession game and could integrate seamlessly into the existing structure.

Market Opportunity – Joey Veerman

If the club prioritizes cost-effective acquisitions with growth potential, Veerman is the most attractive option. While slightly less experienced or complete than Kimmich or Baena, he balances quality and affordability, reducing financial risk.

Individual Task - Kevin De Bruyne replacement

Pierre Anorga

2026-04-22
