The aim of my scouting task is to develop a complete end-to-end scouting process, applying concepts covered in the module and putting them into practice through a real-world case study.
My objective is to identify a suitable replacement for FC Barcelona’s Robert Lewandowski: a younger forward, selected using performance data from the 2024/25 season.
I will identify potential candidates and conclude who the top candidate to replace him could be.
My scouting analysis will be based on the following analytical approaches:
All file paths are specified relative to the project directory to ensure reproducibility.
As a next step, I will import the libraries needed for my scouting analysis. Any that are not yet installed can be downloaded with the command install.packages({library}).
Next, I will read the CSV file I am going to work with and select the sample (context) of analysis.
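The list of libraries is not shown in this section. Judging only from the functions called later (%>% and filter, cosine, fmsb::radarchart, scales::alpha), a plausible set is dplyr, lsa, fmsb and scales. A minimal sketch, under that assumption, that reports which of them still need installing without installing anything automatically:

```r
# Assumed package list, inferred from the functions used later in this report
pkgs <- c("dplyr", "lsa", "fmsb", "scales")

# requireNamespace() returns FALSE (instead of failing) when a package is missing
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
pkgs_missing <- pkgs[!is_installed]

if (length(pkgs_missing) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', pkgs_missing, '"', collapse = ", "), "))")
}
```

Keeping the installation step manual avoids a knitted report downloading packages as a side effect.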
Function
select_players <- function(file, encoding, position, competition, primary, player){
# 1. Read the file
data <- read.csv(file, sep = ";", encoding = encoding)
if (is.na(competition)) {
cat("We consider all players", "\n")
data_comp <- data
} else {
cat("We filter by competition", "\n")
data_comp <- data %>%
filter(Competition %in% c(competition))
}
if (primary){
cat("We keep the players whose main position is:", position)
data_players <- data_comp %>%
filter(substr(Pos, 1, 2) == position)
} else {
data_players <- data_comp %>%
filter(grepl(position, Pos))
cat("We keep the players whose position is:", position)
}
if (!is.na(player)){
data_players <- data_players %>% filter(Player %in% c(player))
}
return (data_players)
}
Next I will call this function and focus on players whose primary position is FW (forward), the position of my target player, Robert Lewandowski. I will consider all competitions.
df_forwards <- select_players(
file = "data/FBREF_BigPlayers_2425.csv",
encoding = "UTF-8",
position = "FW",
competition = NA,
primary = TRUE,
player = NA
)
## We consider all players
## We keep the players whose main position is: FW
##
## Number of players: 1055
Competitions included
## [1] "Bundesliga" "Eredivisie" "La Liga" "Ligue 1"
## [5] "Premier League" "Primeira Liga" "Serie A"
I will narrow the search further and focus only on pure forwards, excluding wingers: Lewandowski is a classic striker, and my objective is to find players of the same type.
df_forwards <- df_forwards %>%
filter(Pos == "FW")
cat("Number of pure forwards:", nrow(df_forwards))
## Number of pure forwards: 542
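The three position filters used so far (primary position, any listed position, and exactly "FW") select different players. A small sketch of the difference, assuming the Pos column stores positions as comma-separated codes with the primary one first; the toy strings below are illustrative, not taken from the dataset:

```r
# Toy position strings in the "primary,secondary" style assumed for the Pos column
pos <- c("FW", "FW,MF", "MF,FW", "DF")

primary_fw <- substr(pos, 1, 2) == "FW"  # primary = TRUE branch of select_players
any_fw     <- grepl("FW", pos)           # primary = FALSE branch
pure_fw    <- pos == "FW"                # the stricter filter applied afterwards

data.frame(pos, primary_fw, any_fw, pure_fw)
```

Under this assumption, only the exact match drops hybrid profiles such as "FW,MF", which is consistent with the sample shrinking from 1055 to 542 players.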
The sample still needs further filtering. I will define the filter_players function, which keeps only those players who have played the minimum number of minutes required of a candidate. I will also restrict the search by age: I am looking for a player younger than Lewandowski, who can be expected to play several seasons for the club.
My function will have the following parameters:
data: Data set read from a CSV.
metrics: List of metrics to be considered in the analysis.
pct_min_minutes: Minimum percentage of minutes played to be within the sample.
age_max: Maximum age of the player to be considered in the sample.
filter_players <- function(
data, metrics, pct_min_minutes, age_max){
# We filter data and select metrics that define our sample data
data_filter <- data %>%
filter(Min > round((pct_min_minutes*90*MP_Squad) / 100),
Age <= age_max) %>%
select(all_of(c("Player", "Squad", "Age", metrics)))
rownames(data_filter) <- 1:nrow(data_filter)
return (data_filter)
}
Players included in my sample are forwards who played more than 50% of their team's total minutes and who are aged 27 or younger. I have selected this subset of metrics because they reflect the complete, modern central striker profile I am looking for.
list_metrics <- c("G.PK.90", "xG.90", "G.xG", "Sh.90", "SoT.90",
"SCA.90", "GCA.90", "KP.90", "Fld.90","FinalThirdPasses.90",
"PassesProgressive.90")
df_forwards_filter <- filter_players(
data = df_forwards,
metrics = list_metrics,
pct_min_minutes = 50,
age_max = 27
)
## Duplicated players:
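The minutes threshold inside filter_players scales with the number of matches each squad has played, so it differs between leagues. A short worked example of the same expression, assuming MP_Squad holds the squad's matches played; the helper name min_minutes is introduced here only for illustration:

```r
# Same threshold expression as inside filter_players
min_minutes <- function(pct_min_minutes, mp_squad) {
  round((pct_min_minutes * 90 * mp_squad) / 100)
}

# 50% of the available minutes in a 34-match and a 38-match season
thresholds <- min_minutes(50, c(34, 38))
thresholds  # 1530 1710
```

A forward in a 34-match league therefore needs more than 1530 minutes to stay in the sample, and one in a 38-match league more than 1710.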
To keep things tidy I will rename some metrics.
# Rename metrics
df_forwards_rename <- df_forwards_filter %>%
rename(`Non-Penalty Goals by 90'` = `G.PK.90`,
`Expected Goals (xG) by 90'` = `xG.90`,
`Goals minus xG` = `G.xG`,
`Shots by 90'` = `Sh.90`,
`Shots on Target by 90'` = `SoT.90`,
`Shot-Creating Actions by 90'` = `SCA.90`,
`Goal-Creating Actions by 90'` = `GCA.90`,
`Key Passes by 90'` = `KP.90`,
`Fouls Drawn by 90'` = `Fld.90`,
`Final Third Passes by 90'` = `FinalThirdPasses.90`,
`Progressive Passes by 90'` = `PassesProgressive.90`)
head(df_forwards_rename)
## Player Squad Age Non-Penalty Goals by 90'
## 1 Ermedin Demirović Stuttgart 26 0.73
## 2 Hugo Ekitike Eint Frankfurt 22 0.49
## 3 Johannes Eggestein St. Pauli 26 0.10
## 4 Jonathan Burkardt Mainz 05 24 0.68
## 5 Junior Adamu Freiburg 23 0.12
## 6 Mohamed Amoura Wolfsburg 24 0.29
## Expected Goals (xG) by 90' Goals minus xG Shots by 90' Shots on Target by 90'
## 1 0.69 0.9 3.11 1.31
## 2 0.76 -6.6 4.00 1.55
## 3 0.23 -1.7 1.65 0.49
## 4 0.63 3.2 2.99 1.19
## 5 0.32 -3.5 2.10 0.64
## 6 0.32 1.3 2.66 0.91
## Shot-Creating Actions by 90' Goal-Creating Actions by 90' Key Passes by 90'
## 1 2.28 0.34 0.53
## 2 3.55 0.42 1.33
## 3 2.09 0.19 0.89
## 4 2.31 0.34 0.69
## 5 1.69 0.17 0.68
## 6 2.92 0.47 1.13
## Fouls Drawn by 90' Final Third Passes by 90' Progressive Passes by 90'
## 1 0.79 0.44 0.91
## 2 0.52 0.73 1.61
## 3 0.44 0.93 1.59
## 4 0.83 0.86 1.45
## 5 0.88 0.28 0.56
## 6 0.65 0.94 1.97
For the selected sample, I will calculate a value that summarises the performance of these players. This will serve as my rating.
Since I am working with metrics measured on different scales, the first step is to normalise the variables so that all metrics share the same scale. Afterwards, I assign a weight to each metric and calculate the final score.
I will use MinMax scaling to normalise the values of the performance variables. For that, I define a normalize function:
# Normalization function
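normalize <- function(x, na.rm=TRUE){
# na.rm is passed on to min() and max() so missing values do not turn the whole column into NA
return((x-min(x, na.rm=na.rm))/(max(x, na.rm=na.rm)-min(x, na.rm=na.rm)))
}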
# Normalize numeric columns (from 4 onwards)
df_forwards_norm <- data.frame(df_forwards_rename)
for (i in 4:ncol(df_forwards_rename)){
df_forwards_norm[, i] <- normalize(df_forwards_rename[, i])
}
summary(df_forwards_norm)
## Player Squad Age Non.Penalty.Goals.by.90.
## Length:102 Length:102 Min. :17.0 Min. :0.0000
## Class :character Class :character 1st Qu.:22.0 1st Qu.:0.2139
## Mode :character Mode :character Median :24.0 Median :0.3221
## Mean :23.6 Mean :0.3453
## 3rd Qu.:25.0 3rd Qu.:0.4519
## Max. :27.0 Max. :1.0000
## Expected.Goals..xG..by.90. Goals.minus.xG Shots.by.90.
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1932 1st Qu.:0.3463 1st Qu.:0.2590
## Median :0.2614 Median :0.4797 Median :0.3500
## Mean :0.3155 Mean :0.4768 Mean :0.3814
## 3rd Qu.:0.4205 3rd Qu.:0.5980 3rd Qu.:0.5013
## Max. :1.0000 Max. :1.0000 Max. :1.0000
## Shots.on.Target.by.90. Shot.Creating.Actions.by.90.
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1878 1st Qu.:0.1487
## Median :0.2978 Median :0.2635
## Mean :0.3096 Mean :0.3024
## 3rd Qu.:0.3956 3rd Qu.:0.4330
## Max. :1.0000 Max. :1.0000
## Goal.Creating.Actions.by.90. Key.Passes.by.90. Fouls.Drawn.by.90.
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1565 1st Qu.:0.1595 1st Qu.:0.1548
## Median :0.2261 Median :0.2543 Median :0.2429
## Mean :0.2713 Mean :0.2994 Mean :0.2827
## 3rd Qu.:0.3652 3rd Qu.:0.3966 3rd Qu.:0.3897
## Max. :1.0000 Max. :1.0000 Max. :1.0000
## Final.Third.Passes.by.90. Progressive.Passes.by.90.
## Min. :0.0000 Min. :0.00000
## 1st Qu.:0.1250 1st Qu.:0.09611
## Median :0.1961 Median :0.20302
## Mean :0.2398 Mean :0.26028
## 3rd Qu.:0.2876 3rd Qu.:0.36987
## Max. :1.0000 Max. :1.00000
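To make the MinMax step concrete, here is a small worked example on three of the Non-Penalty Goals by 90' values shown earlier. It is only a sketch: the function normalize_demo is a stand-alone copy for illustration, and the rescaled values depend entirely on which players are in the sample.

```r
# Stand-alone MinMax scaling: the minimum maps to 0 and the maximum to 1
normalize_demo <- function(x) (x - min(x)) / (max(x) - min(x))

npg90 <- c(0.73, 0.49, 0.10)  # Demirović, Ekitike, Eggestein
npg90_norm <- normalize_demo(npg90)
npg90_norm

# Caveat: a metric with identical values for every player gives 0/0 = NaN
constant_norm <- normalize_demo(c(2, 2, 2))
constant_norm
```

Because the scale is anchored on the sample's own extremes, a single outlier compresses everyone else towards 0, which is worth keeping in mind when reading the final scores.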
Following the normalisation of the data set, I will calculate the final score for each player.
I define the function calc_scoring that receives as
parameters:
data: Data set transformed using MinMax.
weights: List of weights associated with each variable.
ind_metric: Index of the column where the performance metrics start.
columns_return: Columns to return in the final result.
n: Number of players to return (according to the calculated scoring).
calc_scoring <- function(
data, weights, ind_metric, columns_return, n){
# Transform each variable: transformed value * associated weight
for (i in ind_metric:ncol(data)){
data[, i] <- data[,i]*weights[i-(ind_metric-1)]
}
cat("Weights sum:", sum(weights))
# Calculating the scoring (sum of each complete record)
data$`Final Score` <- rowSums(
data[, c(ind_metric:ncol(data))])
data$`Final Score` <- round(10*data$`Final Score`, 3)
# Ordering df by score
data <- data[order(-data$`Final Score`),
c(columns_return, "Final Score")]
rownames(data) <- 1:nrow(data)
return(data[1:n,])
}
Next I assign a weight to each performance metric, with the only requirement being that the weights sum to 1. I will first run this chunk to verify the column order:
## [1] "Player" "Squad"
## [3] "Age" "Non.Penalty.Goals.by.90."
## [5] "Expected.Goals..xG..by.90." "Goals.minus.xG"
## [7] "Shots.by.90." "Shots.on.Target.by.90."
## [9] "Shot.Creating.Actions.by.90." "Goal.Creating.Actions.by.90."
## [11] "Key.Passes.by.90." "Fouls.Drawn.by.90."
## [13] "Final.Third.Passes.by.90." "Progressive.Passes.by.90."
Now I use this weight vector (where the values must sum to 1).
weights_fw <- c(
0.20, # Non-Penalty Goals by 90'
0.18, # Expected Goals (xG) by 90'
0.14, # Goals minus xG
0.09, # Shots by 90'
0.09, # Shots on Target by 90'
0.08, # Shot-Creating Actions by 90'
0.07, # Goal-Creating Actions by 90'
0.05, # Key Passes by 90'
0.03, # Fouls Drawn by 90'
0.04, # Final Third Passes by 90'
0.03 # Progressive Passes by 90'
)
df_score_forwards <- calc_scoring(
data = df_forwards_norm,
weights = weights_fw,
ind_metric = 4,
columns_return = c("Player", "Squad", "Age"),
n = 10
)
## Weights sum: 1
df_score_forwards
sum(weights_fw)
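The final score is a weighted sum of the normalised metrics, multiplied by 10. A toy sketch with three hypothetical metrics and two hypothetical players (none of these values come from the dataset) shows the arithmetic and a tolerant check that the weights sum to 1:

```r
weights_demo <- c(0.5, 0.3, 0.2)  # hypothetical weights

# Tolerant comparison: floating-point sums of decimals can differ from 1 by a tiny amount
stopifnot(isTRUE(all.equal(sum(weights_demo), 1)))

norm_demo <- data.frame(
  Player = c("Player A", "Player B"),
  m1 = c(1.0, 0.5), m2 = c(0.0, 1.0), m3 = c(0.5, 0.5)
)

# Weighted sum per row, scaled to the [0, 10] range as in calc_scoring
score_demo <- round(10 * as.vector(as.matrix(norm_demo[, -1]) %*% weights_demo), 3)
names(score_demo) <- norm_demo$Player
score_demo
```

Player B scores higher here (6.5 against 6.0) despite Player A leading the most heavily weighted metric, which illustrates how much the ranking depends on the chosen weights.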
Ousmane Dembélé is an elite metric performer, but he’s an expensive, non–like-for-like “9” solution and a former Barça player — making him a strategically unlikely successor. Transfermarkt lists him at €100m.
Viktor Gyökeres looks like the strongest “true striker” replacement on performance metrics (volume + output), but he is also a premium-priced asset: a big investment that could deliver immediate impact. Transfermarkt lists him at €70m.
Kylian Mbappé is effectively ruled out: direct rival (Real Madrid) + extreme cost. Transfermarkt lists him at €200m.
Lamine Yamal shows up as a “successor signal” because he’s already elite, but he’s not a central striker and is a long-term internal pillar rather than a Lewandowski-style 9. Transfermarkt lists him at €200m and as a Right Winger.
Vangelis Pavlidis is a strong value-style candidate: productive profile, true CF role, and comparatively more attainable pricing than the elite tier. Transfermarkt lists him at €35m. He is a clear market opportunity.
Mateo Retegui profiles as a solid “classic 9” option with strong striker indicators and an “accessible relative to superstars” price point. He could be a reasonable option. Transfermarkt lists him at €40m.
Summary:
# Metrics (same ones used for scoring)
list_metrics_fw <- c("G.PK.90", "xG.90", "G.xG", "Sh.90", "SoT.90",
"SCA.90", "GCA.90", "KP.90", "Fld.90",
"FinalThirdPasses.90", "PassesProgressive.90")
# Create the final sample for similarity (Player + metrics only)
data_final <- df_forwards_filter %>%
select(Player, all_of(list_metrics_fw)) %>%
group_by(Player) %>%
summarise(across(everything(), mean), .groups = "drop")
(The group_by/summarise step avoids duplicates if a player appears more than once.)
Before running the similarity tool chunk I will run a quick check.
## [1] "/Users/hoots/Desktop/MSc Football Data Analytics/TalentDetection"
## [1] "data"
## [2] "Talent_Detection_Lewandowski_Replacement.html"
## [3] "Talent_Detection_Lewandowski_Replacement.Rmd"
## [4] "TalentDetection.html"
## [5] "TalentDetection.Rmd"
## [1] "FBREF_BigClubes_2324.csv" "FBREF_BigClubes_2425.csv"
## [3] "FBREF_BigPlayers_2324.csv" "FBREF_BigPlayers_2425.csv"
I will now load the dataset into data_players.
data_players <- read.csv(
file = "data/FBREF_BigPlayers_2425.csv",
sep = ";",
encoding = "UTF-8"
)
# confirm it loaded
dim(data_players)
## [1] 3972 72
Check for Lewandowski spelling.
## [1] "Jamie Leweling" "Lewis Holtby" "Lewis Schouten"
## [4] "Robert Lewandowski" "Dominic Calvert-Lewin" "Keane Lewis-Potter"
## [7] "Lewis Cook" "Lewis Dunk" "Lewis Hall"
## [10] "Lewis Miley" "Lewis Orford" "Myles Lewis-Skelly"
## [13] "Rico Lewis" "Lewis Ferguson"
I check if he is in my similarity sample.
## [1] FALSE
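The chunks that produced the two outputs above are not shown in the rendered report. A minimal sketch of how such checks could be written, using toy vectors rather than the real data (the names players_demo and sample_demo are hypothetical):

```r
players_demo <- c("Robert Lewandowski", "Lewis Dunk", "Moise Kean")  # toy full list
sample_demo  <- c("Moise Kean")                                      # toy filtered sample

# Partial match to discover the exact spelling used in the data
name_matches <- grep("Lew", players_demo, value = TRUE)
name_matches

# Exact membership test against the filtered sample
in_sample <- "Robert Lewandowski" %in% sample_demo
in_sample
```

A FALSE result is expected for the real sample, since the age filter (27 or younger) is designed to exclude the player being replaced.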
Since Lewandowski is not in my sample I pull his row from the full dataset and append it to my similarity sample.
# define my Metrics (same as in my scoring profile) and set the exact player name
list_metrics_fw <- c("G.PK.90", "xG.90", "G.xG", "Sh.90", "SoT.90",
"SCA.90", "GCA.90", "KP.90", "Fld.90",
"FinalThirdPasses.90", "PassesProgressive.90")
# Reference player name
player_name <- "Robert Lewandowski"
(Build the “candidate pool” I want to compare against.)
# Build the comparison sample from my filtered forward dataset
data_final <- df_forwards_filter %>%
select(Player, all_of(list_metrics_fw)) %>%
group_by(Player) %>%
summarise(across(everything(), mean), .groups = "drop")
# Check if Lewandowski is already included
player_name %in% data_final$Player
## [1] FALSE
Extract Lewandowski from the FULL dataset (unfiltered) - this grabs his metrics.
# Pull Lewandowski data from the full dataset (unfiltered)
lewa_row <- data_players %>%
filter(Player == player_name) %>%
select(Player, all_of(list_metrics_fw)) %>%
group_by(Player) %>%
summarise(across(everything(), mean), .groups = "drop")
lewa_row
## # A tibble: 1 × 12
## Player G.PK.90 xG.90 G.xG Sh.90 SoT.90 SCA.90 GCA.90 KP.90 Fld.90
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Robert Lewandows… 0.81 0.92 -0.100 3.75 1.52 1.69 0.1 0.56 1.15
## # ℹ 2 more variables: FinalThirdPasses.90 <dbl>, PassesProgressive.90 <dbl>
# Append Lewandowski if he is missing from the comparison sample
if(!(player_name %in% data_final$Player)){
data_final <- bind_rows(data_final, lewa_row)
}
# I confirm that he is now included
player_name %in% data_final$Player
## [1] TRUE
Next I create the function similarity_tool that
calculates the N players most similar to the player
indicated in the player argument.
similarity_tool <- function(
sample, player, metrics, metrics_rename, distance, n
){
# We define seed for reproducibility
set.seed(123)
# Scale data
data_final_norm <- scale(sample %>% select(-Player))
rownames(data_final_norm) <- sample$Player
# Distances
if (distance == 'cosine'){
## Cosine distance
## Transpose data
players_df <- t(data_final_norm)
## Cosine similarity (lsa package)
sim_cosine <- cosine(players_df)
## Access to the column player
player_sim <- sim_cosine[, player]
## Convert distances to percentages
## Normalize data - MinMax [0,1] scale
df_sim <- as.data.frame(player_sim)
colnames(df_sim) <- "Similarity"
df_sim$Similarity <- normalize(df_sim$Similarity)
## Multiply by 100 to obtain a value inside the interval [0,100]
df_sim$Similarity <- round(100*df_sim$Similarity, 3)
## Order by similarity and prepare output
df_sim$Player <- sample$Player
final_df <- df_sim[order(-df_sim$Similarity),]
# Drop player
final_df <- final_df %>% filter(Player != player)
rownames(final_df) <- 1:nrow(final_df)
final_df <- final_df[1:n, c("Player", "Similarity")]
}
else {
players_df <- data_final_norm
## Euclidean distance: dist(method='euclidean')
mat_dist <- as.matrix(dist(x = players_df, method = "euclidean"))
## We keep player column and his distances between all players
player_sim <- mat_dist[, player]
df_sim <- as.data.frame(player_sim)
colnames(df_sim) <- "Distance"
df_sim$Player <- sample$Player
## Drop player whose distance between himself is 0
df_sim <- df_sim[df_sim$Player != player,]
## Convert distances into percentages
d95 <- quantile(df_sim$Distance, 0.95) ## p95
df_sim$Similarity <- (1 - (df_sim$Distance / d95))*100
## Order dataframe and prepare output data
final_df <- df_sim[order(-df_sim$Similarity),]
rownames(final_df) <- 1:nrow(final_df)
final_df <- final_df[1:n, c("Player", "Similarity")]
}
# Union with the original data to access to real values of each metric
# drop duplicates data
data_clean <- sample %>%
select(Player, all_of(metrics)) %>%
group_by(Player) %>%
summarise_all("mean") %>% # average all columns
rename_at(vars(metrics), ~ metrics_rename) # rename
final_df <- merge(
x = final_df, y = data_clean,
by = "Player", all.x = TRUE)
final_df <- final_df[(order(-final_df$Similarity)), ]
rownames(final_df) <- 1:n
return(final_df)
}
Using the previously filtered dataset, I call this function to find out which players performed most similarly to the Barcelona forward.
metrics_rename_fw <- c("G-PK/90", "xG/90", "G-xG", "Sh/90", "SoT/90",
"SCA/90", "GCA/90", "KP/90", "Fld/90",
"FinalThirdPasses/90", "ProgPasses/90")
sim_Lewa_EUCL <- similarity_tool(
sample = data_final,
player = "Robert Lewandowski",
metrics = list_metrics_fw,
metrics_rename = metrics_rename_fw,
distance = "euclidean",
n = 10
)
sim_Lewa_EUCL
## Player Similarity G-PK/90 xG/90 G-xG Sh/90 SoT/90 SCA/90 GCA/90
## 1 Moise Kean 68.39824 0.60 0.65 -0.4 3.43 1.63 1.90 0.23
## 2 Ermedin Demirović 68.16007 0.73 0.69 0.9 3.11 1.31 2.28 0.34
## 3 Jonathan Burkardt 62.48488 0.68 0.63 3.2 2.99 1.19 2.31 0.34
## 4 Erling Haaland 62.06110 0.62 0.72 0.0 3.42 1.81 2.34 0.30
## 5 Yoane Wissa 59.74726 0.59 0.57 0.5 2.77 1.26 2.13 0.31
## 6 Troy Parrott 57.25596 0.50 0.62 -0.9 2.73 1.24 2.73 0.25
## 7 Emanuel Emegha 56.11362 0.55 0.67 -3.0 2.39 1.37 1.26 0.16
## 8 Samu Omorodion 53.80679 0.60 0.56 4.9 3.10 1.15 1.75 0.36
## 9 Vangelis Pavlidis 52.76428 0.64 0.74 0.4 3.23 1.36 3.31 0.60
## 10 Mateo Retegui 50.87713 0.79 0.71 6.1 3.74 1.21 2.87 0.53
## KP/90 Fld/90 FinalThirdPasses/90 ProgPasses/90
## 1 0.59 1.62 0.47 0.81
## 2 0.53 0.79 0.44 0.91
## 3 0.69 0.83 0.86 1.45
## 4 0.94 0.42 0.32 0.65
## 5 0.77 1.46 1.26 1.91
## 6 1.11 1.11 0.75 1.14
## 7 0.67 1.37 0.15 0.52
## 8 0.57 1.07 0.53 0.93
## 9 1.18 0.76 0.65 1.65
## 10 1.06 0.94 0.81 1.28
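The Similarity column in the Euclidean branch is not a bounded percentage: each distance is expressed relative to the 95th percentile of all distances to the reference player. A sketch of that conversion on hypothetical distances (the values are illustrative only):

```r
# Hypothetical Euclidean distances from the reference player to five others
dist_demo <- c(1.2, 2.0, 3.5, 4.0, 6.0)

# 95th percentile of the distances, used as the "0% similarity" anchor
d95 <- unname(quantile(dist_demo, 0.95))

# Same formula as in similarity_tool
sim_demo <- (1 - dist_demo / d95) * 100
round(sim_demo, 1)
```

Players further away than the 95th percentile receive a negative similarity, and a value such as 68 means "about two thirds closer than the p95 distance" rather than a 68% statistical match.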
Repeating the analysis with cosine similarity:
sim_Lewa_COS <- similarity_tool(
sample = data_final,
player = "Robert Lewandowski",
metrics = list_metrics_fw,
metrics_rename = metrics_rename_fw,
distance = "cosine",
n = 10
)
sim_Lewa_COS
## Player Similarity G-PK/90 xG/90 G-xG Sh/90 SoT/90 SCA/90 GCA/90
## 1 Ermedin Demirović 92.846 0.73 0.69 0.9 3.11 1.31 2.28 0.34
## 2 Moise Kean 92.600 0.60 0.65 -0.4 3.43 1.63 1.90 0.23
## 3 Yoane Wissa 91.197 0.59 0.57 0.5 2.77 1.26 2.13 0.31
## 4 Jonathan Burkardt 90.451 0.68 0.63 3.2 2.99 1.19 2.31 0.34
## 5 Troy Parrott 89.392 0.50 0.62 -0.9 2.73 1.24 2.73 0.25
## 6 Erling Haaland 88.149 0.62 0.72 0.0 3.42 1.81 2.34 0.30
## 7 Emanuel Emegha 83.320 0.55 0.67 -3.0 2.39 1.37 1.26 0.16
## 8 Samu Omorodion 80.771 0.60 0.56 4.9 3.10 1.15 1.75 0.36
## 9 Mateo Retegui 80.187 0.79 0.71 6.1 3.74 1.21 2.87 0.53
## 10 Vangelis Pavlidis 79.854 0.64 0.74 0.4 3.23 1.36 3.31 0.60
## KP/90 Fld/90 FinalThirdPasses/90 ProgPasses/90
## 1 0.53 0.79 0.44 0.91
## 2 0.59 1.62 0.47 0.81
## 3 0.77 1.46 1.26 1.91
## 4 0.69 0.83 0.86 1.45
## 5 1.11 1.11 0.75 1.14
## 6 0.94 0.42 0.32 0.65
## 7 0.67 1.37 0.15 0.52
## 8 0.57 1.07 0.53 0.93
## 9 1.06 0.94 0.81 1.28
## 10 1.18 0.76 0.65 1.65
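Cosine similarity compares the shape of two profiles and ignores their magnitude, which is why its ranking can differ from the Euclidean one. A base-R sketch of the definition on hypothetical standardised profiles (the helper cosine_demo is written here for illustration and is not the lsa function used above):

```r
# Cosine of the angle between two numeric vectors
cosine_demo <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

ref    <- c( 1.0, 0.5, -0.2)  # hypothetical profile of the reference player
scaled <- c( 2.0, 1.0, -0.4)  # same shape, twice the magnitude
other  <- c(-1.0, 0.3,  0.8)  # different shape

cos_same_shape <- cosine_demo(ref, scaled)  # 1 up to floating-point error
cos_other      <- cosine_demo(ref, other)   # negative: profiles point in different directions
c(cos_same_shape, cos_other)
```

In similarity_tool the raw cosine values are then MinMax-rescaled across the whole sample, including the reference player himself, before he is dropped. The percentages are therefore relative to the least similar player in the sample rather than absolute.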
Observation: by Euclidean distance, Moise Kean emerges as the player most similar to Robert Lewandowski in terms of style and key performance metrics; by cosine similarity he ranks second, narrowly behind Ermedin Demirović. According to Barca Blaugranes and several recent articles in Spain and Italy, Barcelona are rumoured to be considering Moise Kean as Robert Lewandowski’s successor. It’s beautiful to see how the data reinforces this.
As a final step, I identify the most compelling candidates and map them on a radar chart to evaluate and visualise potential successors for Robert Lewandowski.
Moise Kean: still a young player and the most similar to Lewandowski in style and key metrics, with Barcelona already reported to be interested in him as a good fit and potential successor to the Polish striker.
Ermedin Demirović: the VfB Stuttgart forward has metrics similar to Moise Kean’s. He ranks very well in the Bundesliga for shots on target percentage and xG, and makes well-timed runs in behind and into the channels. He is an interesting prospect: a physically robust centre-forward who has developed into a reliable scorer at Stuttgart. With a market value of around €20m, he represents a very affordable option.
Jonathan Burkardt: an emerging Bundesliga scorer, consistent and capable of high involvement in build-up and finishing, with a market value of between €15m and €20m. He offers a balanced and dynamic forward profile and could be a high-upside option if the budget is tight or competition for Kean is high.
I calculate the p5 and p95 for each of the analysis metrics.
I create the dataframe min_max_df that will contain the
p5 and p95 for each of the study metrics.
min_max_df <- rbind(
apply(data_final[, list_metrics], 2,
function(x) quantile(x, probs=.95)),
apply(data_final[, list_metrics], 2,
function(x) quantile(x, probs=.05)))
rownames(min_max_df) <- c("p95", "p5")
min_max_df
## G.PK.90 xG.90 G.xG Sh.90 SoT.90 SCA.90 GCA.90 KP.90 Fld.90
## p95 0.725 0.738 4.99 3.959 1.568 5.039 0.718 1.839 2.318
## p5 0.111 0.150 -3.40 1.359 0.472 1.262 0.091 0.371 0.411
## FinalThirdPasses.90 PassesProgressive.90
## p95 2.201 3.746
## p5 0.324 0.551
Next, each of the records to be drawn on the radar has to be appended to the min_max_df dataframe. What happens if a player’s actual value is above p95 or below p5? In that case the value has to be adjusted, replacing the player’s actual value with the maximum (p95) or minimum (p5).
# Players to visualize
players_radar <- c("Robert Lewandowski", "Moise Kean", "Ermedin Demirović", "Jonathan Burkardt")
# Filter data by players
df_forwards_radar <- data_final[
data_final$Player %in% players_radar, ]
# Ensure values remain inside interval [p5, p95]
index_metric <- 2
for (p in players_radar){
df_p <- df_forwards_radar[df_forwards_radar$Player == p,]
for (c in colnames(df_forwards_radar)[index_metric:ncol(df_forwards_radar)])
{
value_c <- df_p[, c]
if (value_c < min_max_df["p5", c]){
df_forwards_radar[
df_forwards_radar$Player == p, c] = min_max_df["p5", c]
} else {
if (value_c > min_max_df["p95", c]){
df_forwards_radar[
df_forwards_radar$Player == p, c] = min_max_df["p95", c]
}
}
}
}
# Prepare final radar dataframe
df_forwards_radar <- as.data.frame(df_forwards_radar)
rownames(df_forwards_radar) <- df_forwards_radar$Player # update !!
df_final_plot <- rbind(
min_max_df, df_forwards_radar[, list_metrics])
df_final_plot
## G.PK.90 xG.90 G.xG Sh.90 SoT.90 SCA.90 GCA.90 KP.90 Fld.90
## p95 0.725 0.738 4.99 3.959 1.568 5.039 0.718 1.839 2.318
## p5 0.111 0.150 -3.40 1.359 0.472 1.262 0.091 0.371 0.411
## Ermedin Demirović 0.725 0.690 0.90 3.110 1.310 2.280 0.340 0.530 0.790
## Jonathan Burkardt 0.680 0.630 3.20 2.990 1.190 2.310 0.340 0.690 0.830
## Moise Kean 0.600 0.650 -0.40 3.430 1.568 1.900 0.230 0.590 1.620
## Robert Lewandowski 0.725 0.738 -0.10 3.750 1.520 1.690 0.100 0.560 1.150
## FinalThirdPasses.90 PassesProgressive.90
## p95 2.201 3.746
## p5 0.324 0.551
## Ermedin Demirović 0.440 0.910
## Jonathan Burkardt 0.860 1.450
## Moise Kean 0.470 0.810
## Robert Lewandowski 0.910 1.410
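The clamping performed by the nested loop can also be written in vectorised form with pmin and pmax. A sketch for a single metric, using the G.PK.90 bounds and raw values that appear in the outputs above; the helper name clamp is introduced only for this example:

```r
# Keep every value inside [lower, upper]
clamp <- function(x, lower, upper) pmin(pmax(x, lower), upper)

# G.PK.90: p5 and p95 from min_max_df, raw values from the earlier tables
p5  <- 0.111
p95 <- 0.725
g_pk_90 <- c(Demirovic = 0.73, Kean = 0.60, Lewandowski = 0.81)

g_pk_90_clamped <- clamp(g_pk_90, p5, p95)
g_pk_90_clamped
```

Demirović and Lewandowski are both capped at 0.725, matching the plotted table. On the radar their non-penalty goal output therefore looks identical even though the raw values are 0.73 and 0.81.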
We use the fmsb library to create the radar chart. We
define the function create_radarchart where we modify the
radarchart function of the fmsb library that
will allow us to work with the different arguments (colours, names of
the axes, etc.):
# Step 1 — create the radar helper function
create_radarchart <- function(data, color = "#00AFBB",
vlabels = colnames(data), vlcex = 0.7,
caxislabels = NULL, title = NULL){
fmsb::radarchart(
data, axistype = 1,
pcol = color, pfcol = scales::alpha(color, 0.5),
plwd = 2, plty = 1,
cglcol = "grey", cglty = 1, cglwd = 0.8,
axislabcol = "white",
vlcex = vlcex, vlabels = vlabels,
caxislabels = caxislabels, title = title
)
}
# Step 2 — metric labels
metrics_name_plot <- c("G-PK/90", "xG/90", "G-xG", "Sh/90", "SoT/90",
"SCA/90", "GCA/90", "KP/90", "Fld/90",
"FinalThirdPasses/90", "ProgPasses/90")
# Step 3 — colors for each player
colors_radar <- c("#004D98", "#f7d62d", "#8DBF8D", "#FFA500")
# Step 4 — plot
op <- par(mar = c(1, 2, 2, 2))
create_radarchart(
data = df_final_plot,
color = colors_radar,
vlabels = metrics_name_plot
)
legend("bottomleft",
legend = rownames(df_final_plot[-c(1,2), ]),
horiz = FALSE,
bty = "n",
pch = 20,
col = colors_radar,
text.col = "black",
cex = 0.7,
pt.cex = 1.75)
title(
main = "Robert Lewandowski Replacement\nForwards, 24/25",
cex.main = 1,
col.main = "#2C3E50"
)
Robert Lewandowski has been FC Barcelona’s attacking reference point: elite movement, high non-penalty goal production, consistent xG generation, positional intelligence, and penalty-box efficiency.
Based on my radar comparison, similarity algorithm, and scoring analysis, three viable successor profiles emerge.
If Barcelona’s objective is to preserve immediate goal output and maintain a traditional central striker identity, Moise Kean is the strongest direct replacement.
Kean’s radar profile closely mirrors Lewandowski in:
He does not significantly exceed Lewandowski in creative metrics but replicates the core scoring function effectively.
Conclusion:
Kean represents the safest plug-and-play replacement if Barcelona prioritizes continuity in the No.9 role.
If the club aims to maintain central presence while optimizing financial efficiency, Ermedin Demirović offers a compelling alternative.
Demirović provides:
Conclusion:
Demirović represents the value-efficient performance solution: strong metrics with moderate financial exposure.
If Barcelona prioritizes long-term evolution rather than direct replication, Jonathan Burkardt offers mobility and growth potential.
Burkardt demonstrates:
Conclusion:
Burkardt represents a strategic evolution option rather than strict stylistic continuity.
Barcelona has three clear strategic pathways:
1️⃣ Immediate Scoring Stability → Moise Kean
Maintain traditional No.9 structure with minimal tactical adjustment.
2️⃣ Cost-Performance Optimization → Ermedin Demirović
Strong performance metrics at lower cost, reducing financial risk.
3️⃣ Tactical Evolution & Long-Term Upside → Jonathan Burkardt
Transition toward a more dynamic and fluid attacking structure aligned with the younger core (e.g., Yamal, Pedri).
If the objective is short-term continuity and minimal disruption,
→ Moise Kean is the closest statistical successor.
If financial flexibility is required,
→ Ermedin Demirović offers the strongest value-to-performance ratio.
If the club envisions a tactical evolution beyond the Lewandowski era,
→ Jonathan Burkardt provides the most adaptable long-term solution.
By combining performance metrics, stylistic similarity, and market feasibility, FC Barcelona can adopt a data-driven, multi-dimensional recruitment strategy to manage the post-Lewandowski transition.