library(tidyverse)
library(janitor)
library(ggrepel)
Premier League Player Analysis
Introduction
This project uses Premier League player data from the 2023-24 season. I am interested in looking at attacking performance, especially goals, assists, and attacking efficiency. The main questions I want to explore are whether goal scorers also provide assists, whether forwards or midfielders assist more, and which players are most efficient with their playing time.
Load the Data
prem <- read_csv("data/premier-player-23-24.csv") %>%
  clean_names()
Rows: 580 Columns: 34
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Player, Nation, Pos, Team
dbl (30): Age, MP, Starts, Min, 90s, Gls, Ast, G+A, G-PK, PK, PKatt, CrdY, C...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(prem)
Rows: 580
Columns: 34
$ player <chr> "Rodri", "Phil Foden", "Ederson", "Julián Álvarez", "Kyl…
$ nation <chr> "es ESP", "eng ENG", "br BRA", "ar ARG", "eng ENG", "pt …
$ pos <chr> "MF", "FW,MF", "GK", "MF,FW", "DF", "MF,FW", "FW", "DF",…
$ age <dbl> 27, 23, 29, 23, 33, 28, 23, 26, 28, 21, 28, 21, 29, 32, …
$ mp <dbl> 34, 35, 33, 36, 32, 33, 31, 30, 30, 28, 29, 29, 30, 18, …
$ starts <dbl> 34, 33, 33, 31, 30, 29, 29, 28, 28, 26, 24, 18, 16, 15, …
$ min <dbl> 2931, 2857, 2785, 2647, 2767, 2578, 2552, 2559, 2511, 23…
$ x90s <dbl> 32.6, 31.7, 30.9, 29.4, 30.7, 28.6, 28.4, 28.4, 27.9, 25…
$ gls <dbl> 8, 19, 0, 11, 0, 6, 27, 0, 2, 4, 2, 3, 1, 4, 1, 3, 2, 0,…
$ ast <dbl> 9, 8, 0, 8, 4, 9, 5, 0, 0, 1, 2, 8, 0, 10, 0, 1, 0, 2, 0…
$ g_a <dbl> 17, 27, 0, 19, 4, 15, 32, 0, 2, 5, 4, 11, 1, 14, 1, 4, 2…
$ g_pk <dbl> 8, 19, 0, 9, 0, 6, 20, 0, 2, 4, 2, 3, 1, 4, 1, 3, 2, 0, …
$ pk <dbl> 0, 0, 0, 2, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ p_katt <dbl> 0, 0, 0, 2, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ crd_y <dbl> 8, 2, 5, 2, 2, 8, 1, 0, 4, 3, 0, 3, 4, 2, 2, 6, 1, 0, 0,…
$ crd_r <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ x_g <dbl> 4.1, 10.3, 0.0, 13.0, 0.4, 3.7, 29.2, 1.4, 1.9, 3.1, 2.5…
$ npx_g <dbl> 4.1, 10.3, 0.0, 11.5, 0.4, 3.7, 22.9, 1.4, 1.9, 3.1, 2.5…
$ x_ag <dbl> 3.9, 8.4, 0.1, 6.4, 2.6, 7.6, 4.3, 0.3, 0.5, 1.4, 1.2, 4…
$ npx_g_x_ag <dbl> 8.0, 18.7, 0.1, 17.9, 3.0, 11.3, 27.2, 1.7, 2.4, 4.5, 3.…
$ prg_c <dbl> 76, 93, 0, 64, 74, 140, 35, 34, 46, 63, 24, 218, 37, 47,…
$ prg_p <dbl> 376, 168, 4, 103, 157, 177, 26, 173, 148, 136, 116, 57, …
$ prg_r <dbl> 55, 269, 0, 180, 172, 260, 126, 16, 36, 115, 18, 295, 38…
$ gls_90 <dbl> 0.25, 0.60, 0.00, 0.37, 0.00, 0.21, 0.95, 0.00, 0.07, 0.…
$ ast_90 <dbl> 0.28, 0.25, 0.00, 0.27, 0.13, 0.31, 0.18, 0.00, 0.00, 0.…
$ g_a_90 <dbl> 0.52, 0.85, 0.00, 0.65, 0.13, 0.52, 1.13, 0.00, 0.07, 0.…
$ g_pk_90 <dbl> 0.25, 0.60, 0.00, 0.31, 0.00, 0.21, 0.71, 0.00, 0.07, 0.…
$ g_a_pk_90 <dbl> 0.52, 0.85, 0.00, 0.58, 0.13, 0.52, 0.88, 0.00, 0.07, 0.…
$ x_g_90 <dbl> 0.12, 0.33, 0.00, 0.44, 0.01, 0.13, 1.03, 0.05, 0.07, 0.…
$ x_ag_90 <dbl> 0.12, 0.26, 0.00, 0.22, 0.09, 0.27, 0.15, 0.01, 0.02, 0.…
$ x_g_x_ag_90 <dbl> 0.24, 0.59, 0.00, 0.66, 0.10, 0.40, 1.18, 0.06, 0.09, 0.…
$ npx_g_90 <dbl> 0.12, 0.33, 0.00, 0.39, 0.01, 0.13, 0.81, 0.05, 0.07, 0.…
$ npx_g_x_ag_90 <dbl> 0.24, 0.59, 0.00, 0.61, 0.10, 0.40, 0.96, 0.06, 0.09, 0.…
$ team <chr> "Manchester City", "Manchester City", "Manchester City",…
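The cleaned names in the `glimpse()` output differ from the raw headers in the column specification (for example, `G+A` appears as `g_a` and `90s` as `x90s`). A minimal sketch of that renaming, using only a few of the headers shown above and janitor's `make_clean_names()`, which applies the same cleaning rules to a character vector:

```r
library(janitor)

# A few raw headers taken from the column specification above
raw_names <- c("Player", "90s", "G+A", "G-PK", "PKatt", "CrdY")

# make_clean_names() applies the same cleaning rules as clean_names(),
# but to a character vector instead of a data frame
cleaned <- make_clean_names(raw_names)

# Side-by-side lookup of raw and cleaned names
data.frame(raw = raw_names, cleaned = cleaned)
```

A lookup like this is handy when matching the cleaned column names back to the abbreviations used by the data source.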
Clean the Data
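prem_clean <- prem %>%
  # Drop rows with missing goals or assists and players who never played
  filter(!is.na(gls), !is.na(ast), min > 0) %>%
  mutate(
    goal_contributions = gls + ast,
    # Per-90 rates scale totals to one full match of playing time
    contributions_per90 = goal_contributions / min * 90,
    assists_per90 = ast / min * 90
  )
Goals vs Assists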
ggplot(prem_clean, aes(x = gls, y = ast)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relationship Between Goals and Assists",
    subtitle = "Premier League Players, 2023-24 Season",
    x = "Goals Scored",
    y = "Assists"
  )
`geom_smooth()` using formula = 'y ~ x'
This graph shows the relationship between goals and assists, with goals as the explanatory variable and assists as the response variable. The trend line slopes upward, which indicates that players who score more goals also tend to provide more assists.
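The trend line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit of assists on goals. As a rough sketch, the same kind of fit can be inspected directly with `lm()`. The small data frame below uses only the first seven players visible in the `glimpse()` output, so its coefficients are illustrative and will not match the full-data line.

```r
# First seven goals/assists values from the glimpse() output (illustrative subset)
sample_players <- data.frame(
  gls = c(8, 19, 0, 11, 0, 6, 27),
  ast = c(9, 8, 0, 8, 4, 9, 5)
)

# Same kind of model geom_smooth(method = "lm") fits:
# assists as a linear function of goals
fit <- lm(ast ~ gls, data = sample_players)

# The slope estimates how many extra assists go with one extra goal
slope <- unname(coef(fit)["gls"])
slope
```

To inspect the line shown in the graph, the same call can be run with `data = prem_clean`.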
Top Players Highlighted
# Ten players with the most combined goals and assists
top_players <- prem_clean %>%
  arrange(desc(goal_contributions)) %>%
  slice_head(n = 10)

ggplot(prem_clean, aes(x = gls, y = ast)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  # Label only the top players, nudged apart so names do not overlap
  geom_text_repel(
    data = top_players,
    aes(label = player),
    size = 3
  ) +
  labs(
    title = "Goals vs Assists with Top Players Highlighted",
    x = "Goals Scored",
    y = "Assists"
  )
`geom_smooth()` using formula = 'y ~ x'
This graph highlights the top attacking players based on total goal contributions. Players near the top-right are strong in both scoring and creating goals. Some players are more goal-focused, while others are more balanced between goals and assists.
Midfielders vs Forwards: Assists
# Keep players listed as exactly "MF" or "FW"
mf_fw <- prem_clean %>%
  filter(pos %in% c("MF", "FW"))

ggplot(mf_fw, aes(x = pos, y = ast)) +
  geom_boxplot() +
  labs(
    title = "Assists Comparison: Midfielders vs Forwards",
    subtitle = "MF = Midfielder, FW = Forward",
    x = "Position",
    y = "Assists"
  )
This graph compares assists between midfielders and forwards. Midfielders are usually expected to create chances, while forwards are closer to goal and involved in many attacking actions. This helps show whether forwards or midfielders record more assists overall. Only players listed purely as MF or FW are included; players with combined listings such as "FW,MF" are left out of this comparison.
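The `pos` column also contains combined listings such as "FW,MF" and "MF,FW", which an exact match on "MF" or "FW" does not capture. One possible way to include those players is sketched below. It assumes the first listed position is the player's primary one, which is an assumption about the data source rather than something the dataset documents, and the `primary_pos` column name is hypothetical.

```r
library(dplyr)

# Position strings as they appear in the glimpse() output
positions <- data.frame(
  pos = c("MF", "FW,MF", "GK", "MF,FW", "DF", "MF,FW", "FW")
)

mf_fw_primary <- positions %>%
  # Keep only the text before the first comma (assumed primary position)
  mutate(primary_pos = sub(",.*$", "", pos)) %>%
  filter(primary_pos %in% c("MF", "FW"))

# Number of players per assumed primary position
count(mf_fw_primary, primary_pos)
```

Whether hybrid players belong in one group, the other, or a separate category is a judgment call, and it can change what the boxplots show.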
Midfielders vs Forwards: Assists per 90
ggplot(mf_fw, aes(x = pos, y = assists_per90)) +
  geom_boxplot() +
  labs(
    title = "Assists per 90: Midfielders vs Forwards",
    subtitle = "Comparing creative output while accounting for playing time",
    x = "Position",
    y = "Assists per 90 Minutes"
  )
This graph looks at assists per 90 minutes instead of total assists. This is useful because some players play more minutes than others. Assists per 90 gives a better view of which position creates more often relative to playing time.
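As a quick check of the per-90 formula, the sketch below recomputes assists per 90 for the first four players in the `glimpse()` output and compares the result with the dataset's own `ast_90` column. The values are copied from that output, and the comparison allows for rounding because `ast_90` is shown to two decimals.

```r
# Values copied from the glimpse() output for the first four players
check <- data.frame(
  player = c("Rodri", "Phil Foden", "Ederson", "Julián Álvarez"),
  min    = c(2931, 2857, 2785, 2647),
  ast    = c(9, 8, 0, 8),
  ast_90 = c(0.28, 0.25, 0.00, 0.27)
)

# Assists per 90 = assists divided by minutes, scaled to a 90-minute match
check$assists_per90 <- check$ast / check$min * 90

# TRUE where the hand-computed rate agrees with ast_90 to two decimals
check$matches <- abs(check$assists_per90 - check$ast_90) < 0.005
check
```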
Efficiency: Goal Contributions per 90
# Rank players with at least 900 minutes by goals + assists per 90
top_efficiency <- prem_clean %>%
  filter(min >= 900) %>%
  arrange(desc(contributions_per90)) %>%
  slice_head(n = 10)

ggplot(
  top_efficiency,
  aes(x = reorder(player, contributions_per90), y = contributions_per90)
) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Players by Goal Contributions per 90",
    x = "Player",
    y = "Goals + Assists per 90 Minutes"
  )
This graph looks at efficiency instead of only total numbers. It shows which players produced the most goals and assists compared to their playing time. This is useful because players with fewer minutes can still be very effective. Only players with at least 900 minutes (the equivalent of ten full matches) are ranked, so that a handful of short appearances cannot dominate the list.
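To see why a minutes threshold matters for per-90 rankings, consider the sketch below. The two players and their numbers are invented for illustration and are not from the dataset: a substitute with one goal in 45 minutes outranks a regular starter on a per-90 basis until the 900-minute filter is applied.

```r
library(dplyr)

# Hypothetical players (not from the dataset)
toy <- data.frame(
  player = c("Regular starter", "Brief substitute"),
  min    = c(2700, 45),
  gls    = c(20, 1),
  ast    = c(10, 0)
)

toy <- toy %>%
  mutate(contributions_per90 = (gls + ast) / min * 90)

# Without a threshold, the 45-minute player ranks first
unfiltered_top <- toy %>%
  arrange(desc(contributions_per90)) %>%
  slice_head(n = 1)

# With the 900-minute threshold, only the regular starter remains
filtered_top <- toy %>%
  filter(min >= 900) %>%
  arrange(desc(contributions_per90)) %>%
  slice_head(n = 1)
```

The exact cutoff is a choice. A lower value admits more players but makes the ranking noisier.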
Correlation
cor(prem_clean$gls, prem_clean$ast, use = "complete.obs")
[1] 0.6071441
The correlation value summarizes the relationship between goals and assists. Here it is about 0.61, a moderately strong positive relationship: players with more goals generally also have more assists, although the link is far from perfect. A value close to zero would have indicated a weak relationship.
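One common way to read a correlation is to square it, which gives the share of variance in one variable that a straight-line relationship with the other accounts for. A minimal sketch using the value printed above:

```r
# Pearson correlation between goals and assists, as printed above
r <- 0.6071441

# Squaring r gives the proportion of variance in assists that a
# linear relationship with goals accounts for (about 0.37 here)
r_squared <- r^2
round(r_squared, 2)
```

So a little over a third of the variation in assists lines up with goals, which leaves plenty of room for players who mostly score or mostly create.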
Conclusion
Overall, this analysis shows how Premier League players contributed through goals and assists during the 2023-24 season. The goals vs assists graphs, together with a correlation of about 0.61, show that scorers also tend to create goals for teammates. The midfielders vs forwards comparison helps test whether midfielders assist more because of their creative role, or whether forwards also produce assists because they are often involved in attacking actions. Finally, the efficiency chart adds more context by showing which players were most productive relative to their playing time.