Premier League Player Analysis

Author

Mohamed Yussuf

library(tidyverse)
library(janitor)
library(ggrepel)

Introduction

This project uses Premier League player data from the 2023-24 season. The main question I want to explore is whether the number of goals a player scores helps explain the number of assists they provide.

Load the Data

prem <- read_csv("data/premier-player-23-24.csv") %>%
  clean_names()
Rows: 580 Columns: 34
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Player, Nation, Pos, Team
dbl (30): Age, MP, Starts, Min, 90s, Gls, Ast, G+A, G-PK, PK, PKatt, CrdY, C...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(prem)
Rows: 580
Columns: 34
$ player        <chr> "Rodri", "Phil Foden", "Ederson", "Julián Álvarez", "Kyl…
$ nation        <chr> "es ESP", "eng ENG", "br BRA", "ar ARG", "eng ENG", "pt …
$ pos           <chr> "MF", "FW,MF", "GK", "MF,FW", "DF", "MF,FW", "FW", "DF",…
$ age           <dbl> 27, 23, 29, 23, 33, 28, 23, 26, 28, 21, 28, 21, 29, 32, …
$ mp            <dbl> 34, 35, 33, 36, 32, 33, 31, 30, 30, 28, 29, 29, 30, 18, …
$ starts        <dbl> 34, 33, 33, 31, 30, 29, 29, 28, 28, 26, 24, 18, 16, 15, …
$ min           <dbl> 2931, 2857, 2785, 2647, 2767, 2578, 2552, 2559, 2511, 23…
$ x90s          <dbl> 32.6, 31.7, 30.9, 29.4, 30.7, 28.6, 28.4, 28.4, 27.9, 25…
$ gls           <dbl> 8, 19, 0, 11, 0, 6, 27, 0, 2, 4, 2, 3, 1, 4, 1, 3, 2, 0,…
$ ast           <dbl> 9, 8, 0, 8, 4, 9, 5, 0, 0, 1, 2, 8, 0, 10, 0, 1, 0, 2, 0…
$ g_a           <dbl> 17, 27, 0, 19, 4, 15, 32, 0, 2, 5, 4, 11, 1, 14, 1, 4, 2…
$ g_pk          <dbl> 8, 19, 0, 9, 0, 6, 20, 0, 2, 4, 2, 3, 1, 4, 1, 3, 2, 0, …
$ pk            <dbl> 0, 0, 0, 2, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ p_katt        <dbl> 0, 0, 0, 2, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ crd_y         <dbl> 8, 2, 5, 2, 2, 8, 1, 0, 4, 3, 0, 3, 4, 2, 2, 6, 1, 0, 0,…
$ crd_r         <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ x_g           <dbl> 4.1, 10.3, 0.0, 13.0, 0.4, 3.7, 29.2, 1.4, 1.9, 3.1, 2.5…
$ npx_g         <dbl> 4.1, 10.3, 0.0, 11.5, 0.4, 3.7, 22.9, 1.4, 1.9, 3.1, 2.5…
$ x_ag          <dbl> 3.9, 8.4, 0.1, 6.4, 2.6, 7.6, 4.3, 0.3, 0.5, 1.4, 1.2, 4…
$ npx_g_x_ag    <dbl> 8.0, 18.7, 0.1, 17.9, 3.0, 11.3, 27.2, 1.7, 2.4, 4.5, 3.…
$ prg_c         <dbl> 76, 93, 0, 64, 74, 140, 35, 34, 46, 63, 24, 218, 37, 47,…
$ prg_p         <dbl> 376, 168, 4, 103, 157, 177, 26, 173, 148, 136, 116, 57, …
$ prg_r         <dbl> 55, 269, 0, 180, 172, 260, 126, 16, 36, 115, 18, 295, 38…
$ gls_90        <dbl> 0.25, 0.60, 0.00, 0.37, 0.00, 0.21, 0.95, 0.00, 0.07, 0.…
$ ast_90        <dbl> 0.28, 0.25, 0.00, 0.27, 0.13, 0.31, 0.18, 0.00, 0.00, 0.…
$ g_a_90        <dbl> 0.52, 0.85, 0.00, 0.65, 0.13, 0.52, 1.13, 0.00, 0.07, 0.…
$ g_pk_90       <dbl> 0.25, 0.60, 0.00, 0.31, 0.00, 0.21, 0.71, 0.00, 0.07, 0.…
$ g_a_pk_90     <dbl> 0.52, 0.85, 0.00, 0.58, 0.13, 0.52, 0.88, 0.00, 0.07, 0.…
$ x_g_90        <dbl> 0.12, 0.33, 0.00, 0.44, 0.01, 0.13, 1.03, 0.05, 0.07, 0.…
$ x_ag_90       <dbl> 0.12, 0.26, 0.00, 0.22, 0.09, 0.27, 0.15, 0.01, 0.02, 0.…
$ x_g_x_ag_90   <dbl> 0.24, 0.59, 0.00, 0.66, 0.10, 0.40, 1.18, 0.06, 0.09, 0.…
$ npx_g_90      <dbl> 0.12, 0.33, 0.00, 0.39, 0.01, 0.13, 0.81, 0.05, 0.07, 0.…
$ npx_g_x_ag_90 <dbl> 0.24, 0.59, 0.00, 0.61, 0.10, 0.40, 0.96, 0.06, 0.09, 0.…
$ team          <chr> "Manchester City", "Manchester City", "Manchester City",…

Clean the Data

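prem_clean <- prem %>%
  # drop rows missing goals/assists and players with no minutes (avoids dividing by zero)
  filter(!is.na(gls), !is.na(ast), min > 0) %>%
  mutate(
    goal_contributions = gls + ast,                     # goals + assists
    contributions_per90 = goal_contributions / min * 90 # per 90 minutes played
  )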
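As a quick sanity check on the per-90 formula, the sketch below recomputes `contributions_per90` for the first four rows printed by `glimpse(prem)` and compares the result with the dataset's own `g_a_90` column. The values are copied from that output. The small tibble stands in for `prem_clean` and is not a re-read of the CSV.

```r
library(dplyr)
library(tibble)

# First four rows as printed by glimpse(prem) above
sample_rows <- tibble(
  player = c("Rodri", "Phil Foden", "Ederson", "Julián Álvarez"),
  min    = c(2931, 2857, 2785, 2647),
  gls    = c(8, 19, 0, 11),
  ast    = c(9, 8, 0, 8),
  g_a_90 = c(0.52, 0.85, 0.00, 0.65)   # dataset's own (G+A) per 90, printed to 2 dp
)

checked <- sample_rows %>%
  mutate(
    goal_contributions  = gls + ast,
    contributions_per90 = goal_contributions / min * 90,
    # difference after rounding to the same 2 decimal places
    diff = round(contributions_per90, 2) - g_a_90
  )

checked %>% select(player, contributions_per90, g_a_90, diff)
```

All four differences come out as zero. The source's per-90 columns may be computed from the rounded `x90s` column rather than `min`, so other rows could differ slightly in the second decimal. That is an assumption, not something the data documents.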

Goals vs Assists

ggplot(prem_clean, aes(x = gls, y = ast)) +
  geom_point(alpha = 0.6) +
  # ordinary least-squares trend line of assists on goals
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Relationship Between Goals and Assists",
    subtitle = "Premier League Players, 2023-24 Season",
    x = "Goals Scored",
    y = "Assists"
  )

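This graph plots goals (the explanatory variable) against assists (the response variable). The straight line is a least-squares fit, so an upward slope means that players who score more goals also tend to provide more assists. Goals and assists are whole numbers, so many players share the same point near the origin. The semi-transparent points (alpha = 0.6) appear darker where they overlap.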
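`geom_smooth(method = "lm")` draws an ordinary least-squares line, so the same line can be fitted directly with `lm()` to read off its slope, which is the expected change in assists per additional goal. The sketch uses the first 18 `gls`/`ast` values visible in the `glimpse()` output as a small stand-in. On the full data you would fit `lm(ast ~ gls, data = prem_clean)` instead. The subset's coefficients should not be read as the season-wide result.

```r
# First 18 goals/assists values shown by glimpse(prem) above (a subset, not the full season)
gls <- c(8, 19, 0, 11, 0, 6, 27, 0, 2, 4, 2, 3, 1, 4, 1, 3, 2, 0)
ast <- c(9, 8, 0, 8, 4, 9, 5, 0, 0, 1, 2, 8, 0, 10, 0, 1, 0, 2)
subset_df <- data.frame(gls = gls, ast = ast)

# The model geom_smooth(method = "lm", formula = y ~ x) draws: assists explained by goals
fit <- lm(ast ~ gls, data = subset_df)
coef(fit)

# With one predictor, the OLS slope equals cov(x, y) / var(x)
slope_by_hand <- cov(gls, ast) / var(gls)
slope_by_hand
```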

Top Players Highlighted

top_players <- prem_clean %>%
  arrange(desc(goal_contributions)) %>%
  slice_head(n = 10)

ggplot(prem_clean, aes(x = gls, y = ast)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # label only the ten players in top_players; ggrepel keeps labels from overlapping
  geom_text_repel(
    data = top_players,
    aes(label = player),
    size = 3
  ) +
  labs(
    title = "Goals vs Assists with Top Players Highlighted",
    x = "Goals Scored",
    y = "Assists"
  )

This version labels the ten players with the most goal contributions (goals + assists). Players toward the top right are strong at both scoring and creating goals. Labelled players below the trend line have fewer assists than their goal total would predict. Those above the line create more than it predicts. Players close to the line are the most balanced.
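The `top_players` selection has one quirk. `arrange()` followed by `slice_head(n = 10)` keeps exactly ten rows, so anyone tied with the tenth player is dropped based on row order. dplyr's `slice_max()` keeps ties by default (`with_ties = TRUE`), so it can return more than `n` rows. The sketch below shows the difference on the first 16 goal-contribution totals visible in the `glimpse()` output. It uses `n = 9` because that cut-off falls on a tie in this subset. The `row` column is only a positional label.

```r
library(dplyr)
library(tibble)

# First 16 gls/ast values from glimpse(prem); 'row' stands in for the player name
contrib <- tibble(
  row = 1:16,
  gls = c(8, 19, 0, 11, 0, 6, 27, 0, 2, 4, 2, 3, 1, 4, 1, 3),
  ast = c(9, 8, 0, 8, 4, 9, 5, 0, 0, 1, 2, 8, 0, 10, 0, 1)
) %>%
  mutate(goal_contributions = gls + ast)

# Approach used above: exactly n rows, ties at the cut-off broken by row order
head_top <- contrib %>%
  arrange(desc(goal_contributions)) %>%
  slice_head(n = 9)

# Alternative: keep every row tied with the n-th value
max_top <- contrib %>%
  slice_max(goal_contributions, n = 9)

nrow(head_top)  # 9
nrow(max_top)   # 11: three players are tied on 4 at the cut-off
```

Labelling a few extra tied players on a scatterplot is usually harmless. Whether to keep ties is a judgment call.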

Efficiency: Goal Contributions per 90

top_efficiency <- prem_clean %>%
  # at least 900 minutes (10 full matches) so tiny samples do not dominate the ranking
  filter(min >= 900) %>%
  arrange(desc(contributions_per90)) %>%
  slice_head(n = 10)

ggplot(top_efficiency, aes(x = reorder(player, contributions_per90), y = contributions_per90)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Players by Goal Contributions per 90",
    x = "Player",
    y = "Goals + Assists per 90 Minutes"
  )

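This chart measures efficiency rather than totals. It ranks players by goals plus assists per 90 minutes, counting only those who played at least 900 minutes (the equivalent of 10 full matches). The per-90 rate shows which players produced the most relative to their playing time, so a player with fewer minutes can still rank highly. The minimum-minutes filter stops players with only a few appearances from dominating the list, since their rates can swing on a single goal or assist.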
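The `min >= 900` filter matters because a per-90 rate from very few minutes is dominated by a single event. The sketch compares row 7 of the `glimpse()` output (27 goals, 5 assists, 2552 minutes) with a hypothetical substitute who records one assist in 30 minutes. The substitute is invented purely for illustration.

```r
library(dplyr)
library(tibble)

players <- tibble(
  label = c("glimpse row 7", "hypothetical substitute"),  # second row is made up
  min   = c(2552, 30),
  goal_contributions = c(27 + 5, 1)
) %>%
  mutate(contributions_per90 = goal_contributions / min * 90)

# Unfiltered, the substitute (3.0 per 90) outranks row 7 (about 1.13 per 90)
players %>% arrange(desc(contributions_per90))

# The 900-minute threshold used for top_efficiency removes the tiny sample
qualified <- players %>% filter(min >= 900)
qualified
```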

Minutes vs Goal Contributions

ggplot(prem_clean, aes(x = min, y = goal_contributions)) +
  geom_point(alpha = 0.6) +
  # ordinary least-squares trend line of goal contributions on minutes
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Minutes Played vs Goal Contributions",
    x = "Minutes Played",
    y = "Goals + Assists"
  )

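This graph shows whether players who play more minutes also record more goal contributions. Points well above the trend line produced more than their minutes alone would suggest. Points below it produced less, for example Ederson, a goalkeeper with 2,785 minutes and no goals or assists. Reading the plot this way helps separate players who build up totals by playing a lot from players who are efficient with limited minutes.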
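One way to make "above or below the trend line" concrete is to fit the same straight line as the plot and look at the residuals. Players with large positive residuals produced more goal contributions than their minutes alone would predict. This sketch uses the first nine rows whose `min`, `gls` and `ast` are fully visible in the `glimpse()` output. Those rows all fall between 2,511 and 2,931 minutes, so the subset only illustrates the mechanics and does not reproduce the full-season line.

```r
library(dplyr)
library(tibble)

# First nine rows from glimpse(prem); 'row' stands in for the player name
mins_df <- tibble(
  row = 1:9,
  min = c(2931, 2857, 2785, 2647, 2767, 2578, 2552, 2559, 2511),
  goal_contributions = c(17, 27, 0, 19, 4, 15, 32, 0, 2)
)

# Same straight line that geom_smooth(method = "lm") draws on the plot
fit_min <- lm(goal_contributions ~ min, data = mins_df)

mins_df <- mins_df %>%
  mutate(
    expected   = fitted(fit_min),  # contributions predicted from minutes alone
    above_line = resid(fit_min)    # positive = more than minutes predict
  ) %>%
  arrange(desc(above_line))

mins_df
```

In this subset, row 7 (32 goal contributions in 2,552 minutes) sits furthest above the line.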

Correlation

cor(prem_clean$gls, prem_clean$ast, use = "complete.obs")
[1] 0.6071441

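The correlation coefficient summarizes how strongly goals and assists move together in a straight-line pattern. It ranges from -1 to 1, and values near zero indicate a weak linear relationship. The value here, about 0.61, indicates a moderately strong positive relationship: players who score more goals tend to record more assists, although many players deviate from that pattern.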
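With a single predictor, the squared correlation equals the R² of `lm(ast ~ gls)`. The value above therefore implies that a straight-line fit on goals accounts for roughly 0.607² ≈ 0.369 of the variance in assists. The sketch checks that identity on the 18-value subset from the `glimpse()` output, so the subset's own correlation will differ from the full-season 0.607.

```r
# First 18 goals/assists values from glimpse(prem) (a subset, not the full season)
gls <- c(8, 19, 0, 11, 0, 6, 27, 0, 2, 4, 2, 3, 1, 4, 1, 3, 2, 0)
ast <- c(9, 8, 0, 8, 4, 9, 5, 0, 0, 1, 2, 8, 0, 10, 0, 1, 0, 2)

r  <- cor(gls, ast)                     # Pearson correlation (cor's default method)
r2 <- summary(lm(ast ~ gls))$r.squared  # share of variance in assists explained by goals

c(r_squared = r^2, lm_r_squared = r2)   # the two values agree

# Applied to the full-season correlation reported above
0.6071441^2
```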

Conclusion

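Overall, this analysis examined how Premier League players contributed through goals and assists during the 2023-24 season. Goals and assists are moderately positively correlated (r ≈ 0.61), so players who score more tend to assist more as well. The relationship is far from perfect, though, since some players lean toward finishing and others toward creating. The efficiency chart adds context by ranking players with at least 900 minutes by their goal contributions relative to playing time.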