Premier League Players Performance Analysis

Author

Pascal Hermann Kouogang Tafo

INTRODUCTION

This assignment analyzes the attacking performance of five elite Premier League players (Haaland, Salah, Palmer, Son, and Saka) using statistics on goals, assists, and shots, split by home and away fixtures, at a given point in the season. We seek to identify meaningful patterns in finishing efficiency and goal contribution across venues (Home/Away).

APPROACH

To analyze the dataset, I will implement a structured data science pipeline using the “tidyverse” framework, as follows:

  1. Create the data table in R and add it to an existing GitHub repository so that it is accessible at any time.

  2. Reshape the raw data from wide to tidy (long) format using pivot_longer(), separating each statistic by venue (Home/Away).

  3. Shot conversion rate
     1. Compare the players' shot conversion rates, calculated as total goals divided by total shots.
     2. Visualize them as a ranked bar chart using ggplot2, with a group-average reference line.
     3. Interpretation

  4. Goal contribution comparison (Home vs Away)
     1. Compute goal contributions by summing goals and assists per venue.
     2. Display them as a grouped bar chart to expose venue-dependent performance patterns.
     3. Interpretation

Load the required package

library(tidyverse)

Create a data frame of the five players' statistics and save it as a CSV file for my GitHub repository

PL_Players_Stats <- data.frame(
  Players      = c("HAALAND", "SALAH", "PALMER", "SON", "SAKA"),
  Clubs        = c("MAN_City", "Liverpool", "Chelsea", "Spurs", "Arsenal"),
  Goals_Home   = c(14, 11, 9, 8, 7),
  Goals_Away   = c(12, 10, 8, 6, 9),
  Assists_Home = c(3, 8, 7, 5, 9),
  Assists_Away = c(2, 6, 5, 7, 8),
  Shots_Home   = c(52, 44, 38, 29, 33),
  Shots_Away   = c(48, 39, 31, 27, 35)
)

PL_Players_Stats
  Players     Clubs Goals_Home Goals_Away Assists_Home Assists_Away Shots_Home
1 HAALAND  MAN_City         14         12            3            2         52
2   SALAH Liverpool         11         10            8            6         44
3  PALMER   Chelsea          9          8            7            5         38
4     SON     Spurs          8          6            5            7         29
5    SAKA   Arsenal          7          9            9            8         33
  Shots_Away
1         48
2         39
3         31
4         27
5         35
write.csv(PL_Players_Stats, "PL_Players_Stats.csv", row.names = FALSE)

Reshape the raw data from wide to tidy (long) format using pivot_longer()

# Read the CSV file from the GitHub repository
url <- "https://raw.githubusercontent.com/Pascaltafo2025/PROJECT-2--TIDY-DATA-ANALYSIS/refs/heads/main/PL_Players_Stats.csv"

PL_Players_Stats <- read.csv(url)
PL_Players_Stats
  Players     Clubs Goals_Home Goals_Away Assists_Home Assists_Away Shots_Home
1 HAALAND  MAN_City         14         12            3            2         52
2   SALAH Liverpool         11         10            8            6         44
3  PALMER   Chelsea          9          8            7            5         38
4     SON     Spurs          8          6            5            7         29
5    SAKA   Arsenal          7          9            9            8         33
  Shots_Away
1         48
2         39
3         31
4         27
5         35
# Reshape wide → long with pivot_longer() (similar to pandas' melt() in Python).
# names_sep = "_" splits a column name such as "Goals_Home" into Stat = "Goals" and Venue = "Home".
PL_Players_Stats_long <- PL_Players_Stats %>%
  pivot_longer(
    cols      = Goals_Home:Shots_Away,
    names_to  = c("Stat", "Venue"),
    names_sep = "_",
    values_to = "Value"
  )
PL_Players_Stats_long
# A tibble: 30 × 5
   Players Clubs     Stat    Venue Value
   <chr>   <chr>     <chr>   <chr> <int>
 1 HAALAND MAN_City  Goals   Home     14
 2 HAALAND MAN_City  Goals   Away     12
 3 HAALAND MAN_City  Assists Home      3
 4 HAALAND MAN_City  Assists Away      2
 5 HAALAND MAN_City  Shots   Home     52
 6 HAALAND MAN_City  Shots   Away     48
 7 SALAH   Liverpool Goals   Home     11
 8 SALAH   Liverpool Goals   Away     10
 9 SALAH   Liverpool Assists Home      8
10 SALAH   Liverpool Assists Away      6
# ℹ 20 more rows
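Because names_sep splits every column name at its single underscore, this reshape should be lossless: pivot_wider() with the same separator ought to rebuild the original wide table. A minimal sketch of that round trip, using the values from the table above without the Clubs column; wide, long and wide_again are illustrative names, not part of the original analysis.

```r
library(dplyr)
library(tidyr)

# Wide table with the same statistics as PL_Players_Stats (Clubs omitted)
wide <- data.frame(
  Players      = c("HAALAND", "SALAH", "PALMER", "SON", "SAKA"),
  Goals_Home   = c(14, 11, 9, 8, 7),
  Goals_Away   = c(12, 10, 8, 6, 9),
  Assists_Home = c(3, 8, 7, 5, 9),
  Assists_Away = c(2, 6, 5, 7, 8),
  Shots_Home   = c(52, 44, 38, 29, 33),
  Shots_Away   = c(48, 39, 31, 27, 35)
)

# Wide -> long: "Goals_Home" becomes Stat = "Goals", Venue = "Home"
long <- wide %>%
  pivot_longer(
    cols      = Goals_Home:Shots_Away,
    names_to  = c("Stat", "Venue"),
    names_sep = "_",
    values_to = "Value"
  )

# Long -> wide: Stat and Venue are glued back together with "_"
wide_again <- long %>%
  pivot_wider(
    names_from  = c(Stat, Venue),
    names_sep   = "_",
    values_from = Value
  )

# TRUE if the round trip reproduced the original columns, order and values
isTRUE(all.equal(as.data.frame(wide_again), wide))
```

The same check doubles as a guard on naming: a column name with a second underscore would not split cleanly into one Stat and one Venue, and the rebuilt table would no longer match.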

Compute and visualize the shot conversion rate

The shot conversion rate measures a player’s clinical finishing by calculating the percentage of shots that result in goals.
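Written out with the column names used in the code below, the metric and a worked example for Haaland are:

```latex
\begin{aligned}
\text{Conv\_Rate\_Pct} &= \frac{\text{Goals\_Home} + \text{Goals\_Away}}{\text{Shots\_Home} + \text{Shots\_Away}} \times 100 \\
\text{Haaland:}\quad \frac{14 + 12}{52 + 48} \times 100 &= \frac{26}{100} \times 100 = 26.0\%
\end{aligned}
```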

PL_Players_scr <- PL_Players_Stats %>%
  mutate(
    Total_Goals    = Goals_Home + Goals_Away,
    Total_Shots    = Shots_Home + Shots_Away,
    Conv_Rate_Pct  = round(Total_Goals / Total_Shots * 100, 1)
  ) %>%
  arrange(desc(Conv_Rate_Pct))

average_rate <- mean(PL_Players_scr$Conv_Rate_Pct)


# Plot

ggplot(PL_Players_scr, aes(x = reorder(Players, -Conv_Rate_Pct), y = Conv_Rate_Pct, fill = Players)) +
  geom_col(width = 0.55, colour = "white", linewidth = 0.4) +
  geom_hline(aes(yintercept = average_rate, linetype = "Group Average"),
             colour = "gold", linewidth = 1.2) +
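  # vjust = -0.5 places each label just above its bar instead of straddling the bar top
  geom_text(aes(label = sprintf("%.1f%%", Conv_Rate_Pct)),
            vjust = -0.5, fontface = "bold", size = 4.2) +
  # Leave headroom above the tallest bar so its label is not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +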
  scale_linetype_manual(name = "", values = "dashed") +
  scale_fill_manual(values = c(
    HAALAND = "#6CABDD", SALAH = "#C8102E",
    PALMER  = "#034694", SON   = "#132257", SAKA = "#EF0107"
  )) +
  labs(
    title    = "Shot Conversion Rate — Premier League Top Attackers",
    subtitle = paste0("Dashed line = group average (", round(average_rate, 1), "%)"),
    x        = "Player",
    y        = "Conversion Rate (%)",
    fill     = "Player"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold"),
    panel.grid.major.x = element_blank(),
    legend.position = "none"
  )
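The reference line uses the unweighted mean of the five rounded rates, so every player counts equally regardless of how many shots he took. A pooled rate (all goals divided by all shots) weights players by shot volume instead. A minimal sketch comparing the two, rebuilding the goal and shot columns from the table above; mean_rate and pooled_rate are illustrative names, not part of the original analysis.

```r
# Goal and shot columns copied from PL_Players_Stats
goals_home <- c(14, 11, 9, 8, 7)
goals_away <- c(12, 10, 8, 6, 9)
shots_home <- c(52, 44, 38, 29, 33)
shots_away <- c(48, 39, 31, 27, 35)

total_goals <- goals_home + goals_away
total_shots <- shots_home + shots_away

# Unweighted mean of the rounded per-player rates, as used for average_rate
mean_rate <- mean(round(total_goals / total_shots * 100, 1))

# Pooled rate: every shot in the group counts once
pooled_rate <- sum(total_goals) / sum(total_shots) * 100

round(c(mean = mean_rate, pooled = pooled_rate), 2)
```

With these figures the two averages differ by only about 0.1 percentage points (24.88% vs 25.00%), so the choice does not change the picture here; the pooled version matters more when shot volumes vary widely.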

Interpretation

Haaland leads the group with a 26.0% conversion rate, consistent with his reputation as one of the most clinical strikers in the league. Salah (25.3%) and Son (25.0%) follow closely, with Palmer (24.6%) just behind. Saka has the lowest conversion rate of the five at 23.5%; this is still a high rate, although the dataset contains no league-wide average to compare it against. All five players are remarkably efficient, converting roughly 1 out of every 4 shots they take.
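The "1 out of every 4 shots" reading can be made explicit by flipping the ratio into shots per goal. A minimal sketch, rebuilding only the goal and shot columns from the table above; Shots_Per_Goal and spread_pp are hypothetical names introduced for illustration.

```r
library(dplyr)

# Home/away goals and shots copied from PL_Players_Stats
finishing <- data.frame(
  Players    = c("HAALAND", "SALAH", "PALMER", "SON", "SAKA"),
  Goals_Home = c(14, 11, 9, 8, 7),
  Goals_Away = c(12, 10, 8, 6, 9),
  Shots_Home = c(52, 44, 38, 29, 33),
  Shots_Away = c(48, 39, 31, 27, 35)
) %>%
  mutate(
    Total_Goals    = Goals_Home + Goals_Away,
    Total_Shots    = Shots_Home + Shots_Away,
    Conv_Rate_Pct  = round(Total_Goals / Total_Shots * 100, 1),
    # Reciprocal of the conversion rate: shots needed per goal (lower is better)
    Shots_Per_Goal = round(Total_Shots / Total_Goals, 2)
  ) %>%
  arrange(Shots_Per_Goal) %>%
  select(Players, Conv_Rate_Pct, Shots_Per_Goal)

finishing

# Gap between the best and worst finisher, in percentage points
spread_pp <- max(finishing$Conv_Rate_Pct) - min(finishing$Conv_Rate_Pct)
spread_pp
```

Every player needs roughly four shots per goal (from about 3.85 for Haaland to 4.25 for Saka), which is the same observation stated as a count rather than a percentage.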

Compute and visualize goal contributions

Goal contribution is the sum of goals and assists. It gives a holistic view of a player’s offensive impact, which is why it is an important variable when analyzing attacking players’ performance.
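Since goal contribution is a simple sum, it can also be computed directly from the long format produced by pivot_longer(): drop the shot rows, then sum by player and venue. A minimal sketch under that assumption, rebuilding the raw values from the table above; gc_long is an illustrative name, not part of the original analysis.

```r
library(dplyr)
library(tidyr)

# Raw statistics copied from PL_Players_Stats (Clubs omitted)
stats <- data.frame(
  Players      = c("HAALAND", "SALAH", "PALMER", "SON", "SAKA"),
  Goals_Home   = c(14, 11, 9, 8, 7),
  Goals_Away   = c(12, 10, 8, 6, 9),
  Assists_Home = c(3, 8, 7, 5, 9),
  Assists_Away = c(2, 6, 5, 7, 8),
  Shots_Home   = c(52, 44, 38, 29, 33),
  Shots_Away   = c(48, 39, 31, 27, 35)
)

gc_long <- stats %>%
  pivot_longer(-Players, names_to = c("Stat", "Venue"),
               names_sep = "_", values_to = "Value") %>%
  # Goal contribution = goals + assists, so shot rows are excluded
  filter(Stat %in% c("Goals", "Assists")) %>%
  group_by(Players, Venue) %>%
  summarise(GC = sum(Value), .groups = "drop")

gc_long
```

This yields one row per player and venue, already in the shape that a grouped bar chart with fill = Venue expects.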

# Goal contribution metric

PL_Players_gc <- PL_Players_Stats %>%
  mutate(
    GC_Home  = Goals_Home  + Assists_Home,
    GC_Away  = Goals_Away  + Assists_Away,
    GC_Total = GC_Home + GC_Away
  ) %>%
  arrange(desc(GC_Total)) %>%
  pivot_longer(cols = c(GC_Home, GC_Away), names_to = "Venue", values_to = "GC") %>%
  mutate(Venue = recode(Venue, GC_Home = "Home", GC_Away = "Away"))

# Visualization

ggplot(PL_Players_gc, aes(x = reorder(Players, -GC_Total), y = GC, fill = Venue)) +
  geom_col(position = position_dodge(width = 0.6), width = 0.55,
           colour = "white", linewidth = 0.4) +
  geom_text(aes(label = GC), position = position_dodge(width = 0.6),
            vjust = -0.5, fontface = "bold", size = 4) +
  scale_fill_manual(values = c(Home = "#38BDF8", Away = "#FB923C")) +
  labs(
    title    = "Goal Contributions (Goals + Assists): Home vs Away",
    subtitle = "Premier League Top Attackers — Current Season",
    x        = "Player",
    y        = "Goal Contributions",
    fill     = "Venue"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title         = element_text(face = "bold"),
    panel.grid.major.x = element_blank()
  )

Interpretation

The bar graph reveals different degrees of “home-ground advantage”. Haaland (17 at home vs 14 away) and Salah are noticeably more productive at home, with Salah recording 19 contributions at Anfield (Liverpool’s stadium) compared to 16 away, and Palmer shows a similar gap (16 vs 13). In contrast, Saka is slightly more productive away from home (17 contributions) than at the Emirates (16). Finally, Son Heung-min is perfectly consistent, with 13 contributions both at home and away, which suggests he is equally effective regardless of venue.
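The venue split described above can be summarised as a single home-minus-away gap per player. A minimal sketch, rebuilding the goal and assist columns from the table above; Home_Minus_Away is a hypothetical column name. With a single set of season totals, a gap of a few contributions describes this sample rather than a tested difference.

```r
library(dplyr)

# Goals and assists copied from PL_Players_Stats
venue_gap <- data.frame(
  Players      = c("HAALAND", "SALAH", "PALMER", "SON", "SAKA"),
  Goals_Home   = c(14, 11, 9, 8, 7),
  Goals_Away   = c(12, 10, 8, 6, 9),
  Assists_Home = c(3, 8, 7, 5, 9),
  Assists_Away = c(2, 6, 5, 7, 8)
) %>%
  mutate(
    GC_Home = Goals_Home + Assists_Home,
    GC_Away = Goals_Away + Assists_Away,
    # Positive: more contributions at home; negative: more away
    Home_Minus_Away = GC_Home - GC_Away
  ) %>%
  select(Players, GC_Home, GC_Away, Home_Minus_Away) %>%
  arrange(desc(Home_Minus_Away))

venue_gap
```

Haaland, Salah and Palmer each produce three more contributions at home, Son shows no gap, and Saka is the only player with more contributions away (a gap of -1).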

CONCLUSION

All five players analyzed demonstrate world-class efficiency, each maintaining a shot conversion rate above 23%. Haaland stands out as the most clinical finisher in the dataset with a 26.0% conversion rate, indicating that he needs the fewest shots per goal of the five. However, the narrow 2.5-percentage-point margin between the highest (Haaland, 26.0%) and the lowest (Saka, 23.5%) suggests that these players occupy a similar tier of elite performance, which helps explain why each is a star of his respective team.