This assignment analyzes the attacking performance of five elite Premier League players (Haaland, Salah, Palmer, Son, and Saka) using goals, assists, and shots, split by home and away fixtures, over a given period of the season. The goal is to identify meaningful patterns in finishing efficiency and goal contribution across venues (Home/Away).
APPROACH
To analyze the dataset, I will implement a structured data science pipeline using the “tidyverse” framework, as follows:
Create the data table in R and add it to an existing GitHub repository so that it is accessible at any time.
Reshape the raw data from wide to tidy (long) format using pivot_longer(), separating each statistic by venue (Home/Away).
Shot conversion rate
Compare the players’ shot conversion rates, calculated as total goals divided by total shots
Visualize the result as a ranked bar chart using ggplot2, with a group-average reference line
Interpretation
Goal contribution comparison (Home vs Away)
Compute goal contributions by summing goals and assists per venue
Display the result as a grouped bar chart to expose venue-dependent performance patterns
Interpretation
Load the required package
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Create the CSV file of the five elite Premier League players’ stats and save it to my GitHub repository
Reshape the raw data from wide to tidy (long) format using pivot_longer()
# Read the CSV file from the GitHub repository
url <- "https://raw.githubusercontent.com/Pascaltafo2025/PROJECT-2--TIDY-DATA-ANALYSIS/refs/heads/main/PL_Players_Stats.csv"
PL_Players_Stats <- read.csv(url)
PL_Players_Stats
  Players     Clubs Goals_Home Goals_Away Assists_Home Assists_Away Shots_Home Shots_Away
1 HAALAND  MAN_City         14         12            3            2         52         48
2   SALAH Liverpool         11         10            8            6         44         39
3  PALMER   Chelsea          9          8            7            5         38         31
4     SON     Spurs          8          6            5            7         29         27
5    SAKA   Arsenal          7          9            9            8         33         35
# Reshape wide → long using pivot_longer() (similar to melt() in pandas)
PL_Players_Stats_long <- PL_Players_Stats %>%
  pivot_longer(
    cols      = Goals_Home:Shots_Away,
    names_to  = c("Stat", "Venue"),
    names_sep = "_",
    values_to = "Value"
  )
PL_Players_Stats_long
# A tibble: 30 × 5
Players Clubs Stat Venue Value
<chr> <chr> <chr> <chr> <int>
1 HAALAND MAN_City Goals Home 14
2 HAALAND MAN_City Goals Away 12
3 HAALAND MAN_City Assists Home 3
4 HAALAND MAN_City Assists Away 2
5 HAALAND MAN_City Shots Home 52
6 HAALAND MAN_City Shots Away 48
7 SALAH Liverpool Goals Home 11
8 SALAH Liverpool Goals Away 10
9 SALAH Liverpool Assists Home 8
10 SALAH Liverpool Assists Away 6
# ℹ 20 more rows
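With the data in long format, venue-level summaries reduce to a single grouped aggregation. The following is a minimal sketch, assuming the values shown in the table above. The tibble is rebuilt inline so the example runs without network access, and the name venue_totals is illustrative rather than part of the original analysis.

```r
library(tidyverse)

# Values copied from the printed table; rebuilt inline (assumption: identical to the CSV)
PL_Players_Stats <- tibble(
  Players      = c("HAALAND", "SALAH", "PALMER", "SON", "SAKA"),
  Clubs        = c("MAN_City", "Liverpool", "Chelsea", "Spurs", "Arsenal"),
  Goals_Home   = c(14, 11, 9, 8, 7),
  Goals_Away   = c(12, 10, 8, 6, 9),
  Assists_Home = c(3, 8, 7, 5, 9),
  Assists_Away = c(2, 6, 5, 7, 8),
  Shots_Home   = c(52, 44, 38, 29, 33),
  Shots_Away   = c(48, 39, 31, 27, 35)
)

# Same reshape as above: one row per player, statistic, and venue
PL_Players_Stats_long <- PL_Players_Stats %>%
  pivot_longer(
    cols      = Goals_Home:Shots_Away,
    names_to  = c("Stat", "Venue"),
    names_sep = "_",
    values_to = "Value"
  )

# Group totals per statistic and venue (illustrative summary, not in the original report)
venue_totals <- PL_Players_Stats_long %>%
  group_by(Stat, Venue) %>%
  summarise(Total = sum(Value), .groups = "drop")

venue_totals
```

In this summary the five players together account for 49 goals at home and 45 away, which gives a first hint of the venue effect examined later.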
Compute and visualize the shot conversion rate
The shot conversion rate measures a player’s clinical finishing by calculating the percentage of shots that result in goals.
# Compute total goals, total shots, and the conversion rate per player
PL_Players_scr <- PL_Players_Stats %>%
  mutate(
    Total_Goals   = Goals_Home + Goals_Away,
    Total_Shots   = Shots_Home + Shots_Away,
    Conv_Rate_Pct = round(Total_Goals / Total_Shots * 100, 1)
  ) %>%
  arrange(desc(Conv_Rate_Pct))

# Group average of the five players' conversion rates
average_rate <- mean(PL_Players_scr$Conv_Rate_Pct)

# Plot: ranked bar chart with a dashed group-average reference line
ggplot(PL_Players_scr, aes(x = reorder(Players, -Conv_Rate_Pct), y = Conv_Rate_Pct, fill = Players)) +
  geom_col(width = 0.55, colour = "white", linewidth = 0.4) +
  geom_hline(aes(yintercept = average_rate, linetype = "Group Average"),
             colour = "gold", linewidth = 1.2) +
  geom_text(aes(label = paste0(Conv_Rate_Pct, "%")),
            vjust = 0.5, fontface = "bold", size = 4.2) +
  scale_linetype_manual(name = "", values = "dashed") +
  # Fill colours keyed by player name
  scale_fill_manual(values = c(
    HAALAND = "#6CABDD", SALAH = "#C8102E",
    PALMER  = "#034694", SON   = "#132257", SAKA = "#EF0107"
  )) +
  labs(
    title    = "Shot Conversion Rate: Premier League Top Attackers",
    subtitle = paste0("Dashed line = group average (", round(average_rate, 1), "%)"),
    x        = "Player",
    y        = "Conversion Rate (%)",
    fill     = "Player"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title         = element_text(face = "bold"),
    panel.grid.major.x = element_blank(),
    legend.position    = "none"
  )
Interpretation
Haaland leads the group with a 26.0% conversion rate, consistent with his reputation as one of the most clinical strikers in the league. Salah (25.3%), Son (25.0%), and Palmer (24.6%) follow closely. Saka has the lowest conversion rate of the five at 23.5%. The dataset contains no league-wide figures, so these rates can only be compared within the group, but the spread is narrow: all five players convert approximately 1 out of every 4 shots.
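The "1 out of every 4 shots" reading can be checked by inverting the ratio into shots needed per goal. This is a small sketch under the assumption that the totals match the table printed earlier; the column name Shots_Per_Goal and the object shots_per_goal are illustrative and not part of the original analysis.

```r
library(tidyverse)

# Goals and shots copied from the printed table (assumption: identical to the CSV)
PL_Players_Stats <- tibble(
  Players    = c("HAALAND", "SALAH", "PALMER", "SON", "SAKA"),
  Goals_Home = c(14, 11, 9, 8, 7),
  Goals_Away = c(12, 10, 8, 6, 9),
  Shots_Home = c(52, 44, 38, 29, 33),
  Shots_Away = c(48, 39, 31, 27, 35)
)

# Shots per goal is the inverse of the conversion rate: lower means more clinical
shots_per_goal <- PL_Players_Stats %>%
  mutate(
    Total_Goals    = Goals_Home + Goals_Away,
    Total_Shots    = Shots_Home + Shots_Away,
    Shots_Per_Goal = round(Total_Shots / Total_Goals, 2)
  ) %>%
  select(Players, Total_Goals, Total_Shots, Shots_Per_Goal) %>%
  arrange(Shots_Per_Goal)

shots_per_goal
```

Every player lands between roughly 3.8 and 4.3 shots per goal, with Haaland lowest and Saka highest, which matches the ranking by conversion rate.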
Compute goal contributions and plot them by venue
Goal contribution is the sum of goals and assists. It gives a fuller view of a player’s offensive impact than goals alone, which makes it an important variable when analyzing attacking players’ performance.
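The computation and grouped bar chart for this step can be sketched as follows. This is a minimal sketch, assuming the same values as the table read earlier (rebuilt inline here so it runs on its own). The names goal_contrib and contrib_plot, and all styling choices, are illustrative and not necessarily those used to produce the original figure.

```r
library(tidyverse)

# Values copied from the printed table (assumption: identical to the CSV)
PL_Players_Stats <- tibble(
  Players      = c("HAALAND", "SALAH", "PALMER", "SON", "SAKA"),
  Clubs        = c("MAN_City", "Liverpool", "Chelsea", "Spurs", "Arsenal"),
  Goals_Home   = c(14, 11, 9, 8, 7),
  Goals_Away   = c(12, 10, 8, 6, 9),
  Assists_Home = c(3, 8, 7, 5, 9),
  Assists_Away = c(2, 6, 5, 7, 8),
  Shots_Home   = c(52, 44, 38, 29, 33),
  Shots_Away   = c(48, 39, 31, 27, 35)
)

# Goal contributions = goals + assists, summed per player and venue
goal_contrib <- PL_Players_Stats %>%
  pivot_longer(
    cols      = Goals_Home:Shots_Away,
    names_to  = c("Stat", "Venue"),
    names_sep = "_",
    values_to = "Value"
  ) %>%
  filter(Stat %in% c("Goals", "Assists")) %>%   # shots are not contributions
  group_by(Players, Venue) %>%
  summarise(Contributions = sum(Value), .groups = "drop") %>%
  mutate(Venue = factor(Venue, levels = c("Home", "Away")))  # Home bar first

# Grouped (dodged) bar chart: one pair of bars per player
contrib_plot <- ggplot(goal_contrib, aes(x = Players, y = Contributions, fill = Venue)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.6) +
  geom_text(aes(label = Contributions),
            position = position_dodge(width = 0.7), vjust = -0.4) +
  labs(
    title = "Goal Contributions by Venue",
    x     = "Player",
    y     = "Goals + Assists",
    fill  = "Venue"
  ) +
  theme_minimal(base_size = 13)

contrib_plot
```

Filtering on Stat before summing keeps shots out of the total, and setting the factor levels places the Home bar to the left of the Away bar for each player.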
The bar graph reveals different degrees of “home-ground advantage”. Haaland, Salah, and Palmer are each more productive at home by three contributions; Salah, for example, records 19 contributions at Anfield (Liverpool’s stadium) compared to 16 away. In contrast, Saka is slightly more productive away from home (17 contributions) than at the Emirates (16). Finally, Son Heung-min is perfectly balanced, with 13 contributions both at home and away, which suggests that venue made no difference to his output in this sample.
CONCLUSION
All five players demonstrate high finishing efficiency, each with a shot conversion rate above 23%. Haaland stands out as the most clinical finisher in the dataset with a 26% conversion rate, meaning he needs the fewest shots per goal. However, the narrow margin between the highest (Haaland, 26%) and the lowest (Saka, 23.5%) suggests that these players occupy a similar tier of elite performance, which is consistent with their status as the stars of their respective teams.