Overview

This report analyzes a dataset of English Premier League player statistics from the 2024/2025 season. The dataset includes performance data (goals, assists, passes, fouls, etc.) and identification data (name, nationality, date of birth). Assuming I am a scout for a football (soccer) team that needs a top forward/striker, I select the columns with the metrics most relevant to strikers.

Load the data from GitHub

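library(tidyverse)  # read_csv(), select(), mutate(), glimpse(), ggplot()
library(knitr)      # kable()
library(ggrepel)    # geom_text_repel()

url <- "https://raw.githubusercontent.com/JDO-MSDS/assignment_1_DATA607-player_stats/refs/heads/main/player_stats_2024_2025_season.csv"

player_stats <- read_csv(url)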
## Rows: 1116 Columns: 47
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): player_name, Nationality, Preferred Foot, Date of Birth
## dbl (43): appearances_, sub_appearances, XA, pass_attempts, pass_accuracy, l...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
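Because the CSV is read straight from a GitHub URL, a renamed or dropped column would only surface later as an error inside select(). A minimal sketch of an up-front check is shown here. The helper missing_columns() and the toy data frame are hypothetical illustrations; the required names are the source columns that the selection step uses.

```r
# Hypothetical helper (not part of the original analysis): names in `required`
# that are absent from the data frame `df`
missing_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Source columns that the forward subset selects
required_cols <- c("player_name", "Preferred Foot", "appearances_", "XG", "Goals",
                   "XA", "Assists", "pass_attempts", "pass_accuracy",
                   "dribble_attempts", "dribble_accuracy",
                   "Touches in the Opposition Box", "Duels Won",
                   "Aerial Duels Won", "Shots On Target Inside the Box",
                   "Shots On Target Outside the Box", "Hit Woodwork", "Offsides")

# Toy stand-in for player_stats that keeps only three of those columns
toy_stats <- data.frame(player_name = "Player A", Goals = 1, Assists = 0)

absent <- missing_columns(toy_stats, required_cols)
print(absent)  # the 15 required columns the toy data lacks
```

On the real data, stopifnot(length(missing_columns(player_stats, required_cols)) == 0) would stop the report early with a clear failure instead of a harder-to-read select() error.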

Select the metrics most relevant to forwards (renaming several columns in the process) and create a new metric, Goal Contributions, defined as Goals + Assists. Then show the structure of the subset and its top 10 rows by goal contributions.

player_subset <- player_stats %>%
  # keep forward-relevant metrics, renaming some columns as they are selected
  select(Name = player_name,
         `Preferred Foot`,
         Appearances = appearances_,
         `Expected Goals` = XG,
         Goals,
         `Expected Assists` = XA,
         Assists,
         `Pass Attempts` = pass_attempts,
         `Pass Accuracy` = pass_accuracy,
         `Dribble Attempts` = dribble_attempts,
         `Dribble Accuracy` = dribble_accuracy,
         `Touches in the Opposition Box`,
         `Duels Won`,
         `Aerial Duels Won`,
         `Shots On Target Inside the Box`,
         `Shots On Target Outside the Box`,
         `Hit Woodwork`,
         Offsides) %>%
  # new metric: goals plus assists
  mutate(`Goal Contributions` = Goals + Assists) %>%
  # place Goal Contributions right after Assists (the 7th column)
  select(1:7, `Goal Contributions`, everything())

# structure
glimpse(player_subset)
## Rows: 1,116
## Columns: 19
## $ Name                              <chr> "Max Aarons", "George Abbott", "Zach…
## $ `Preferred Foot`                  <chr> "Right", "Right", "Right", "Right", …
## $ Appearances                       <dbl> 3, 0, 0, 4, 0, 28, 22, 0, 0, 29, 0, …
## $ `Expected Goals`                  <dbl> 0.00, 0.00, 0.00, 0.22, 0.00, 1.59, …
## $ Goals                             <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, …
## $ `Expected Assists`                <dbl> 0.02, 0.00, 0.00, 0.02, 0.00, 0.76, …
## $ Assists                           <dbl> 0, 0, 0, 0, 0, 3, 1, 0, 0, 2, 0, 0, …
## $ `Goal Contributions`              <dbl> 0, 0, 0, 0, 0, 3, 2, 0, 0, 4, 0, 1, …
## $ `Pass Attempts`                   <dbl> 51, 0, 0, 123, 0, 1056, 1177, 0, 0, …
## $ `Pass Accuracy`                   <dbl> 80, 0, 0, 84, 0, 85, 91, 0, 0, 82, 0…
## $ `Dribble Attempts`                <dbl> 0, 0, 0, 0, 0, 14, 5, 0, 0, 48, 0, 1…
## $ `Dribble Accuracy`                <dbl> 0, 0, 0, 0, 0, 29, 80, 0, 0, 44, 0, …
## $ `Touches in the Opposition Box`   <dbl> 0, 0, 0, 3, 0, 18, 14, 0, 0, 79, 0, …
## $ `Duels Won`                       <dbl> 4, 0, 0, 5, 0, 139, 72, 0, 0, 55, 0,…
## $ `Aerial Duels Won`                <dbl> 0, 0, 0, 1, 0, 31, 42, 0, 0, 7, 0, 2…
## $ `Shots On Target Inside the Box`  <dbl> 0, 0, 0, 0, 0, 5, 5, 0, 0, 13, 0, 0,…
## $ `Shots On Target Outside the Box` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, …
## $ `Hit Woodwork`                    <dbl> 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, …
## $ Offsides                          <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 4, 0, 1, …
# table
player_subset %>%
  arrange(desc(`Goal Contributions`)) %>%
  head(10) %>%
  kable()
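| Name | Preferred Foot | Appearances | Expected Goals | Goals | Expected Assists | Assists | Goal Contributions | Pass Attempts | Pass Accuracy | Dribble Attempts | Dribble Accuracy | Touches in the Opposition Box | Duels Won | Aerial Duels Won | Shots On Target Inside the Box | Shots On Target Outside the Box | Hit Woodwork | Offsides |
|:---|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Mohamed Salah | Left | 0 | 25.37 | 29 | 9.06 | 18 | 47 | 1155 | 74 | 130 | 45 | 394 | 127 | 9 | 52 | 6 | 6 | 18 |
| Alexander Isak | Right | 0 | 20.42 | 23 | 3.60 | 6 | 29 | 658 | 77 | 89 | 47 | 211 | 94 | 27 | 38 | 8 | 3 | 20 |
| Bryan Mbeumo | Left | 0 | 12.26 | 20 | 9.26 | 7 | 27 | 1092 | 74 | 99 | 53 | 177 | 179 | 35 | 31 | 11 | 1 | 11 |
| Erling Haaland | Left | 0 | 22.01 | 22 | 2.02 | 3 | 25 | 375 | 67 | 33 | 39 | 204 | 94 | 57 | 49 | 5 | 4 | 4 |
| Ollie Watkins | Right | 38 | 15.38 | 16 | 2.07 | 8 | 24 | 414 | 74 | 37 | 27 | 182 | 105 | 52 | 41 | 0 | 2 | 16 |
| Cole Palmer | Left | 37 | 17.28 | 15 | 9.14 | 8 | 23 | 1310 | 83 | 97 | 53 | 156 | 147 | 2 | 37 | 39 | 6 | 7 |
| Yoane Wissa | Right | 35 | 18.59 | 19 | 1.83 | 4 | 23 | 563 | 79 | 39 | 38 | 167 | 118 | 30 | 36 | 9 | 1 | 14 |
| Chris Wood | Right | 36 | 13.35 | 20 | 1.52 | 3 | 23 | 484 | 65 | 12 | 33 | 111 | 128 | 94 | 21 | 6 | 1 | 28 |
| Jarrod Bowen | Left | 0 | 8.65 | 13 | 6.47 | 8 | 21 | 700 | 79 | 120 | 37 | 183 | 148 | 12 | 37 | 15 | 1 | 10 |
| Matheus Cunha | Right | 33 | 8.63 | 15 | 5.28 | 6 | 21 | 949 | 79 | 122 | 50 | 139 | 183 | 13 | 31 | 28 | 2 | 8 |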
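The table lists Appearances = 0 for Salah, Isak, Mbeumo, Haaland, and Bowen even though each of them scored at least 13 goals, which suggests the appearances_ column is incomplete for some players. Any per-appearance metric would be unreliable for those rows (and would divide by zero). A minimal base-R sketch that flags such rows, using values copied from the top-10 table; the data frame name top10 is only for illustration:

```r
# Values copied from the top-10 table
top10 <- data.frame(
  Name = c("Mohamed Salah", "Alexander Isak", "Bryan Mbeumo", "Erling Haaland",
           "Ollie Watkins", "Cole Palmer", "Yoane Wissa", "Chris Wood",
           "Jarrod Bowen", "Matheus Cunha"),
  Appearances = c(0, 0, 0, 0, 38, 37, 35, 36, 0, 33),
  Goals       = c(29, 23, 20, 22, 16, 15, 19, 20, 13, 15),
  Assists     = c(18, 6, 7, 3, 8, 8, 4, 3, 8, 6),
  stringsAsFactors = FALSE
)

# Flag players credited with goal contributions but zero recorded appearances
suspect <- top10$Appearances == 0 & (top10$Goals + top10$Assists) > 0
print(top10$Name[suspect])
```

On the full subset, the same condition could be written with dplyr as filter(player_subset, Appearances == 0, `Goal Contributions` > 0), which would show how widespread the gap is before Appearances is used in any calculation.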

Plot of Top 10 Forwards

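# top 10 players by goal contributions
top10_forward <- player_subset %>%
  arrange(desc(`Goal Contributions`)) %>%
  slice_head(n = 10)

# goals vs. assists, with non-overlapping name labels
ggplot(top10_forward, aes(x = Goals, y = Assists, label = Name)) +
  geom_point(color = "blue", size = 3) +
  geom_text_repel() +
  labs(title = "Goals vs. Assists, Top 10 by Goal Contributions (2024/2025)") +
  theme_light()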
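The plot shows actual goals and assists only. A complementary view, and the same comparison the conclusions make for Salah's assists, is to subtract each player's expected value from the actual one: a positive number means the player outperformed his expected output. A base-R sketch using the top-10 values from the table; the column names goals_minus_xg and assists_minus_xa are illustrative:

```r
# Values copied from the top-10 table
top10 <- data.frame(
  Name = c("Mohamed Salah", "Alexander Isak", "Bryan Mbeumo", "Erling Haaland",
           "Ollie Watkins", "Cole Palmer", "Yoane Wissa", "Chris Wood",
           "Jarrod Bowen", "Matheus Cunha"),
  Goals   = c(29, 23, 20, 22, 16, 15, 19, 20, 13, 15),
  xG      = c(25.37, 20.42, 12.26, 22.01, 15.38, 17.28, 18.59, 13.35, 8.65, 8.63),
  Assists = c(18, 6, 7, 3, 8, 8, 4, 3, 8, 6),
  xA      = c(9.06, 3.60, 9.26, 2.02, 2.07, 9.14, 1.83, 1.52, 6.47, 5.28),
  stringsAsFactors = FALSE
)

# Actual minus expected: positive values mean output above expectation
top10$goals_minus_xg   <- round(top10$Goals - top10$xG, 2)
top10$assists_minus_xa <- round(top10$Assists - top10$xA, 2)

# Sort by assist over-performance, largest first
print(top10[order(-top10$assists_minus_xa),
            c("Name", "goals_minus_xg", "assists_minus_xa")])
```

In this group, Salah's assists exceed his expected assists by the widest margin (about 8.9), while Mbeumo scores the most goals above his expected goals (about 7.7). Large positive gaps like these are often read as a sign that the output may not repeat at the same level.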

Conclusions

The analysis shows that Mohamed Salah was clearly the top forward in the Premier League in the 2024/2025 season, with 47 goal contributions, 18 more than the next player (Alexander Isak, 29). It is important to note that his 18 assists are roughly double his expected assists (9.06), which suggests he may not sustain around 18 assists per season over the long term. Even so, his lead in goal contributions over everyone else is substantial. The plot shows that Haaland (22 goals, 3 assists) and Isak (23 goals, 6 assists) are primarily goal scorers, while the table suggests Cole Palmer is more of a chance creator: he has the most pass attempts in the top 10 (1,310) and the second-highest expected assists (9.14, just behind Bryan Mbeumo's 9.26). Depending on the type of forward our team needs (a scorer, a creator, or a dribbler), we could build another metric that weights goals and touches in the opposition box more heavily, or one that emphasizes assists, passing, and dribbling instead.
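As a minimal sketch of such a role-specific metric, the code below standardizes a few columns (z-scores within the top-10 group) and combines them with weights. The helper role_score(), the two profiles, and all weights are hypothetical illustrations, not values from this analysis; a scout would tune them to the team's needs. Values are copied from the top-10 table.

```r
# Values copied from the top-10 table
top10 <- data.frame(
  Name = c("Mohamed Salah", "Alexander Isak", "Bryan Mbeumo", "Erling Haaland",
           "Ollie Watkins", "Cole Palmer", "Yoane Wissa", "Chris Wood",
           "Jarrod Bowen", "Matheus Cunha"),
  Goals           = c(29, 23, 20, 22, 16, 15, 19, 20, 13, 15),
  Touches         = c(394, 211, 177, 204, 182, 156, 167, 111, 183, 139),
  Assists         = c(18, 6, 7, 3, 8, 8, 4, 3, 8, 6),
  PassAttempts    = c(1155, 658, 1092, 375, 414, 1310, 563, 484, 700, 949),
  DribbleAttempts = c(130, 89, 99, 33, 37, 97, 39, 12, 120, 122),
  stringsAsFactors = FALSE
)

# Hypothetical helper: weighted sum of z-scored columns named in `weights`
role_score <- function(df, weights) {
  z <- scale(df[, names(weights), drop = FALSE])  # center and scale each column
  as.numeric(z %*% weights)                       # one score per player
}

# Hypothetical weights for two scouting profiles
scorer_weights  <- c(Goals = 0.7, Touches = 0.3)
creator_weights <- c(Assists = 0.4, PassAttempts = 0.3, DribbleAttempts = 0.3)

top10$scorer_score  <- role_score(top10, scorer_weights)
top10$creator_score <- role_score(top10, creator_weights)

print(top10[order(-top10$scorer_score), c("Name", "scorer_score")])
print(top10[order(-top10$creator_score), c("Name", "creator_score")])
```

With these particular weights Salah ranks first on both profiles, consistent with his lead in goal contributions, and Palmer ranks second as a creator. Because the z-scores are computed within this ten-player group, the scores are relative to it and would change if computed over the whole league.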