This report analyzes a dataset of English Premier League player data from the 2024/2025 season. The dataset includes performance data (goals, assists, passes, fouls, etc.) and identification data (name, nationality, date of birth). Taking the role of a scout for a football (soccer) team that needs a top forward/striker, I select the columns with metrics most relevant to strikers.
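# Packages used below: tidyverse (read_csv, dplyr, ggplot2), knitr (kable), ggrepel (geom_text_repel)
library(tidyverse); library(knitr); library(ggrepel)
url <- "https://raw.githubusercontent.com/JDO-MSDS/assignment_1_DATA607-player_stats/refs/heads/main/player_stats_2024_2025_season.csv"
player_stats <- read_csv(url)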
## Rows: 1116 Columns: 47
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): player_name, Nationality, Preferred Foot, Date of Birth
## dbl (43): appearances_, sub_appearances, XA, pass_attempts, pass_accuracy, l...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
player_subset <- player_stats %>%
  select(Name = player_name,
         `Preferred Foot`,
         Appearances = appearances_,
         `Expected Goals` = XG,
         Goals,
         `Expected Assists` = XA,
         Assists,
         `Pass Attempts` = pass_attempts,
         `Pass Accuracy` = pass_accuracy,
         `Dribble Attempts` = dribble_attempts,
         `Dribble Accuracy` = dribble_accuracy,
         `Touches in the Opposition Box`,
         `Duels Won`,
         `Aerial Duels Won`,
         `Shots On Target Inside the Box`,
         `Shots On Target Outside the Box`,
         `Hit Woodwork`,
         Offsides) %>%
  mutate(`Goal Contributions` = Goals + Assists) %>%
  # move the new column so it sits right after Assists
  relocate(`Goal Contributions`, .after = Assists)
# structure
glimpse(player_subset)
## Rows: 1,116
## Columns: 19
## $ Name <chr> "Max Aarons", "George Abbott", "Zach…
## $ `Preferred Foot` <chr> "Right", "Right", "Right", "Right", …
## $ Appearances <dbl> 3, 0, 0, 4, 0, 28, 22, 0, 0, 29, 0, …
## $ `Expected Goals` <dbl> 0.00, 0.00, 0.00, 0.22, 0.00, 1.59, …
## $ Goals <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, …
## $ `Expected Assists` <dbl> 0.02, 0.00, 0.00, 0.02, 0.00, 0.76, …
## $ Assists <dbl> 0, 0, 0, 0, 0, 3, 1, 0, 0, 2, 0, 0, …
## $ `Goal Contributions` <dbl> 0, 0, 0, 0, 0, 3, 2, 0, 0, 4, 0, 1, …
## $ `Pass Attempts` <dbl> 51, 0, 0, 123, 0, 1056, 1177, 0, 0, …
## $ `Pass Accuracy` <dbl> 80, 0, 0, 84, 0, 85, 91, 0, 0, 82, 0…
## $ `Dribble Attempts` <dbl> 0, 0, 0, 0, 0, 14, 5, 0, 0, 48, 0, 1…
## $ `Dribble Accuracy` <dbl> 0, 0, 0, 0, 0, 29, 80, 0, 0, 44, 0, …
## $ `Touches in the Opposition Box` <dbl> 0, 0, 0, 3, 0, 18, 14, 0, 0, 79, 0, …
## $ `Duels Won` <dbl> 4, 0, 0, 5, 0, 139, 72, 0, 0, 55, 0,…
## $ `Aerial Duels Won` <dbl> 0, 0, 0, 1, 0, 31, 42, 0, 0, 7, 0, 2…
## $ `Shots On Target Inside the Box` <dbl> 0, 0, 0, 0, 0, 5, 5, 0, 0, 13, 0, 0,…
## $ `Shots On Target Outside the Box` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, …
## $ `Hit Woodwork` <dbl> 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, …
## $ Offsides <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 4, 0, 1, …
# table
player_subset %>%
  arrange(desc(`Goal Contributions`)) %>%
  head(10) %>%
  kable()
Name | Preferred Foot | Appearances | Expected Goals | Goals | Expected Assists | Assists | Goal Contributions | Pass Attempts | Pass Accuracy | Dribble Attempts | Dribble Accuracy | Touches in the Opposition Box | Duels Won | Aerial Duels Won | Shots On Target Inside the Box | Shots On Target Outside the Box | Hit Woodwork | Offsides |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mohamed Salah | Left | 0 | 25.37 | 29 | 9.06 | 18 | 47 | 1155 | 74 | 130 | 45 | 394 | 127 | 9 | 52 | 6 | 6 | 18 |
Alexander Isak | Right | 0 | 20.42 | 23 | 3.60 | 6 | 29 | 658 | 77 | 89 | 47 | 211 | 94 | 27 | 38 | 8 | 3 | 20 |
Bryan Mbeumo | Left | 0 | 12.26 | 20 | 9.26 | 7 | 27 | 1092 | 74 | 99 | 53 | 177 | 179 | 35 | 31 | 11 | 1 | 11 |
Erling Haaland | Left | 0 | 22.01 | 22 | 2.02 | 3 | 25 | 375 | 67 | 33 | 39 | 204 | 94 | 57 | 49 | 5 | 4 | 4 |
Ollie Watkins | Right | 38 | 15.38 | 16 | 2.07 | 8 | 24 | 414 | 74 | 37 | 27 | 182 | 105 | 52 | 41 | 0 | 2 | 16 |
Cole Palmer | Left | 37 | 17.28 | 15 | 9.14 | 8 | 23 | 1310 | 83 | 97 | 53 | 156 | 147 | 2 | 37 | 39 | 6 | 7 |
Yoane Wissa | Right | 35 | 18.59 | 19 | 1.83 | 4 | 23 | 563 | 79 | 39 | 38 | 167 | 118 | 30 | 36 | 9 | 1 | 14 |
Chris Wood | Right | 36 | 13.35 | 20 | 1.52 | 3 | 23 | 484 | 65 | 12 | 33 | 111 | 128 | 94 | 21 | 6 | 1 | 28 |
Jarrod Bowen | Left | 0 | 8.65 | 13 | 6.47 | 8 | 21 | 700 | 79 | 120 | 37 | 183 | 148 | 12 | 37 | 15 | 1 | 10 |
Matheus Cunha | Right | 33 | 8.63 | 15 | 5.28 | 6 | 21 | 949 | 79 | 122 | 50 | 139 | 183 | 13 | 31 | 28 | 2 | 8 |
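One way to read the table above is to compare actual output with expected output. The following is a minimal sketch, not part of the original analysis: it copies four rows from the table by hand and computes goals minus expected goals and assists minus expected assists. The column names `Goals vs xG` and `Assists vs xA` are hypothetical labels introduced here for illustration.

```r
library(dplyr)

# Values copied by hand from the first four rows of the top-10 table above.
top4 <- tibble::tribble(
  ~Name,            ~`Expected Goals`, ~Goals, ~`Expected Assists`, ~Assists,
  "Mohamed Salah",  25.37,             29,     9.06,                18,
  "Alexander Isak", 20.42,             23,     3.60,                6,
  "Bryan Mbeumo",   12.26,             20,     9.26,                7,
  "Erling Haaland", 22.01,             22,     2.02,                3
)

# Positive values mean the player produced more than the expected metric;
# negative values mean less. Column names are hypothetical.
overperformance <- top4 %>%
  mutate(
    `Goals vs xG`   = Goals - `Expected Goals`,
    `Assists vs xA` = Assists - `Expected Assists`
  ) %>%
  arrange(desc(`Assists vs xA`))

print(overperformance)
```

In the full report, the same `mutate()` call could be applied to `player_subset` directly instead of a hand-copied sample.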
top10_forward <- player_subset %>%
  arrange(desc(`Goal Contributions`)) %>%
  slice_head(n = 10)
ggplot(top10_forward, aes(x = Goals, y = Assists, label = Name)) +
  geom_point(color = "blue", size = 3) +
  geom_text_repel() +
  theme_light()
The analysis shows that Mohamed Salah was clearly the top forward in the Premier League last season. Note that his 18 assists are roughly double his expected assists (9.06), which suggests he may not sustain that assist total over the long term. Even so, the gap between him and everyone else in goal contributions is substantial (47 versus 29 for the next player). In the graph, Haaland and Isak stand out as goal scorers with relatively few assists, while Cole Palmer is mainly a chance creator, with one of the highest expected-assist totals (9.14). Depending on the type of forward our team needs (a scorer, a creator, or a dribbler), we can build another metric that puts more weight on goals and touches in the opposition box, or instead on assists, passing, and dribbling.
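The role-specific metric described above could be sketched as a weighted sum of standardized columns. This is only an illustration under stated assumptions: the weights, the helper `z_score()`, and the column names `Scorer Score` and `Creator Score` are all hypothetical, and the five rows are copied by hand from the top-10 table rather than taken from the full dataset.

```r
library(dplyr)

# Five players copied by hand from the top-10 table (illustrative sample only).
sample_forwards <- tibble::tribble(
  ~Name,            ~Goals, ~Assists, ~`Pass Attempts`, ~`Dribble Attempts`, ~`Touches in the Opposition Box`,
  "Mohamed Salah",  29,     18,       1155,             130,                 394,
  "Alexander Isak", 23,     6,        658,              89,                  211,
  "Erling Haaland", 22,     3,        375,              33,                  204,
  "Cole Palmer",    15,     8,        1310,             97,                  156,
  "Chris Wood",     20,     3,        484,              12,                  111
)

# Hypothetical helper: standardize a column so different units are comparable.
z_score <- function(x) (x - mean(x)) / sd(x)

# Hypothetical weights: the scorer profile leans on goals and box touches,
# the creator profile on assists, passing volume, and dribbling volume.
role_scores <- sample_forwards %>%
  mutate(
    `Scorer Score`  = 0.7 * z_score(Goals) +
                      0.3 * z_score(`Touches in the Opposition Box`),
    `Creator Score` = 0.5 * z_score(Assists) +
                      0.25 * z_score(`Pass Attempts`) +
                      0.25 * z_score(`Dribble Attempts`)
  ) %>%
  select(Name, `Scorer Score`, `Creator Score`) %>%
  arrange(desc(`Scorer Score`))

print(role_scores)
```

In practice the z-scores would be computed over the whole pool of candidate forwards rather than five rows, and the weights would be tuned to the team's needs. The sketch uses season totals rather than per-appearance rates because the Appearances column reads 0 for several of the top players in the table, so it would need checking before being used as a denominator.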