For this project, I used the 2024/2025 Premier League season dataset, available as “epl player 24/25 stat.csv.” The goal was to identify and compare the top goal scorer from each club in the league. To do this, I first grouped the data by club. Within each club, I compared the players’ goal totals and selected the player who scored the most goals.
I then plotted these players and their goal totals in a bar graph. Each bar represents one club’s top scorer, and the length of the bar shows the number of goals that player scored during the season. The graph makes it easy to compare the leading scorers across clubs. Overall, the project demonstrates how data wrangling and visualization can uncover insights in a dataset.
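The selection step described above can be sketched as follows. This is a minimal illustration rather than the exact code behind the results: the `players` table and the name `top_scorers_sketch` are hypothetical stand-ins for the real data, and breaking ties with `with_ties = FALSE` is an assumption.

```r
library(dplyr)

# Hypothetical miniature of the player table (the real file has 562 rows and 57 columns).
players <- tibble::tibble(
  `Player Name` = c("Player A", "Player B", "Player C", "Player D", "Player E"),
  Club = c("Club X", "Club X", "Club Y", "Club Y", "Club Y"),
  Goals = c(3, 7, 5, 5, 1)
)

top_scorers_sketch <- players |>
  group_by(Club) |>                              # work within each club
  slice_max(Goals, n = 1, with_ties = FALSE) |>  # keep one top scorer per club (assumed tie rule)
  ungroup() |>
  arrange(Goals)                                 # order from fewest to most goals

print(top_scorers_sketch)
```

With `with_ties = FALSE`, a club whose leading scorers are level on goals still contributes exactly one row, which keeps the chart at one bar per club.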
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggfortify)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Rows: 562 Columns: 57
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): Player Name, Club, Nationality, Position, Conversion %, Passes%, C...
dbl (46): Appearances, Minutes, Goals, Assists, Shots, Shots On Target, Big ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(EPL_Player)
# A tibble: 6 × 57
  `Player Name` Club    Nationality Position Appearances Minutes Goals Assists
  <chr>         <chr>   <chr>       <chr>          <dbl>   <dbl> <dbl>   <dbl>
1 Ben White     Arsenal England     DEF               17    1198     0       2
2 Bukayo Saka   Arsenal England     MID               25    1735     6      10
3 David Raya    Arsenal Spain       GKP               38    3420     0       0
4 Declan Rice   Arsenal England     MID               35    2833     4       7
5 Ethan Nwaneri Arsenal England     MID               26     889     4       0
6 Gabriel Jesus Arsenal Brazil      FWD               17     603     3       0
# ℹ 49 more variables: Shots <dbl>, `Shots On Target` <dbl>,
# `Conversion %` <chr>, `Big Chances Missed` <dbl>, `Hit Woodwork` <dbl>,
# Offsides <dbl>, Touches <dbl>, Passes <dbl>, `Successful Passes` <dbl>,
# `Passes%` <chr>, Crosses <dbl>, `Successful Crosses` <dbl>,
# `Crosses %` <chr>, `fThird Passes` <dbl>, `Successful fThird Passes` <dbl>,
# `fThird Passes %` <chr>, `Through Balls` <dbl>, Carries <dbl>,
# `Progressive Carries` <dbl>, `Carries Ended with Goal` <dbl>, …
# Drop rows with missing values in the model variables, then exclude goalkeepers
EPL_clean <- EPL_Player |>
  filter(
    !is.na(Goals),
    !is.na(Shots),
    !is.na(`Shots On Target`),
    !is.na(Assists),
    !is.na(Appearances),
    !is.na(Minutes)
  ) |>
  filter(Position != "GKP")
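fit1 <- lm(Goals ~ Shots + `Shots On Target` + Assists + Appearances + Minutes, data = EPL_clean)
summary(fit1)
Call: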
lm(formula = Goals ~ Shots + `Shots On Target` + Assists + Appearances +
Minutes, data = EPL_clean)
Residuals:
     Min       1Q   Median       3Q      Max
-14.9630  -0.4991  -0.0827   0.4889  13.3278
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.1247336  0.2080039   0.600   0.5490
Shots              0.1262031  0.0063015  20.027   <2e-16 ***
`Shots On Target` -0.0089056  0.0113460  -0.785   0.4329
Assists            0.0210609  0.0543278   0.388   0.6984
Appearances       -0.0385597  0.0204262  -1.888   0.0596 .
Minutes            0.0001253  0.0002342   0.535   0.5928
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.154 on 511 degrees of freedom
Multiple R-squared: 0.6056, Adjusted R-squared: 0.6017
F-statistic: 156.9 on 5 and 511 DF, p-value: < 2.2e-16
autoplot(fit1, which = 1:4, nrow = 2, ncol = 2)
Warning: `fortify(<lm>)` was deprecated in ggplot2 4.0.0.
ℹ Please use `broom::augment(<lm>)` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
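The first warning above suggests `broom::augment()` as the replacement for the deprecated `fortify()` method. As a minimal sketch of that route, the example below builds a residuals-versus-fitted plot by hand. It uses the built-in `mtcars` data and a hypothetical model `demo_fit` rather than `fit1`, so it illustrates the technique only and does not reproduce the diagnostics in this report.

```r
library(ggplot2)
library(broom)

# Hypothetical stand-in model on a built-in dataset (not the EPL model)
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

# augment() returns the model data plus per-observation columns such as .fitted and .resid
diagnostics <- augment(demo_fit)

resid_plot <- ggplot(diagnostics, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero residual
  labs(x = "Fitted values", y = "Residuals")

print(resid_plot)
```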
# A tibble: 20 × 5
   Club                    `Player Name`        Nationality    Position Goals
   <chr>                   <chr>                <chr>          <chr>    <dbl>
 1 Ipswich Town            Sam Szmodics         Ireland        FWD          4
 2 Southampton             Paul Onuachu         Nigeria        FWD          4
 3 Bournemouth             Dango Ouattara       Burkina Faso   MID          7
 4 Manchester United       Amad Diallo          Cote D’Ivoire  MID          8
 5 Arsenal                 Kai Havertz          Germany        FWD          9
 6 Everton                 Iliman Ndiaye        Senegal        FWD          9
 7 Fulham                  Alex Iwobi           Nigeria        MID          9
 8 Leicester City          Jamie Vardy          England        FWD          9
 9 Tottenham Hotspur       Dominic Solanke      England        FWD          9
10 West Ham United         Tomás Soucek         Czech Republic MID          9
11 Brighton & Hove Albion  Danny Welbeck        England        FWD         10
12 Crystal Palace          Jean-Philippe Mateta France         FWD         14
13 Chelsea                 Cole Palmer          England        MID         15
14 Wolverhampton Wanderers Matheus Cunha        Brazil         MID         15
15 Aston Villa             Ollie Watkins        England        FWD         16
16 Brentford               Bryan Mbeumo         Cameroon       MID         20
17 Nottingham Forest       Chris Wood           New Zealand    FWD         20
18 Manchester City         Erling Haaland       Norway         FWD         22
19 Newcastle United        Alexander Isak       Sweden         FWD         23
20 Liverpool               Mohamed Salah        Egypt          MID         29
ggplot Graph
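# Horizontal bar chart: one bar per top scorer, legend hidden because fill repeats the axis labels
ggplot(top_scorers, aes(x = `Player Name`, y = Goals, fill = `Player Name`)) +
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  coord_flip() +
  labs(
    title = "Top Goal Scorer from Each Club",
    x = "Top Scoring Players",
    y = "Goals"
  ) +
  theme_minimal(base_size = 15) +
  theme(legend.position = "none")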
bar_plot <- ggplot(
  top_scorers,
  aes(
    x = `Player Name`,
    y = Goals,
    fill = Position,
    text = paste(
      "Player:", `Player Name`,
      "<br>Club:", Club,
      "<br>Position:", Position,
      "<br>Goals:", Goals
    )
  )
) +
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  coord_flip() +
  scale_fill_manual(
    values = c("FWD" = "darkred", "MID" = "navyblue", "DEF" = "grey")
  ) +
  labs(
    title = "Top Goal Scorer from Each EPL Club (2024/25 Season)",
    x = "Player",
    y = "Goals Scored",
    fill = "Position",
    caption = "Source: Premier League Official Statistics (premierleague.com)"
  ) +
  theme_minimal(base_size = 15)
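The `text` aesthetic in `bar_plot` is not drawn by ggplot2 itself; it is the kind of field that `plotly::ggplotly()` can show as hover text. The sketch below shows one way such a plot could be made interactive. It assumes this is the intended use, and it relies on a small hypothetical data frame `demo` rather than `top_scorers`.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in data (not the real top-scorer table)
demo <- data.frame(
  player = c("Player A", "Player B", "Player C"),
  position = c("FWD", "MID", "FWD"),
  goals = c(12, 9, 4)
)

demo_plot <- ggplot(
  demo,
  aes(
    x = player, y = goals, fill = position,
    text = paste("Player:", player, "<br>Goals:", goals)  # hover text for plotly
  )
) +
  geom_col() +
  coord_flip()

# tooltip = "text" restricts the hover box to the custom text aesthetic
interactive_plot <- ggplotly(demo_plot, tooltip = "text")
interactive_plot
```

Applied to the chart above, the equivalent call would presumably be `ggplotly(bar_plot, tooltip = "text")`.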