EPL Player Stats

Author

Bedassa

EPL Player Stats for the 2024/25 Season

Introduction

For this project, I used the 2024/25 Premier League player statistics dataset, loaded from “epl_player_stats_24_25.csv.” The purpose of the project was to find and compare the top goal scorer from each team in the league. To do this, I grouped the data by club, ranked each club’s players by the number of goals they scored, and selected the player with the most goals from each team.

These players and their goal totals are shown in a bar graph. Each bar corresponds to one club’s top scorer, and the length of the bar shows how many goals that player scored during the season. The graph makes it easy to compare the top scorers across clubs. Overall, the project demonstrates how data wrangling and visualization can uncover insights in a dataset.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggfortify)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
getwd()
[1] "/Users/kidusteffera/Desktop/DATA110/week 7/Project 1 "
setwd("/Users/kidusteffera/Desktop/DATA110/week 7/Project 1 ")
EPL_Player <- read_csv("epl_player_stats_24_25.csv")
Rows: 562 Columns: 57
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): Player Name, Club, Nationality, Position, Conversion %, Passes%, C...
dbl (46): Appearances, Minutes, Goals, Assists, Shots, Shots On Target, Big ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(EPL_Player)
# A tibble: 6 × 57
  `Player Name` Club    Nationality Position Appearances Minutes Goals Assists
  <chr>         <chr>   <chr>       <chr>          <dbl>   <dbl> <dbl>   <dbl>
1 Ben White     Arsenal England     DEF               17    1198     0       2
2 Bukayo Saka   Arsenal England     MID               25    1735     6      10
3 David Raya    Arsenal Spain       GKP               38    3420     0       0
4 Declan Rice   Arsenal England     MID               35    2833     4       7
5 Ethan Nwaneri Arsenal England     MID               26     889     4       0
6 Gabriel Jesus Arsenal Brazil      FWD               17     603     3       0
# ℹ 49 more variables: Shots <dbl>, `Shots On Target` <dbl>,
#   `Conversion %` <chr>, `Big Chances Missed` <dbl>, `Hit Woodwork` <dbl>,
#   Offsides <dbl>, Touches <dbl>, Passes <dbl>, `Successful Passes` <dbl>,
#   `Passes%` <chr>, Crosses <dbl>, `Successful Crosses` <dbl>,
#   `Crosses %` <chr>, `fThird Passes` <dbl>, `Successful fThird Passes` <dbl>,
#   `fThird Passes %` <chr>, `Through Balls` <dbl>, Carries <dbl>,
#   `Progressive Carries` <dbl>, `Carries Ended with Goal` <dbl>, …
EPL_clean <- EPL_Player |>
  filter(
    !is.na(Goals),
    !is.na(Shots),
    !is.na(`Shots On Target`),
    !is.na(Assists),
    !is.na(Appearances),
    !is.na(Minutes)
  ) |>
  filter(Position != "GKP")
dim(EPL_clean)
[1] 517  57
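
To see how many rows each cleaning step removes, it can help to count missing values per column before filtering. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not taken from the EPL file) and applies the same two-step cleaning: drop rows with missing values, then drop goalkeepers.

```r
library(dplyr)

# Hypothetical toy data standing in for EPL_Player
toy <- data.frame(
  Player   = c("A", "B", "C", "D"),
  Position = c("FWD", "GKP", "MID", "DEF"),
  Goals    = c(5, 0, NA, 1),
  Shots    = c(20, 0, 7, NA)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy))

# Same idea as the cleaning above: drop NAs, then goalkeepers
toy_clean <- toy |>
  filter(!is.na(Goals), !is.na(Shots)) |>
  filter(Position != "GKP")

na_counts
nrow(toy) - nrow(toy_clean)  # number of rows removed by cleaning
```

Running `colSums(is.na(...))` on the real data before filtering would show whether the drop from 562 to 517 rows comes mostly from missing values or from removing goalkeepers.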
fit1 <- lm(Goals ~ Shots + `Shots On Target` + Assists + Appearances + Minutes,
           data = EPL_clean)
summary(fit1)

Call:
lm(formula = Goals ~ Shots + `Shots On Target` + Assists + Appearances + 
    Minutes, data = EPL_clean)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.9630  -0.4991  -0.0827   0.4889  13.3278 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.1247336  0.2080039   0.600   0.5490    
Shots              0.1262031  0.0063015  20.027   <2e-16 ***
`Shots On Target` -0.0089056  0.0113460  -0.785   0.4329    
Assists            0.0210609  0.0543278   0.388   0.6984    
Appearances       -0.0385597  0.0204262  -1.888   0.0596 .  
Minutes            0.0001253  0.0002342   0.535   0.5928    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.154 on 511 degrees of freedom
Multiple R-squared:  0.6056,    Adjusted R-squared:  0.6017 
F-statistic: 156.9 on 5 and 511 DF,  p-value: < 2.2e-16
autoplot(fit1, 1:4, nrow = 2, ncol = 2)
Warning: `fortify(<lm>)` was deprecated in ggplot2 4.0.0.
ℹ Please use `broom::augment(<lm>)` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.

fit2 <- lm(Goals ~ Shots + `Shots On Target` + Assists + Appearances,
           data = EPL_clean)
summary(fit2)

Call:
lm(formula = Goals ~ Shots + `Shots On Target` + Assists + Appearances, 
    data = EPL_clean)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.8911  -0.5132  -0.0583   0.4898  13.3509 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.087730   0.196035   0.448  0.65469    
Shots              0.126304   0.006294  20.066  < 2e-16 ***
`Shots On Target` -0.007822   0.011156  -0.701  0.48355    
Assists            0.024025   0.054007   0.445  0.65661    
Appearances       -0.029435   0.011237  -2.619  0.00907 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.152 on 512 degrees of freedom
Multiple R-squared:  0.6054,    Adjusted R-squared:  0.6023 
F-statistic: 196.4 on 4 and 512 DF,  p-value: < 2.2e-16
autoplot(fit2, 1:4, nrow = 2, ncol = 2)

fit3 <- lm(Goals ~ Shots + `Shots On Target` + Assists,
           data = EPL_clean)
summary(fit3)

Call:
lm(formula = Goals ~ Shots + `Shots On Target` + Assists, data = EPL_clean)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.5224  -0.5926   0.0763   0.3520  14.1297 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -0.302436   0.128176  -2.360   0.0187 *  
Shots              0.118572   0.005591  21.208   <2e-16 ***
`Shots On Target` -0.017131   0.010635  -1.611   0.1078    
Assists            0.007518   0.053943   0.139   0.8892    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.164 on 513 degrees of freedom
Multiple R-squared:  0.6001,    Adjusted R-squared:  0.5977 
F-statistic: 256.6 on 3 and 513 DF,  p-value: < 2.2e-16
autoplot(fit3, 1:4, nrow = 2, ncol = 2)
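
The three models are nested (each one drops a predictor from the previous one), so they can be compared directly instead of reading the three summaries side by side. This is a minimal sketch of one way to do that, using the built-in `mtcars` data as a stand-in; the `m_small`, `m_mid`, and `m_full` names are hypothetical and are not part of this project.

```r
# Stand-in nested models on built-in data (not the EPL data)
m_small <- lm(mpg ~ wt, data = mtcars)
m_mid   <- lm(mpg ~ wt + hp, data = mtcars)
m_full  <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Adjusted R-squared for each model, collected in one named vector
adj_r2 <- sapply(
  list(small = m_small, mid = m_mid, full = m_full),
  function(m) summary(m)$adj.r.squared
)
adj_r2

# Sequential F-tests: does each added predictor improve the fit?
model_tests <- anova(m_small, m_mid, m_full)
model_tests
```

The same calls should work with `anova(fit3, fit2, fit1)`. The summaries above already report adjusted R-squared values of 0.6017, 0.6023, and 0.5977 for `fit1`, `fit2`, and `fit3`, so dropping Minutes costs essentially nothing, while also dropping Appearances lowers the fit slightly.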

Next, I selected the top scorer from each club, keeping only players who scored at least one goal.

top_scorers <- EPL_Player |>
  filter(Goals > 0) |>
  group_by(Club) |>
  slice_max(order_by = Goals, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(Club, `Player Name`, Nationality, Position, Goals) |>
  arrange(Goals)
head(top_scorers, 20)
# A tibble: 20 × 5
   Club                    `Player Name`        Nationality    Position Goals
   <chr>                   <chr>                <chr>          <chr>    <dbl>
 1 Ipswich Town            Sam Szmodics         Ireland        FWD          4
 2 Southampton             Paul Onuachu         Nigeria        FWD          4
 3 Bournemouth             Dango Ouattara       Burkina Faso   MID          7
 4 Manchester United       Amad Diallo          Cote D’Ivoire  MID          8
 5 Arsenal                 Kai Havertz          Germany        FWD          9
 6 Everton                 Iliman Ndiaye        Senegal        FWD          9
 7 Fulham                  Alex Iwobi           Nigeria        MID          9
 8 Leicester City          Jamie Vardy          England        FWD          9
 9 Tottenham Hotspur       Dominic Solanke      England        FWD          9
10 West Ham United         Tomás Soucek         Czech Republic MID          9
11 Brighton & Hove Albion  Danny Welbeck        England        FWD         10
12 Crystal Palace          Jean-Philippe Mateta France         FWD         14
13 Chelsea                 Cole Palmer          England        MID         15
14 Wolverhampton Wanderers Matheus Cunha        Brazil         MID         15
15 Aston Villa             Ollie Watkins        England        FWD         16
16 Brentford               Bryan Mbeumo         Cameroon       MID         20
17 Nottingham Forest       Chris Wood           New Zealand    FWD         20
18 Manchester City         Erling Haaland       Norway         FWD         22
19 Newcastle United        Alexander Isak       Sweden         FWD         23
20 Liverpool               Mohamed Salah        Egypt          MID         29
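
Because `with_ties = FALSE` keeps exactly one row per club, a club whose leading scorers are level on goals would show only one of them. The sketch below uses made-up data (the club and player names are hypothetical) to show how the result changes with `with_ties = TRUE`.

```r
library(dplyr)

# Hypothetical toy data: club "X" has two players tied on 9 goals
toy <- data.frame(
  Club   = c("X", "X", "X", "Y", "Y"),
  Player = c("a", "b", "c", "d", "e"),
  Goals  = c(9, 9, 2, 5, 1)
)

# Exactly one row per club; only one of the tied players is kept
one_each <- toy |>
  group_by(Club) |>
  slice_max(order_by = Goals, n = 1, with_ties = FALSE) |>
  ungroup()

# All joint top scorers are kept, so a club can contribute several rows
all_tied <- toy |>
  group_by(Club) |>
  slice_max(order_by = Goals, n = 1, with_ties = TRUE) |>
  ungroup()

nrow(one_each)  # 2
nrow(all_tied)  # 3
```

For a chart with one bar per club, `with_ties = FALSE` is the simpler choice. Comparing the two row counts on the real data is a quick check for whether any club has joint top scorers.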

ggplot Graph

ggplot(top_scorers, aes(x = `Player Name`, y = Goals, fill = `Player Name`)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Top Goal Scorer from Each Club",
    x = "Top Scoring Players",
    y = "Goals"
  ) +
  theme_minimal(base_size = 15) +
  theme(legend.position = "none")
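
Note that `arrange(Goals)` sorts the table but not the plot: ggplot2 orders a character axis alphabetically. One option is to turn the axis variable into a factor ordered by goals with `reorder()`. This is a minimal sketch on made-up data (the player names and values are hypothetical):

```r
library(ggplot2)

# Hypothetical toy data, not taken from the EPL file
toy <- data.frame(
  Player = c("Cee", "Aye", "Bee"),
  Goals  = c(12, 4, 20)
)

# reorder() makes Player a factor whose levels follow increasing Goals
toy$Player <- reorder(toy$Player, toy$Goals)
levels(toy$Player)

p <- ggplot(toy, aes(x = Player, y = Goals)) +
  geom_col() +     # equivalent to geom_bar(stat = "identity")
  coord_flip() +
  theme_minimal()
p
```

With `coord_flip()`, the player with the most goals ends up at the top of the chart. In the plots here, the same effect should be available by writing `x = reorder(`Player Name`, Goals)` inside `aes()`.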

bar_plot <- ggplot(top_scorers, aes(
    x = `Player Name`,
    y = Goals,
    fill = Position,
    text = paste("Player:", `Player Name`,
                 "<br>Club:", Club,
                 "<br>Position:", Position,
                 "<br>Goals:", Goals)
  )) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(
    values = c("FWD" = "darkred", "MID" = "navyblue", "DEF" = "grey")
  ) +
  labs(
    title = "Top Goal Scorer from Each EPL Club (2024/25 Season)",
    x = "Player",
    y = "Goals Scored",
    fill = "Position",
    caption = "Source: Premier League Official Statistics (premierleague.com)"
  ) +
  theme_minimal(base_size = 15)
ggplotly(bar_plot, tooltip = "text")