This dataset provides team-level statistics for the 2024 Premier League season. It includes performance metrics such as goals scored, goals conceded, wins, draws, losses, and points, offering an overview of each team's results across the season.
The research explores which factors contribute most to a team's success in the 2024 Premier League season. The key variables are team name, goals scored, wins, draws, losses, points, and final rank. Correlation analysis and regression modeling are used to identify which factors are most strongly linked to a team's rank and points. The findings highlight the key performance indicators for teams aiming for a top finish.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/eyong/Downloads")
df <- read_csv("PremierLeagueSeason2024.csv")
## Rows: 24 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): team
## dbl (8): goals_scored, goals_conceded, wins, draws, losses, points, goal_dif...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Reshape the metrics to long format: one row per team and metric
df_heatmap <- df %>%
  select(team, wins, draws, losses, goals_scored, goals_conceded, points) %>%
  pivot_longer(cols = -team, names_to = "type", values_to = "count")
The heatmap below plots each team's performance metrics (wins, draws, losses, goals scored, goals conceded, and points) to show which teams performed better across the season.
ggplot(df_heatmap, aes(x = team, y = type, fill = count)) +
  geom_tile() +
  scale_fill_gradient(low = "grey", high = "black") +
  labs(title = "Premier League Team Results", x = "Team", y = "Type") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
From the heatmap, Manchester City has the highest number of wins, goals scored, and points, as indicated by the darkest tiles, followed by Arsenal. Sheffield United has the most losses, while Brighton and Hove Albion has the most draws. On these results, Manchester City and Arsenal would have been the most reliable teams to back in a parlay during the 2024 Premier League season.
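Because the heatmap uses one fill scale for metrics with very different ranges (points run much higher than draws), low-valued metrics can be hard to compare by eye. One possible refinement, offered as a sketch and not part of the original analysis, is to rescale each metric to the 0 to 1 range before plotting. The helper name `rescale_by_metric` and the `scaled` column are hypothetical.

```r
# Hypothetical helper: min-max rescale `count` within each metric (`type`)
# so that every row of the heatmap spans the full grey-to-black range.
rescale_by_metric <- function(long_df) {
  min_max <- function(x) {
    rng <- range(x, na.rm = TRUE)
    if (rng[1] == rng[2]) return(rep(0, length(x)))  # constant metric: avoid 0/0
    (x - rng[1]) / (rng[2] - rng[1])
  }
  # ave() applies min_max separately to the counts of each metric
  long_df$scaled <- ave(long_df$count, long_df$type, FUN = min_max)
  long_df
}

# Usage with the long-format data (then map fill = scaled in aes()):
# df_heatmap_scaled <- rescale_by_metric(df_heatmap)
```

With the rescaled column, the darkest tile in each row marks the team that leads that metric, at the cost of no longer showing absolute counts.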
cor_win_goals <- cor(df$wins, df$goals_scored)
cor_win_goals
## [1] 0.9705855
The correlation between wins and goals scored is 0.9705855, which indicates a strong positive relationship.
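The same calculation can be extended to every numeric column at once, which is closer to the correlation analysis described in the introduction. The following is a minimal sketch in base R; the helper name `numeric_cor` is hypothetical, and it assumes the data frame contains at least two numeric columns.

```r
# Hypothetical helper: correlation matrix of all numeric columns.
numeric_cor <- function(data) {
  data <- as.data.frame(data)                       # works for tibbles too
  num_cols <- vapply(data, is.numeric, logical(1))  # skip columns such as `team`
  round(cor(data[, num_cols, drop = FALSE], use = "complete.obs"), 3)
}

# Usage with the data loaded earlier:
# numeric_cor(df)
```

Reading along the row for the points column would show which metrics move most closely with points.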
model_1 <- lm(goals_scored~wins,data = df)
summary(model_1)
##
## Call:
## lm(formula = goals_scored ~ wins, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.476 -7.445 2.766 5.841 21.524
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.6959 4.2787 5.538 1.45e-05 ***
## wins 2.8593 0.1512 18.909 4.29e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.1 on 22 degrees of freedom
## Multiple R-squared: 0.942, Adjusted R-squared: 0.9394
## F-statistic: 357.5 on 1 and 22 DF, p-value: 4.289e-15
The relationship between wins and goals scored is strong. The model regresses goals scored on wins, and wins explain about 94% of the variation in goals scored (multiple R-squared 0.942, adjusted R-squared 0.9394). The slope of 2.8593 means that each additional win is associated with about 2.86 more goals scored. The p-value of 4.289e-15 is far below the conventional threshold of 0.05, so the relationship is statistically significant. There is therefore strong evidence that goals scored and wins are closely linked in this analysis.
# Note: geom_smooth() here fits wins ~ goals_scored, the reverse of model_1
ggplot(df, aes(x = goals_scored, y = wins)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Regression of Wins on Goals Scored",
       x = "Goals scored",
       y = "Wins") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Reading off the fitted line, a team that scores 100 goals would be expected to win roughly 25 matches. This is an estimate from the regression, not a guarantee.
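Since model_1 has goals scored as the response, a prediction of wins from goals is more naturally obtained from a model fitted in the other direction. The following is a sketch under that assumption; `predict_wins` is a hypothetical helper, and the prediction interval it returns shows how much a single team's outcome can differ from the fitted value.

```r
# Hypothetical helper: fit wins ~ goals_scored and predict wins for a given
# number of goals, with a 95% prediction interval.
predict_wins <- function(data, goals) {
  fit <- lm(wins ~ goals_scored, data = data)
  predict(fit,
          newdata = data.frame(goals_scored = goals),
          interval = "prediction")
}

# Usage with the data loaded earlier:
# predict_wins(df, 100)
```

The result is a matrix with columns `fit`, `lwr`, and `upr`: the expected number of wins and the bounds of the interval around it.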
hist(residuals(model_1))
This histogram suggests that the residuals are approximately normally distributed with a bell-shaped curve, though slightly left-skewed, and exhibit an equal number of positive and negative outliers.
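A histogram of 24 residuals is sensitive to the choice of bins, so the visual impression can be complemented with a Q-Q plot and a Shapiro-Wilk test. This is a minimal sketch using base R functions; `check_residuals` is a hypothetical helper name and was not part of the original analysis.

```r
# Hypothetical helper: normality checks for the residuals of a fitted model.
check_residuals <- function(model, plot = TRUE) {
  res <- residuals(model)
  if (plot) {
    qqnorm(res)  # points close to the line suggest approximate normality
    qqline(res)
  }
  shapiro.test(res)  # a small p-value suggests departure from normality
}

# Usage with the model fitted above:
# check_residuals(model_1)
```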
In the 2024 Premier League season, the top teams in the table, Manchester City and Arsenal, finished highest because they led the league in wins, goals scored, and points.