This analysis uses comprehensive data from excel4soccer.com, which offers detailed statistics for the 2023-24 Premier League season up to April 23, 2024.
if (!require("readxl")) install.packages("readxl")
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("dplyr")) install.packages("dplyr")
if (!require("tidyr")) install.packages("tidyr")
if (!require("caret")) install.packages("caret")
if (!require("corrplot")) install.packages("corrplot")
if (!require("randomForest")) install.packages("randomForest")
library(readxl)
library(ggplot2)
library(dplyr)
library(tidyr)
library(caret)
library(corrplot)
library(randomForest)
player_stats <- read_excel("Premier League 2023-24 Stats.xlsx", sheet = "PlayerStatsExport")
team_stats <- read_excel("Premier League 2023-24 Stats.xlsx", sheet = "TeamStatsExport")
league_stats <- read_excel("Premier League 2023-24 Stats.xlsx", sheet = "LeagueTableExport")
lineup_stats <- read_excel("Premier League 2023-24 Stats.xlsx", sheet = "LineUpExport")
plays_stats <- read_excel("Premier League 2023-24 Stats.xlsx", sheet = "PlaysExport")
all_fixtures <- read_excel("Premier League 2023-24 Stats.xlsx", sheet = "AllFixturesExport")
league_stats %>%
select(Team, Points, GF, GA, GD, MP) %>%
arrange(Points, GD) %>%
ggplot(aes(x=reorder(Team, Points + GD/1000), y=Points, fill=Team)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label=paste("MP:", MP)), position=position_stack(vjust=0.5), size=3.5) +
geom_text(aes(label=Points, y=Points), hjust=-0.2, size=3.5) +
coord_flip() +
labs(title="Current League Standings", x="Team", y="Points") +
theme(axis.text.y = element_text(size=12),
plot.title = element_text(size=16))
Based on this chart depicting the current league standings, we can observe several insights:
Top of the Table: Arsenal and Liverpool are tied at the top of the league with 74 points each from 33 matches. The bar labels show only points, but the bars are ordered by points and then goal difference, so Arsenal’s place above Liverpool reflects a superior goal difference.
Close Contenders: Manchester City is just one point behind the leaders with 73 points, and they have a game in hand, which could potentially take them to the top if they win it.
Chasing Pack: There’s a significant gap between the top three teams and the rest. Aston Villa sits in fourth with 66 points from 34 matches, followed by Tottenham Hotspur with 60 points from 32 matches, so Spurs have two games in hand on Villa that could improve their standing.
Mid-Table Battle: Teams like Newcastle United, Manchester United, West Ham United, and Chelsea are closely packed in the middle of the table, ranging from 50 to 47 points. This suggests a tight competition for the European qualification spots.
Relegation Zone: At the bottom, Sheffield United is trailing with only 16 points from 33 matches, which puts them in a precarious position for relegation. Burnley and Luton Town are just above them, with 23 and 25 points respectively, indicating a tough battle to avoid the drop.
Games Played (MP): It’s important to note the number of matches played (MP) as it affects the potential points a team can still earn. For example, Manchester City, with 32 matches played against 33 for Arsenal and Liverpool, could go top by winning their game in hand.
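The games-in-hand reasoning above can be made explicit with a little arithmetic. The sketch below is illustrative rather than part of the original analysis: it hard-codes the points and matches played quoted in the text for five clubs and assumes the standard 38-match season. The names `standings`, `season_matches`, `Remaining`, and `MaxPoints` are introduced here for illustration only.

```r
# Points and matches played as quoted in the text above (five clubs only).
standings <- data.frame(
  Team = c("Arsenal", "Liverpool", "Manchester City",
           "Aston Villa", "Tottenham Hotspur"),
  Points = c(74, 74, 73, 66, 60),
  MP = c(33, 33, 32, 34, 32)
)

# Assumption: each club plays 38 league matches in a season.
season_matches <- 38

# Matches still to play, and the points total if all of them are won.
standings$Remaining <- season_matches - standings$MP
standings$MaxPoints <- standings$Points + 3 * standings$Remaining

# Rank by the best total each club can still reach.
print(standings[order(-standings$MaxPoints), ])
```

On these figures Manchester City have the highest ceiling (91 points, against 89 for Arsenal and Liverpool), which is why the game in hand matters.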
league_stats %>%
select(Team, `Home GF`, `Away GF`) %>%
pivot_longer(cols = c(`Home GF`, `Away GF`), names_to = "Home_Away", values_to = "Goals") %>%
ggplot(aes(x=reorder(Team, Goals), y=Goals, fill=Home_Away)) +
geom_bar(stat="identity", position="dodge") +
labs(title="Home vs Away Goals for Each Team", x="Team", y="Goals") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The bar chart provides a comparison of home versus away goals for each team. Here’s what we can gather from this visualization:
Higher Home Scoring: Generally, teams have scored more goals at home than away, which is a common trend in football due to the home advantage factor.
Top Performers: Arsenal, which is leading in the league standings, also has the highest number of home goals scored. However, they have a relatively lower number of away goals compared to their home tally.
Balanced Scoring: Liverpool and Manchester City, the other top teams, have a more balanced scoring record with only a slight variation between home and away goals, which could indicate strong performance regardless of the venue.
Struggling Offense: Sheffield United, at the bottom of the league standings, also reflects a poor scoring record both at home and away, which correlates with their league position.
Mid-Table Variability: Teams in the middle of the table show varying patterns, with some like West Ham United and Tottenham Hotspur scoring more away than at home, which might be indicative of their playing style or tactical approaches.
Potential for Analysis: This data could be useful for predictive analysis by correlating goal-scoring prowess at home and away with potential outcomes in remaining matches.
In terms of predictive insights:
Home Field Advantage: Teams with a strong home goal record might be more likely to win their remaining home fixtures.
Away Form: Teams like Tottenham Hotspur and West Ham United that have performed well away from home could be more likely to secure points in their upcoming away games.
Relegation Concerns: The lower goal tallies for teams at the bottom, especially away from home, might indicate they will struggle to pick up the necessary points to avoid relegation.
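One way to put a number on the home and away patterns described above is the share of a team’s goals scored at home. The sketch below uses three made-up teams with invented goal tallies (they are not taken from the workbook), and `HomeShare` and `AwayHeavy` are hypothetical column names introduced for this example.

```r
# Hypothetical goal tallies for three made-up teams (not real data).
goals <- data.frame(
  Team = c("Team A", "Team B", "Team C"),
  HomeGF = c(40, 25, 12),
  AwayGF = c(30, 28, 8)
)

# Share of all goals that were scored at home.
goals$TotalGF <- goals$HomeGF + goals$AwayGF
goals$HomeShare <- goals$HomeGF / goals$TotalGF

# Flag teams that score more away than at home.
goals$AwayHeavy <- goals$HomeShare < 0.5

print(goals[order(-goals$HomeShare), ])
```

A share well above 0.5 suggests reliance on home form, while a value below 0.5 marks the away-heavy profile described for some mid-table sides.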
league_stats %>%
select(Team, MP, Win, Draw, Loss, GF, GA) %>%
mutate(GoalDifference = GF - GA, GoalDifferencePerMatch = GoalDifference / MP) %>%
arrange(desc(GoalDifferencePerMatch)) %>%
ggplot(aes(x=reorder(Team, GoalDifferencePerMatch), y=GoalDifferencePerMatch, fill=Team)) +
geom_col() +
coord_flip() +
labs(title="Team Performance by Goal Difference per Match", x="Team", y="Goal Difference per Match") +
theme(legend.position = "none")
This bar chart presents each Premier League team’s performance in terms of goal difference per match for the season. The goal difference per match is a good indicator of a team’s overall performance, as it accounts for both offensive and defensive capabilities. Here are some insights:
Leading Teams: Arsenal and Manchester City are at the top, which aligns with their standing in the league table. A higher goal difference per match suggests they consistently outscore their opponents, a key factor in their success.
Competitive Mid-Table: There is a cluster of teams with a goal difference per match close to zero. This suggests that many matches involving these teams are closely contested, with teams either winning by a small margin or drawing frequently.
Struggling at the Bottom: Sheffield United has a negative goal difference per match, indicating they concede more goals than they score on average, contributing to their position at the bottom of the table.
Consistency in Performance: The chart likely reflects consistency over the season, as teams with a higher goal difference per match have been consistently better at both scoring goals and preventing them.
Predictive Insights: Teams with a positive and high goal difference per match would be expected to continue performing well. Conversely, teams with a low or negative goal difference per match might struggle unless they can improve either their offense or defense (or both).
team_stats %>%
group_by(Team) %>%
summarise(TotalFouls = sum(foulsCommitted),
YellowCards = sum(yellowCards),
RedCards = sum(redCards)) %>%
pivot_longer(cols = -Team, names_to = "Type", values_to = "Count") %>%
ggplot(aes(x = Team, y = Count, fill = Type)) +
geom_bar(stat = "identity", position = position_dodge()) +
coord_flip() +
labs(title = "Total Fouls and Cards by Team", x = "Team", y = "Total Count") +
theme(legend.position = "bottom")
The bar chart shows the total number of fouls committed and cards received (both yellow and red) by each team in the Premier League. Here are the insights:
Disciplinary Issues: Teams with a high number of fouls and cards might have disciplinary issues or a more aggressive playing style. This could lead to suspensions and a weakened squad in subsequent matches.
Top of the Discipline Chart: The team with the highest number of fouls committed is not necessarily the one with the most cards, which suggests that while they may commit many fouls, they may not all be of a serious nature.
Fouls vs. Cards: Some teams have a relatively high number of fouls but fewer cards, which could indicate effective management of fouls during the game, avoiding bookings.
Red Cards Impact: Red cards can significantly impact team performance as they lead to suspensions and playing with fewer players during a match. Teams with more red cards might be at a tactical disadvantage in some games.
Predictive Insight: The teams with higher fouls and card counts might be at risk in future games, especially if key players are prone to receiving cards. This could affect team selection and strategy.
Fair Play: Teams with fewer fouls and cards may be seen as playing a cleaner game, which can be advantageous as it reduces the risk of suspensions and penalties.
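Because foul counts run into the hundreds while red cards stay in single digits, a ratio can make the fouls-versus-cards comparison easier to read than raw totals on a shared axis. The following is a minimal sketch with invented numbers for three made-up teams; `CardsPerFoul` is a hypothetical metric introduced here, not one used in the original analysis.

```r
# Hypothetical season totals for three made-up teams (not real data).
discipline <- data.frame(
  Team = c("Team A", "Team B", "Team C"),
  TotalFouls = c(420, 350, 390),
  YellowCards = c(60, 75, 58),
  RedCards = c(2, 4, 1)
)

# Cards of either colour received per foul committed.
discipline$CardsPerFoul <-
  (discipline$YellowCards + discipline$RedCards) / discipline$TotalFouls

most_fouls <- discipline$Team[which.max(discipline$TotalFouls)]
most_cards_per_foul <- discipline$Team[which.max(discipline$CardsPerFoul)]

print(discipline)
cat("Most fouls:", most_fouls, "\n")
cat("Most cards per foul:", most_cards_per_foul, "\n")
```

In this toy data the team with the most fouls is not the team carded most often per foul, which is the pattern the bullet points above describe.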
player_stats %>%
select(Name, Team, Position, totalGoals, goalAssists, shotsOnTarget, totalShots, foulsCommitted, yellowCards) %>%
arrange(desc(totalGoals)) %>%
head(10) %>%
ggplot(aes(x=reorder(Name, totalGoals), y=totalGoals, fill=Position)) +
geom_col(show.legend = TRUE) +
geom_text(aes(label=Team), position=position_stack(vjust=0.5), color="white", size=3.5) +
coord_flip() +
labs(title="Top Scorers in the Premier League", x="Player", y="Total Goals")
The bar chart showcases the top scorers in the Premier League for the current season, along with their position (either forward or midfielder) and the total number of goals scored. Here’s what we can deduce:
Erling Haaland: The Manchester City forward leads the chart with the highest number of goals.
Positional Diversity: Both forwards and midfielders are among the top scorers, indicating that goal-scoring is not limited to traditional striker roles.
Team Representation: Manchester City has two players among the top scorers, suggesting a strong offensive team.
Midfield Contributions: Notably, a midfielder from Chelsea is high up on the list, highlighting the importance of goals from midfield positions.
Player Impact: Players like Mohamed Salah from Liverpool continue to have a significant impact, as indicated by their goal tally.
Strategic Insight: The distribution of goals among different players and positions can offer insights into team strategies. Teams with multiple players on the list may have a diversified attack, making them less predictable and more difficult to defend against.
Predictive Analysis: This chart can be used for predictive analysis in terms of future match outcomes. Teams with top scorers are likely to continue finding the back of the net, which could influence the outcomes of their upcoming fixtures.
Scoring Patterns: Analysing the scoring patterns of these players, including the type of goals and the match situations in which they score, can provide deeper insights into their effectiveness and consistency.
top_teams <- league_stats %>%
arrange(desc(Points)) %>%
slice(1:3) %>%
pull(Team)
top_team_stats <- team_stats %>%
filter(Team %in% top_teams)
ggplot(data = top_team_stats, aes(x = `Date Time (US Eastern)`, y = `Home Goal` + `Away Goal`, group = Team, color = Team)) +
geom_line() +
labs(title = "Team Performance Over the Season for Top 3 Teams", x = "Date", y = "Total Goals Scored")
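The line graph tracks match-by-match goals for the top three teams in the Premier League over the season, with data points likely representing each match played. Note that the plotted value is `Home Goal` + `Away Goal`; if those columns hold the two sides’ scores, each point is the combined total for both teams in that match rather than the goals scored by the top-three side alone. Here’s the analysis: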
Goal Fluctuations: All three teams show fluctuations in their goal-scoring throughout the season, which is normal due to varying levels of competition and form.
Consistent Performers: Manchester City seems to have the most consistent goal-scoring performance, with fewer dips and peaks compared to Arsenal and Liverpool.
Scoring Peaks: Each team has moments where they have scored a high number of goals in a match, which could coincide with playing against lower-ranked teams or particularly strong performances.
Dips in Form: The dips, where the goal count is low or zero, could correspond to playing stronger defenses, away games, or possible off-days in terms of performance.
End of Season Performance: There is a notable increase in goals scored as the season progresses into April for Manchester City, which could be a critical factor in their challenge for the title.
Comparison of the Top Teams: The comparative analysis between the teams can provide insights into their chances of winning the league based on consistency and peak performance.
Predictive Potential: The trajectory of the lines towards the most recent dates can help predict which team might end the season strongly. Teams showing an upward trend in goal-scoring as the season closes may have a better chance of finishing at the top.
Goal Scoring as an Indicator: While goal-scoring is a crucial part of winning matches, it is not the sole indicator of success. Defensive records would also need to be considered for a comprehensive understanding of overall team performance.
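Match-to-match goal counts are noisy, so reading a trend from the raw line can be difficult. A trailing average and a running total are two simple ways to smooth it. The sketch below uses ten invented match scores for a single team; the three-match window is an arbitrary choice for illustration.

```r
# Hypothetical goals scored by one team in ten consecutive matches.
goals_per_match <- c(2, 0, 3, 1, 1, 4, 0, 2, 5, 3)

# Trailing three-match average: sides = 1 uses only the current and earlier
# matches, so the first two values are NA.
window <- 3
rolling_avg <- as.numeric(
  stats::filter(goals_per_match, rep(1 / window, window), sides = 1)
)

# Running total of goals across the sequence.
cumulative_goals <- cumsum(goals_per_match)

trend <- data.frame(
  Match = seq_along(goals_per_match),
  Goals = goals_per_match,
  RollingAvg = round(rolling_avg, 2),
  Cumulative = cumulative_goals
)
print(trend)
```

A rising `RollingAvg` toward the latest matches is the kind of upward trend the predictive point above refers to; a longer window smooths more but reacts more slowly.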
player_stats %>%
select(totalGoals, goalAssists, shotsOnTarget, totalShots, foulsCommitted, yellowCards, redCards) %>%
cor(use = "complete.obs") %>%
corrplot::corrplot(method = "circle", type = "upper", order = "hclust",
tl.col = "black", tl.srt = 25,
title = "Correlation of Player Performance Metrics",
mar = c(0, 0, 2, 0)) # extra top margin so the title is not clipped
Positive Correlations: Circle size shows the strength of a correlation and circle color shows its sign (blue for positive in corrplot’s default palette). For example, a strong positive correlation exists between totalGoals and shotsOnTarget, suggesting that as shots on target increase, total goals typically increase too.
Negative Correlations: Red circles indicate negative correlations, whatever their size. redCards may correlate negatively with goalAssists or totalGoals.
No Correlation: Very small, faint circles denote little to no correlation between variables.
Performance Metrics: Metrics such as totalGoals, goalAssists, and shotsOnTarget are positively correlated, reflecting their relation to offensive actions.
Disciplinary Actions: Metrics like foulsCommitted, yellowCards, and redCards are related since fouls increase the likelihood of receiving a card.
Applications: This corrplot can help in predictive modeling and strategic planning in sports analytics, aiding in identifying player performance trends and team strategy adjustments.
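To make the reading of the plot concrete, the sketch below separates the two things a correlogram encodes: the magnitude of a coefficient (circle size) and its sign (circle color). It uses small synthetic vectors built so that the correlations are known exactly; none of the values come from the player data.

```r
# Synthetic vectors chosen so the correlations are known exactly.
x <- c(-2, -1, 0, 1, 2)
rises_with_x <- 2 * x + 1   # perfectly increasing in x
falls_with_x <- -3 * x      # perfectly decreasing in x
unrelated <- x^2 - 2        # no linear relationship with x

metrics <- data.frame(x, rises_with_x, falls_with_x, unrelated)
r <- cor(metrics)

# Circle size follows abs(r); circle color follows sign(r).
magnitude <- abs(r["x", ])
direction <- sign(round(r["x", ], 10))

print(round(magnitude, 2))
print(direction)
```

Here `rises_with_x` and `falls_with_x` have the same magnitude, so they would be drawn at the same size and distinguished only by color, while `unrelated` would barely be visible.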
league_stats %>%
select(Points, MP, Win, Draw, Loss, GF, GA, GD, `Clean Sheets`) %>%
summary() %>%
knitr::kable(caption = "League Performance Metrics")
Points | MP | Win | Draw | Loss | GF | GA | GD | Clean Sheets |
---|---|---|---|---|---|---|---|---|
Min. :16.00 | Min. :31 | Min. : 3.00 | Min. : 5.00 | Min. : 3.00 | Min. :31.0 | Min. :26.00 | Min. :-57.00 | Min. : 1.00 |
1st Qu.:33.75 | 1st Qu.:32 | 1st Qu.: 9.00 | 1st Qu.: 6.00 | 1st Qu.: 9.50 | 1st Qu.:45.0 | 1st Qu.:48.75 | 1st Qu.:-14.00 | 1st Qu.: 3.75 |
Median :43.50 | Median :33 | Median :12.00 | Median : 7.50 | Median :12.50 | Median :51.0 | Median :52.50 | Median : -5.50 | Median : 6.50 |
Mean :45.20 | Mean :33 | Mean :12.80 | Mean : 7.40 | Mean :12.80 | Mean :53.8 | Mean :53.80 | Mean : 0.00 | Mean : 6.20 |
3rd Qu.:52.50 | 3rd Qu.:34 | 3rd Qu.:15.75 | 3rd Qu.: 8.25 | 3rd Qu.:16.25 | 3rd Qu.:66.0 | 3rd Qu.:60.00 | 3rd Qu.: 16.25 | 3rd Qu.: 8.25 |
Max. :74.00 | Max. :34 | Max. :23.00 | Max. :11.00 | Max. :23.00 | Max. :77.0 | Max. :88.00 | Max. : 51.00 | Max. :14.00 |
Here are some notable points from the league performance metrics that could provide interesting insights:
Top Performers: The maximum points a team has accumulated is 74, with the top teams managing up to 23 wins, indicating a strong lead by the frontrunners.
Tight Competition: The median is 43.5 points, and the middle half of the teams lies between 33.75 and 52.5 points (the first and third quartiles), a spread of under 19 points that points to a competitive mid-table.
Striking Goal Differences: There’s a substantial range in goal difference, from a low of -57 to a high of +51, highlighting the disparity between the strongest and weakest teams in terms of offensive and defensive performance.
Defensive Strengths: The fewest goals conceded by any team is 26, and the most clean sheets kept is 14, showcasing exceptional defensive capabilities at the top end.
team_stats %>%
select(possessionPct, totalShots, shotsOnTarget, wonCorners, foulsCommitted, accuratePasses, accurateCrosses, effectiveTackles ) %>%
summary() %>%
knitr::kable(caption = "Key Team Statistics (each game)")
possessionPct | totalShots | shotsOnTarget | wonCorners | foulsCommitted | accuratePasses | accurateCrosses | effectiveTackles |
---|---|---|---|---|---|---|---|
Min. :17.10 | Min. : 2.00 | Min. : 0.00 | Min. : 0.000 | Min. : 2.0 | Min. : 82.0 | Min. : 0.000 | Min. : 2.00 |
1st Qu.:38.88 | 1st Qu.: 9.00 | 1st Qu.: 3.00 | 1st Qu.: 3.000 | 1st Qu.: 9.0 | 1st Qu.:272.8 | 1st Qu.: 2.000 | 1st Qu.: 8.00 |
Median :50.00 | Median :13.00 | Median : 4.00 | Median : 5.000 | Median :11.0 | Median :371.0 | Median : 4.000 | Median :10.00 |
Mean :50.00 | Mean :13.66 | Mean : 4.88 | Mean : 5.441 | Mean :11.1 | Mean :388.2 | Mean : 4.174 | Mean :10.33 |
3rd Qu.:61.12 | 3rd Qu.:17.00 | 3rd Qu.: 6.00 | 3rd Qu.: 7.000 | 3rd Qu.:13.0 | 3rd Qu.:488.0 | 3rd Qu.: 6.000 | 3rd Qu.:13.00 |
Max. :82.90 | Max. :37.00 | Max. :15.00 | Max. :17.000 | Max. :23.0 | Max. :944.0 | Max. :19.000 | Max. :26.00 |
Here’s a summary of the key per-game team statistics in the league that stand out:
Possession Extremes: Single-match possession ranges from 17.1% to 82.9%. Because the two sides’ shares sum to 100%, these extremes presumably come from the same match, and they highlight how sharply play styles can differ.
Shooting Prowess: The single-game maxima are a remarkable 37 total shots and 15 shots on target (not necessarily in the same match), underscoring some exceptional attacking displays.
Corner Battles: One team won up to 17 corners in a single game, indicating dominance in attacking flanks or persistent pressure on the opposition.
Passing Volume: At the higher end, a team completed 944 accurate passes in a single game, suggesting a highly possession-oriented strategy.
Defensive Actions: The maximum of 26 effective tackles in a game showcases a strong defensive effort, potentially against a very attacking team.
player_stats %>%
select(`Total Play Time(min)`, Appearances, totalGoals, goalAssists, shotsOnTarget, totalShots, foulsCommitted, yellowCards, redCards) %>%
summary() %>%
knitr::kable(caption = "Player Performance Metrics")
Total Play Time(min) | Appearances | totalGoals | goalAssists | shotsOnTarget | totalShots | foulsCommitted | yellowCards | redCards |
---|---|---|---|---|---|---|---|---|
Min. : 0.00 | Min. : 0.00 | Min. : 0.000 | Min. : 0.000 | Min. : 0.000 | Min. : 0.00 | Min. : 0.000 | Min. : 0.000 | Min. :0.00000 |
1st Qu.: 12.75 | 1st Qu.: 1.00 | 1st Qu.: 0.000 | 1st Qu.: 0.000 | 1st Qu.: 0.000 | 1st Qu.: 0.00 | 1st Qu.: 0.000 | 1st Qu.: 0.000 | 1st Qu.:0.00000 |
Median : 741.00 | Median :13.00 | Median : 0.000 | Median : 0.000 | Median : 1.000 | Median : 4.00 | Median : 5.000 | Median : 1.000 | Median :0.00000 |
Mean : 992.59 | Mean :13.83 | Mean : 1.411 | Mean : 1.044 | Mean : 4.057 | Mean : 12.07 | Mean : 9.641 | Mean : 1.967 | Mean :0.07104 |
3rd Qu.:1757.75 | 3rd Qu.:25.00 | 3rd Qu.: 1.250 | 3rd Qu.: 1.000 | 3rd Qu.: 5.000 | 3rd Qu.: 16.00 | 3rd Qu.:16.000 | 3rd Qu.: 3.000 | 3rd Qu.:0.00000 |
Max. :3337.00 | Max. :34.00 | Max. :20.000 | Max. :13.000 | Max. :44.000 | Max. :100.00 | Max. :64.000 | Max. :13.000 | Max. :2.00000 |
Certain statistics stand out and can be intriguing discussion points:
Playing Time: The vast range in ‘Total Play Time’ from 0 to 3337 minutes reflects the varying roles players have in their teams, from unused substitutes to ever-presents.
Goal Contributions: The maxima are 20 goals and 13 assists. summary() reports each column separately, so these may belong to different players, but both point to standout offensive talent.
Shots on Target: The highest number of shots on target stands at 44, pointing to a player who tests goalkeepers exceptionally often.
Discipline: On the discipline side, the maximum yellow-card count is 13 and the maximum red-card count is 2, which might indicate a more aggressive style of play or poor discipline.
Goal Attempts: A high number of total shots, maxing out at 100, indicates players who are very active in attempting to score.
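Since `summary()` treats every column independently, it does not say which player holds each maximum. The following sketch shows one way to look that up with `which.max()`. The three players and their numbers are invented for illustration, and `leaders` is a hypothetical helper name.

```r
# Hypothetical players (not real data); each leads a different metric.
players <- data.frame(
  Name = c("Player A", "Player B", "Player C"),
  totalGoals = c(20, 8, 3),
  goalAssists = c(5, 13, 2),
  yellowCards = c(2, 4, 13),
  stringsAsFactors = FALSE
)

# For each metric, find the name of the player with the highest value.
metric_names <- c("totalGoals", "goalAssists", "yellowCards")
leaders <- sapply(metric_names, function(m) {
  players$Name[which.max(players[[m]])]
})

print(leaders)
```

Note that `which.max()` returns only the first row in the event of a tie, so tied leaders would need a filter such as `players[[m]] == max(players[[m]])` instead.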
player_stats %>%
filter(totalGoals >= 5) %>%
mutate(ShootingAccuracy = shotsOnTarget / totalShots,
GoalConversion = totalGoals / totalShots,
AssistPerGame = goalAssists / Appearances) %>%
select(Name, ShootingAccuracy, GoalConversion, AssistPerGame) %>%
drop_na(ShootingAccuracy, GoalConversion, AssistPerGame) %>%
arrange(desc(GoalConversion)) %>%
head(10) %>%
knitr::kable(caption = "Top 10 Players by Goal Conversion Rate (Minimum 5 Goals)")
Name | ShootingAccuracy | GoalConversion | AssistPerGame |
---|---|---|---|
Chris Wood | 0.5757576 | 0.3636364 | 0.0370370 |
Elijah Adebayo | 0.4285714 | 0.3214286 | 0.0000000 |
Alexander Isak | 0.5423729 | 0.2881356 | 0.0416667 |
Taiwo Awoniyi | 0.4761905 | 0.2857143 | 0.1764706 |
Diogo Jota | 0.4864865 | 0.2702703 | 0.1904762 |
Hwang Hee-Chan | 0.3243243 | 0.2702703 | 0.1250000 |
Jean-Philippe Mateta | 0.5294118 | 0.2647059 | 0.1333333 |
Cole Palmer | 0.4444444 | 0.2469136 | 0.3214286 |
Rasmus Højlund | 0.5172414 | 0.2413793 | 0.0833333 |
Callum Wilson | 0.5666667 | 0.2333333 | 0.0625000 |
These top players are not just scoring; they’re doing so with remarkable efficiency and contributing to their team’s overall attack:
Chris Wood stands out not just for his goal conversion but also for his impressive shooting accuracy, making him a critical asset in front of goal.
Elijah Adebayo shows significant efficiency with a conversion rate of over 32%, which is remarkable for a forward, especially in a high-pressure league.
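Taiwo Awoniyi and Diogo Jota balance their roles well between scoring and assisting, with roughly 0.18 and 0.19 assists per game, which makes them versatile threats to any defense. Alexander Isak converts at a similar rate (28.8%) but assists far less often (about 0.04 per game).
Hwang Hee-Chan demonstrates that even with lower shooting accuracy (32.4%, the lowest in the table), effective finishing can make a substantial impact: his conversion rate (27.0%) matches Diogo Jota’s. Jean-Philippe Mateta, by contrast, pairs 52.9% shooting accuracy with a 26.5% conversion rate.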
Cole Palmer’s high assist rate coupled with his goal-scoring ability makes him an exceptional talent, indicating his capability to influence the game both by scoring and setting up goals.
Rasmus Højlund and Callum Wilson maintain high shooting accuracies and respectable goal conversions, underscoring their precision and clinical nature in front of goal.
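The two ratios in the table can be combined into a third: dividing GoalConversion by ShootingAccuracy cancels the shot count and leaves goals per shot on target, which isolates finishing from accuracy. The sketch below does this with the values copied from the table above; `GoalsPerShotOnTarget` is a name introduced here and is not part of the original analysis.

```r
# ShootingAccuracy and GoalConversion copied from the table above.
finishing <- data.frame(
  Name = c("Chris Wood", "Elijah Adebayo", "Alexander Isak", "Taiwo Awoniyi",
           "Diogo Jota", "Hwang Hee-Chan", "Jean-Philippe Mateta",
           "Cole Palmer", "Rasmus Højlund", "Callum Wilson"),
  ShootingAccuracy = c(0.5757576, 0.4285714, 0.5423729, 0.4761905, 0.4864865,
                       0.3243243, 0.5294118, 0.4444444, 0.5172414, 0.5666667),
  GoalConversion = c(0.3636364, 0.3214286, 0.2881356, 0.2857143, 0.2702703,
                     0.2702703, 0.2647059, 0.2469136, 0.2413793, 0.2333333),
  stringsAsFactors = FALSE
)

# (goals / shots) / (shots on target / shots) = goals / shots on target
finishing$GoalsPerShotOnTarget <-
  finishing$GoalConversion / finishing$ShootingAccuracy

ranked <- finishing[order(-finishing$GoalsPerShotOnTarget), ]
print(ranked[, c("Name", "GoalsPerShotOnTarget")], row.names = FALSE)
```

On these figures Hwang Hee-Chan ranks first, at roughly 0.83 goals per shot on target, while Callum Wilson ranks last at about 0.41, so the ordering differs noticeably from the one by conversion rate alone.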
player_stats %>%
filter(Position != "Goalkeeper", Appearances >= 10) %>%
mutate(
PlayTimePerGame = `Total Play Time(min)` / Appearances,
ImpactScore = (totalGoals + goalAssists + foulsSuffered - foulsCommitted) / Appearances
) %>%
select(Name, Team, Position, PlayTimePerGame, ImpactScore) %>%
drop_na(PlayTimePerGame, ImpactScore) %>%
arrange(desc(ImpactScore), desc(PlayTimePerGame)) %>%
head(10) %>%
knitr::kable(caption = "Top 10 Outfield Players by Play Time Per Game and Revised Impact Score (Min 10 Appearances)")
Name | Team | Position | PlayTimePerGame | ImpactScore |
---|---|---|---|---|
Michael Olise | Crystal Palace | Midfielder | 70.78571 | 2.500000 |
James Maddison | Tottenham Hotspur | Midfielder | 82.22727 | 2.454546 |
Jack Grealish | Manchester City | Midfielder | 60.41176 | 2.117647 |
Bruno Guimarães | Newcastle United | Midfielder | 94.80645 | 1.967742 |
Phil Foden | Manchester City | Midfielder | 87.13333 | 1.966667 |
Jordan Ayew | Crystal Palace | Forward | 82.38710 | 1.935484 |
João Pedro | Brighton & Hove Albion | Forward | 66.50000 | 1.846154 |
Mohammed Kudus | West Ham United | Midfielder | 79.79310 | 1.689655 |
Bukayo Saka | Arsenal | Forward | 88.45161 | 1.677419 |
Chiedozie Ogbene | Luton Town | Forward | 74.28571 | 1.642857 |
These players are key contributors for their teams, combining playtime with high on-field impact:
Michael Olise’s high impact score with relatively lower playtime per game illustrates his ability to make significant contributions efficiently within limited minutes.
James Maddison not only plays a lot of minutes but also impacts games deeply, which speaks volumes about his fitness and technical skill.
Jack Grealish, known for his dribbling and ability to draw fouls, uses his playtime effectively to impact the game’s flow and create opportunities.
Bruno Guimarães showcases not just stamina but also a high level of involvement in playmaking and defensive actions, reflecting his comprehensive midfield capabilities.
Phil Foden and Jordan Ayew represent key offensive elements for their teams, with Foden known for his agility and precise finishing, while Ayew contributes with both goals and assists.
João Pedro and Mohammed Kudus, a forward and a midfielder respectively, highlight the role of dynamic attackers who can change the course of a game with their direct involvement in goal-scoring opportunities.
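To see what drives the ImpactScore, it helps to write the formula from the pipeline above as a small function and vary its inputs. This is a sketch with invented numbers for a single hypothetical player; the function name `impact_score` and its argument names are introduced here for illustration.

```r
# Same formula as the ImpactScore column above, written as a function.
impact_score <- function(goals, assists, fouls_suffered,
                         fouls_committed, appearances) {
  stopifnot(all(appearances > 0))  # avoid dividing by zero
  (goals + assists + fouls_suffered - fouls_committed) / appearances
}

# Hypothetical player: 10 goals, 6 assists, fouled 50 times, 16 fouls made.
full_score <- impact_score(goals = 10, assists = 6, fouls_suffered = 50,
                           fouls_committed = 16, appearances = 20)

# The same player with goals and assists removed.
fouls_only <- impact_score(goals = 0, assists = 0, fouls_suffered = 50,
                           fouls_committed = 16, appearances = 20)

cat("Full score:", full_score, "\n")
cat("Fouls-only score:", fouls_only, "\n")
```

In this example most of the score (1.7 of 2.5) comes from the foul balance rather than from goals and assists. Because every term has equal weight, players who draw many fouls can rank highly, which is worth keeping in mind when reading the table.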
player_stats %>%
filter(Position == "Goalkeeper") %>%
select(Name, Team, shotsFaced, saves) %>%
mutate(SaveRate = saves / shotsFaced) %>%
drop_na(shotsFaced, saves) %>%
arrange(desc(saves)) %>%
head(10) %>%
knitr::kable(caption = "Goalkeeper Performance")
Name | Team | shotsFaced | saves | SaveRate |
---|---|---|---|---|
Thomas Kaminski | Luton Town | 493 | 135 | 0.2738337 |
André Onana | Manchester United | 462 | 124 | 0.2683983 |
Bernd Leno | Fulham | 402 | 122 | 0.3034826 |
Alphonse Areola | West Ham United | 349 | 112 | 0.3209169 |
Neto | AFC Bournemouth | 366 | 111 | 0.3032787 |
Wes Foderingham | Sheffield United | 362 | 110 | 0.3038674 |
Mark Flekken | Brentford | 383 | 107 | 0.2793734 |
James Trafford | Burnley | 361 | 107 | 0.2963989 |
José Sá | Wolverhampton Wanderers | 336 | 106 | 0.3154762 |
Jordan Pickford | Everton | 403 | 101 | 0.2506203 |
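These goalkeepers highlight a blend of shot-stopping ability and consistency, crucial for their teams’ defensive strength. SaveRate here is saves divided by shotsFaced; values near 0.3 suggest that shotsFaced counts every attempt rather than only shots on target, so the figures are best compared with each other rather than with conventional save percentages: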
Thomas Kaminski (Luton Town) stands out with the highest number of shots faced (493) and saves made (135). Although his save rate is relatively lower, the volume underscores his critical role in a team facing frequent attacks.
Bernd Leno (Fulham) and Alphonse Areola (West Ham United) both demonstrate notable efficiency with save rates exceeding 30%. Leno’s ability to maintain high performance despite a large number of shots highlights his exceptional skill and importance to Fulham.
André Onana (Manchester United), with 462 shots faced and a save rate around 27%, reflects significant defensive workload and solid goalkeeping under pressure, crucial for a team with high expectations.
José Sá (Wolverhampton Wanderers) shows impressive efficiency with a save rate of over 31%, making him one of the more effective goalkeepers, especially given he faced fewer shots compared to others in the top ten.
Jordan Pickford (Everton), facing over 400 shots but with the lowest save rate (about 25%), might indicate a challenging season for Everton’s defense, spotlighting his role as a frequently tested goalkeeper in the league.
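The table above is sorted by raw saves, which favours goalkeepers who face the most shots. Re-sorting the same ten rows by SaveRate gives a different picture. The sketch below copies shotsFaced and saves from the table; `by_rate` is a name introduced for this example.

```r
# shotsFaced and saves copied from the goalkeeper table above.
keepers <- data.frame(
  Name = c("Thomas Kaminski", "André Onana", "Bernd Leno", "Alphonse Areola",
           "Neto", "Wes Foderingham", "Mark Flekken", "James Trafford",
           "José Sá", "Jordan Pickford"),
  shotsFaced = c(493, 462, 402, 349, 366, 362, 383, 361, 336, 403),
  saves = c(135, 124, 122, 112, 111, 110, 107, 107, 106, 101),
  stringsAsFactors = FALSE
)

# Same definition as in the pipeline above.
keepers$SaveRate <- keepers$saves / keepers$shotsFaced

# Order by rate instead of by volume.
by_rate <- keepers[order(-keepers$SaveRate), ]
print(by_rate, row.names = FALSE)
```

Ranked this way, Alphonse Areola and José Sá lead, while Thomas Kaminski, who tops the table by raw saves, drops to eighth of the ten.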
team_stats %>%
group_by(Team) %>%
summarise(ShotsPerGame = mean(totalShots),
ShootingAccuracy = sum(shotsOnTarget) / sum(totalShots)) %>%
arrange(desc(ShotsPerGame), desc(ShootingAccuracy)) %>%
knitr::kable(caption = "Team Offensive Performance Metrics")
Team | ShotsPerGame | ShootingAccuracy |
---|---|---|
Liverpool | 20.121212 | 0.3433735 |
Manchester City | 18.625000 | 0.3842282 |
Arsenal | 16.939394 | 0.3506261 |
Tottenham Hotspur | 15.258064 | 0.3678647 |
Brighton & Hove Albion | 15.062500 | 0.3879668 |
Chelsea | 14.129032 | 0.3949772 |
Newcastle United | 14.062500 | 0.3800000 |
AFC Bournemouth | 14.060606 | 0.3491379 |
Manchester United | 14.031250 | 0.3429844 |
Aston Villa | 14.029412 | 0.3752621 |
Fulham | 13.529412 | 0.3608696 |
Everton | 13.424242 | 0.3250564 |
Brentford | 12.794118 | 0.3563218 |
West Ham United | 11.735294 | 0.3358396 |
Wolverhampton Wanderers | 11.545454 | 0.3648294 |
Nottingham Forest | 11.424242 | 0.3342175 |
Crystal Palace | 11.393939 | 0.3617021 |
Luton Town | 11.272727 | 0.3172043 |
Burnley | 10.909091 | 0.3250000 |
Sheffield United | 9.212121 | 0.3717105 |
These insights not only show how teams perform but also hint at their tactical setups and their effectiveness in getting shots on target, which is crucial for understanding their potential for success in the league:
High Volume vs. High Efficiency: Liverpool leads in shots per game, suggesting an aggressive offensive strategy. However, Manchester City, while taking fewer shots, has the highest shooting accuracy among the top teams. This contrast highlights different tactical approaches: Liverpool focuses on volume, while Manchester City emphasizes precision.
Shooting Accuracy Leaders: Chelsea and Brighton & Hove Albion have the highest shooting accuracies (39.50% and 38.80%, respectively), which is impressive given that they don’t lead in shots per game. This suggests that both teams are particularly effective at creating quality chances rather than a larger quantity of less promising attempts.
Mid-Table Efficiency: Brighton rank fifth for shots per game (15.06), yet their shooting accuracy (38.80%) is higher than that of Liverpool, Manchester City, or Arsenal. This indicates that while they may not attack as frequently as the title contenders, a large share of their attempts test the goalkeeper, which could be a critical factor in tight matches.
Consistency Across Metrics: Manchester City combine the second-highest shots per game (18.63) with above-average shooting accuracy (38.42%). Arsenal rank third for volume (16.94), with accuracy (35.06%) close to the 20-team mean of about 35.6%. This combination of volume and accuracy likely contributes to their success in the league, as it implies they consistently create scoring opportunities.
Underperformers: At the lower end, Sheffield United has the lowest shots per game (9.21) but maintains a relatively high shooting accuracy (37.17%). This could suggest a cautious approach to offense, opting for quality over quantity, but it might also indicate a struggle to create scoring opportunities.
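Volume and accuracy can be folded into a single figure: multiplying ShotsPerGame by ShootingAccuracy gives shots on target per game (assuming no missing values, the mean of total shots times the pooled accuracy reduces to total shots on target divided by matches). The sketch below does this for the six teams discussed above, using values copied from the table; `OnTargetPerGame` is a name introduced here.

```r
# ShotsPerGame and ShootingAccuracy copied from the table above.
offense <- data.frame(
  Team = c("Liverpool", "Manchester City", "Arsenal",
           "Brighton & Hove Albion", "Chelsea", "Sheffield United"),
  ShotsPerGame = c(20.121212, 18.625000, 16.939394,
                   15.062500, 14.129032, 9.212121),
  ShootingAccuracy = c(0.3433735, 0.3842282, 0.3506261,
                       0.3879668, 0.3949772, 0.3717105),
  stringsAsFactors = FALSE
)

# Shots on target per game = shots per game x share of shots on target.
offense$OnTargetPerGame <- offense$ShotsPerGame * offense$ShootingAccuracy

ranked_offense <- offense[order(-offense$OnTargetPerGame), ]
print(ranked_offense, row.names = FALSE)
```

Manchester City come out ahead of Liverpool (about 7.2 against 6.9 shots on target per game) despite taking fewer shots overall, which fits the volume-versus-precision contrast drawn above.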
team_stats %>%
group_by(Team) %>%
summarise(AveragePossession = mean(possessionPct),
PassingAccuracy = mean(passPct)) %>%
arrange(desc(AveragePossession), desc(PassingAccuracy)) %>%
knitr::kable(caption = "Team Possession and Passing Metrics")
Team | AveragePossession | PassingAccuracy |
---|---|---|
Manchester City | 66.08437 | 0.8937500 |
Tottenham Hotspur | 61.92581 | 0.8741935 |
Brighton & Hove Albion | 61.74063 | 0.8875000 |
Liverpool | 61.31515 | 0.8454545 |
Arsenal | 59.60000 | 0.8606061 |
Chelsea | 59.11613 | 0.8709677 |
Aston Villa | 54.75000 | 0.8588235 |
Newcastle United | 51.92500 | 0.8250000 |
Fulham | 50.93529 | 0.8176471 |
Manchester United | 50.00000 | 0.8312500 |
Wolverhampton Wanderers | 48.48788 | 0.8121212 |
Burnley | 47.11818 | 0.7727273 |
AFC Bournemouth | 44.75455 | 0.7636364 |
Brentford | 44.07941 | 0.7382353 |
Crystal Palace | 41.55455 | 0.7757576 |
Luton Town | 41.38788 | 0.7393939 |
West Ham United | 41.25882 | 0.7794118 |
Nottingham Forest | 41.04848 | 0.7696970 |
Everton | 40.26364 | 0.7454545 |
Sheffield United | 35.10303 | 0.6969697 |
These metrics not only reflect the tactical foundations of each team but also highlight the correlation between possession, passing accuracy, and likely success in the league standings:
Manchester City’s Dominance: Manchester City leads with the highest average possession (66.08%) and passing accuracy (89.38%), illustrating their control over games and exceptional technical skill in retaining the ball and accurately distributing it.
High Possession, High Precision: Tottenham Hotspur and Brighton & Hove Albion also show strong possession stats, coupled with high passing accuracies. This indicates a style focused on ball control and effective playmaking, with Brighton’s accuracy (88.75%) second only to Manchester City’s.
Top Four Cohesion: Liverpool and Arsenal, both with over 59% possession and above 84% passing accuracy, underscore their strategies of controlling games through sustained possession and robust passing networks, essential for their attacking approaches.
Contrast in Styles: Teams like Newcastle United and Manchester United have moderate possession stats (around 51%), but their lower passing accuracies suggest a possibly more direct or transitional style of play compared to the top possession teams.
Struggles at the Bottom: Sheffield United exhibits the lowest average possession (35.10%) and passing accuracy (69.70%), highlighting potential areas for improvement in ball control and retention to enhance their competitive edge.
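The relationship between possession and passing accuracy described above can be checked directly. The following is a minimal sketch that re-enters the team-level averages from the table by hand (the vector names `possession` and `pass_accuracy` are illustrative, not part of the original analysis) and computes a Pearson correlation. It measures association across 20 teams only and says nothing about causation.

```r
# Team-level averages copied from the table above, in the same row order
# (Manchester City first, Sheffield United last).
possession <- c(66.08437, 61.92581, 61.74063, 61.31515, 59.60000,
                59.11613, 54.75000, 51.92500, 50.93529, 50.00000,
                48.48788, 47.11818, 44.75455, 44.07941, 41.55455,
                41.38788, 41.25882, 41.04848, 40.26364, 35.10303)
pass_accuracy <- c(0.8937500, 0.8741935, 0.8875000, 0.8454545, 0.8606061,
                   0.8709677, 0.8588235, 0.8250000, 0.8176471, 0.8312500,
                   0.8121212, 0.7727273, 0.7636364, 0.7382353, 0.7757576,
                   0.7393939, 0.7794118, 0.7696970, 0.7454545, 0.6969697)

# Pearson correlation between the two team-level averages.
possession_pass_cor <- cor(possession, pass_accuracy)
print(round(possession_pass_cor, 3))
```

A value close to 1 would support the reading that teams with more of the ball also pass more accurately, at least at the level of season averages.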
team_stats %>%
group_by(Team) %>%
summarise(
TotalBlockedShots = sum(blockedShots),
TackleSuccessRate = mean(tacklePct),
TotalEffectiveClearance = sum(effectiveClearance)
) %>%
arrange(desc(TotalBlockedShots)) %>%
knitr::kable(caption = "Team Defensive Performance Metrics")
Team | TotalBlockedShots | TackleSuccessRate | TotalEffectiveClearance |
---|---|---|---|
Liverpool | 188 | 0.6090909 | 502 |
Manchester City | 186 | 0.6156250 | 371 |
Arsenal | 182 | 0.5969697 | 409 |
Manchester United | 141 | 0.6093750 | 725 |
Brighton & Hove Albion | 137 | 0.5906250 | 476 |
Aston Villa | 136 | 0.5823529 | 502 |
AFC Bournemouth | 135 | 0.6030303 | 682 |
Tottenham Hotspur | 135 | 0.6193548 | 563 |
Newcastle United | 121 | 0.5968750 | 565 |
Fulham | 120 | 0.5882353 | 629 |
Luton Town | 120 | 0.5757576 | 775 |
Chelsea | 113 | 0.5838710 | 573 |
Crystal Palace | 110 | 0.5666667 | 783 |
Brentford | 109 | 0.5911765 | 789 |
Everton | 109 | 0.5909091 | 693 |
Nottingham Forest | 103 | 0.5818182 | 800 |
West Ham United | 101 | 0.6235294 | 767 |
Burnley | 98 | 0.5727273 | 716 |
Wolverhampton Wanderers | 95 | 0.5848485 | 705 |
Sheffield United | 85 | 0.6000000 | 904 |
These metrics reflect the different defensive approaches across the league, showing which teams excel in specific defensive actions and how they counter their opponents’ attacks:
High Block Count: Liverpool leads with the highest total of blocked shots at 188, closely followed by Manchester City at 186 and Arsenal at 182. This indicates these teams’ strong defensive positioning and their ability to disrupt opponents’ attacks effectively.
Effective Clearances: Sheffield United, despite having the lowest blocked shots, tops the chart in effective clearances with a total of 904. This suggests a defensive strategy heavily reliant on clearing the ball from dangerous areas, potentially due to facing frequent attacking pressure.
Tackle Success: West Ham United boasts the highest tackle success rate at about 62.35%, showing a proficiency in winning back possession through tackles. Tottenham Hotspur also shows high tackle efficiency, tying into their overall defensive strategy.
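Comprehensive Defense: Manchester United ranks fourth in blocked shots (141) and also records 725 effective clearances, well above the three teams ahead of it in blocks (Liverpool 502, Manchester City 371, Arsenal 409). This combination suggests a defense that both blocks shots and regularly clears the ball from critical areas, although several lower-table teams record more clearances overall.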
Defensive Disparity: Comparatively, teams like Burnley and Wolverhampton Wanderers, with fewer blocked shots and lower tackle success rates, might be focusing on other defensive tactics or may need to enhance their blocking and tackling capabilities to prevent opponents’ chances.
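Because the blocked-shot and clearance figures are season totals, teams that have played more matches have had more opportunities to accumulate them. The sketch below converts a few of the totals to per-match rates. It assumes that matches played equals 38 minus the RemainingMatches value shown in the projection table later in this report; the data frame `defence` and its column names are illustrative.

```r
# Totals copied from the defensive table; MatchesPlayed is an assumption,
# taken as 38 - RemainingMatches from the league projection table.
defence <- data.frame(
  Team = c("Liverpool", "Manchester City", "Arsenal",
           "Nottingham Forest", "Sheffield United"),
  TotalBlockedShots = c(188, 186, 182, 103, 85),
  TotalEffectiveClearance = c(502, 371, 409, 800, 904),
  MatchesPlayed = c(33, 32, 33, 34, 33)
)

# Convert season totals to per-match rates.
defence$BlocksPerMatch <- defence$TotalBlockedShots / defence$MatchesPlayed
defence$ClearancesPerMatch <- defence$TotalEffectiveClearance / defence$MatchesPlayed

# Rank the selected teams by blocks per match.
ranked <- defence[order(-defence$BlocksPerMatch),
                  c("Team", "BlocksPerMatch", "ClearancesPerMatch")]
print(ranked)
```

On these figures Manchester City (186 blocks in 32 matches) moves ahead of Liverpool (188 in 33) on a per-match basis, which shows how small differences in matches played can reorder teams that are close on totals.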
set.seed(123)
# Prepare data
league_model_data <- league_stats %>%
select(Team, Points, MP, GF, GA, GD, `Home Win`, `Away Win`, `Home Loss`, `Away Loss`) %>%
na.omit()
# Split data into training and testing sets
train_indices <- createDataPartition(league_model_data$Points, p=0.8, list=FALSE)
train_data <- league_model_data[train_indices, ]
test_data <- league_model_data[-train_indices, ]
# Fit linear regression model
linear_model <- lm(Points ~ MP + GF + GA + GD + `Home Win` + `Away Win` + `Home Loss` + `Away Loss`, data=train_data)
summary(linear_model)
##
## Call:
## lm(formula = Points ~ MP + GF + GA + GD + `Home Win` + `Away Win` +
## `Home Loss` + `Away Loss`, data = train_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.57560 -0.28969 0.07412 0.45389 1.00709
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.22994 12.88075 0.484 0.641586
## MP 0.60018 0.41415 1.449 0.185322
## GF 0.02123 0.06138 0.346 0.738372
## GA 0.05440 0.05948 0.915 0.387141
## GD NA NA NA NA
## `Home Win` 2.07277 0.27140 7.637 6.09e-05 ***
## `Away Win` 2.27171 0.34468 6.591 0.000171 ***
## `Home Loss` -0.86606 0.26999 -3.208 0.012466 *
## `Away Loss` -1.00557 0.34203 -2.940 0.018710 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.135 on 8 degrees of freedom
## Multiple R-squared: 0.998, Adjusted R-squared: 0.9963
## F-statistic: 571.7 on 7 and 8 DF, p-value: 3.703e-10
In the predictive modeling setup using linear regression, the analysis focuses on how well various team performance metrics predict the total points accumulated by teams in the league. The data preparation involved selecting relevant variables such as total points, matches played (MP), goals for (GF), goals against (GA), goal difference (GD), and wins and losses at home and away. This dataset was then split into training and testing sets for model training.
The linear regression model trained on the data identifies significant relationships between points and several variables:
The variable GD (Goal Difference) was excluded from the model due to singularity issues, because goal difference is a linear combination of goals for and goals against, making it redundant in the presence of those other variables.
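Statistical significance is evident: the p-values for Home Win and Away Win are far below 0.001, and those for Home Loss (0.012) and Away Loss (0.019) fall below the 0.05 threshold, so all four are statistically significant predictors of total points. MP, GF, and GA are not significant once wins and losses are in the model.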
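Model fit is excellent: the multiple R-squared is 0.998 (adjusted 0.9963), the residual standard error is 1.135 points on 8 degrees of freedom, and the F-statistic is 571.7 (p = 3.703e-10). A near-perfect fit is expected here, because points are awarded directly from match results (three for a win, one for a draw), and the model was fitted to only 16 teams.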
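The singularity reported for GD can be reproduced on a small example. The sketch below uses made-up numbers (the `toy` data frame is hypothetical and not taken from the league data) to show that when one predictor is an exact linear combination of others, `lm()` cannot estimate it and reports `NA` for its coefficient.

```r
# Hypothetical toy data: six teams with goals for, goals against, and points.
toy <- data.frame(
  GF = c(70, 65, 50, 40, 30, 55),
  GA = c(30, 35, 45, 60, 70, 50),
  Points = c(80, 72, 55, 38, 25, 60)
)

# Goal difference is an exact linear combination of GF and GA.
toy$GD <- toy$GF - toy$GA

# With GF and GA already in the model, GD adds no new information,
# so lm() drops it and reports NA for its coefficient.
fit <- lm(Points ~ GF + GA + GD, data = toy)
print(coef(fit))

# alias() shows which linear dependency caused the drop.
print(alias(fit))
```

Dropping either GD or one of GF and GA from the formula removes the redundancy without changing the fitted values.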
# Predicting on test data
predictions <- predict(linear_model, newdata=test_data)
# Create a data frame with actual points and predicted points
results <- data.frame(
Team = test_data$Team,
ActualPoints = test_data$Points,
PredictedPoints = predictions
)
# Print the results table
print(results)
## Team ActualPoints PredictedPoints
## 1 Tottenham Hotspur 60 60.49840
## 2 West Ham United 48 48.03975
## 3 Crystal Palace 36 35.31792
## 4 Everton 30 36.98456
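The results table shows actual and predicted points for the four held-out teams, giving an assessment of the model’s accuracy on data it was not trained on. Predictions for Tottenham Hotspur (60 vs. 60.50), West Ham United (48 vs. 48.04), and Crystal Palace (36 vs. 35.32) fall within one point of the actual totals. Everton is the exception: the model predicts 36.98 points against an actual 30. Everton were deducted points during the 2023-24 season, which a model built on wins and losses cannot capture and which likely explains most of this gap.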
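The gap between predicted and actual points can be condensed into standard error metrics. This is a minimal sketch that re-enters the four rows of the results table by hand (the names `held_out`, `mae`, and `rmse` are illustrative) and computes the mean absolute error and root mean squared error. With only four test teams, these figures are indicative at best.

```r
# Actual and predicted points copied from the printed results table.
held_out <- data.frame(
  Team = c("Tottenham Hotspur", "West Ham United", "Crystal Palace", "Everton"),
  ActualPoints = c(60, 48, 36, 30),
  PredictedPoints = c(60.49840, 48.03975, 35.31792, 36.98456)
)

# Signed error: positive values mean the model overestimates.
held_out$Error <- held_out$PredictedPoints - held_out$ActualPoints

# Mean absolute error and root mean squared error over the test teams.
mae <- mean(abs(held_out$Error))
rmse <- sqrt(mean(held_out$Error^2))

print(held_out)
print(round(c(MAE = mae, RMSE = rmse), 2))
```

RMSE penalizes large misses more heavily than MAE, so the single large error for Everton dominates it.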
library(ggplot2)
ggplot(results, aes(x = ActualPoints, y = PredictedPoints, color = Team)) +
geom_point(size = 3) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
labs(title = "Comparison of Predicted Points vs. Actual Points Including Teams",
x = "Actual Points",
y = "Predicted Points") +
theme_minimal() +
theme(legend.position = "bottom")
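The plot compares predicted and actual points for the four test-set teams. The dashed red line marks perfect prediction, where predicted points equal actual points.
Key observations from the plot: Tottenham Hotspur, West Ham United, and Crystal Palace lie almost on the line, while Everton sits about seven points above it, meaning the model overestimates Everton’s total.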
# Calculate the average points per game so far
league_stats <- league_stats %>%
mutate(AvgPointsPerGame = Points / MP)
# Calculate remaining matches
league_stats <- league_stats %>%
mutate(RemainingMatches = 38 - MP)
# Predict future points based on average points per game
league_stats <- league_stats %>%
mutate(PredictedPoints = AvgPointsPerGame * RemainingMatches)
# Calculate final predicted points
league_stats <- league_stats %>%
mutate(FinalPoints = Points + PredictedPoints)
# Order by the predicted final points
league_stats <- league_stats %>%
arrange(desc(FinalPoints), desc(GD), desc(GF))
# Display the results
print(league_stats %>% select(Team, Points, RemainingMatches, PredictedPoints, FinalPoints))
## # A tibble: 20 × 5
## Team Points RemainingMatches PredictedPoints FinalPoints
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Manchester City 73 6 13.7 86.7
## 2 Arsenal 74 5 11.2 85.2
## 3 Liverpool 74 5 11.2 85.2
## 4 Aston Villa 66 4 7.76 73.8
## 5 Tottenham Hotspur 60 6 11.2 71.2
## 6 Newcastle United 50 6 9.38 59.4
## 7 Manchester United 50 6 9.38 59.4
## 8 Chelsea 47 7 10.6 57.6
## 9 West Ham United 48 4 5.65 53.6
## 10 Brighton & Hove Albion 44 6 8.25 52.2
## 11 Wolverhampton Wanderers 43 5 6.52 49.5
## 12 AFC Bournemouth 42 5 6.36 48.4
## 13 Fulham 42 4 4.94 46.9
## 14 Crystal Palace 36 5 5.45 41.5
## 15 Brentford 35 4 4.12 39.1
## 16 Everton 30 5 4.55 34.5
## 17 Nottingham Forest 26 4 3.06 29.1
## 18 Luton Town 25 4 2.94 27.9
## 19 Burnley 23 4 2.71 25.7
## 20 Sheffield United 16 5 2.42 18.4
This analysis uses a simple model to forecast each team’s final points tally from its performance to date. It calculates the average points per game (AvgPointsPerGame) and extrapolates that rate over the remaining matches (RemainingMatches), giving an estimate of the points each team might gain from its remaining fixtures (PredictedPoints). Note that PredictedPoints here covers only the remaining matches, not the whole season.
The teams are then ranked based on their final predicted points tally (FinalPoints), which is the sum of their current points (Points) and the additional points they are predicted to earn (PredictedPoints). The ordering of the teams takes into account their predicted final points, goal difference (GD), and goals for (GF) to break any ties.
From this model, Manchester City is projected to finish with the highest total at approximately 86.7 points, followed closely by Arsenal and Liverpool, both projected at around 85.2. City currently trails both on points (73 to 74) but has played one match fewer, so its higher points-per-game rate puts it ahead in the projection. The main limitation of this approach is that it assumes teams will keep earning points at the same rate, which ignores future fluctuations in form, injuries, and the difficulty of the remaining fixtures. Even so, the model offers a straightforward way to estimate end-of-season standings from current performance data.
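The projection steps reduce to a single formula: since FinalPoints = Points + (Points / MP) × (38 − MP), the result simplifies to Points × 38 / MP. The sketch below wraps this in a small helper (the function name `project_final_points` and its arguments are hypothetical, introduced only for illustration) and applies it to two rows of the table.

```r
# Hypothetical helper: extrapolate the current points-per-game rate
# over the rest of a 38-match season.
project_final_points <- function(points, matches_played, season_matches = 38) {
  points_per_game <- points / matches_played
  remaining <- season_matches - matches_played
  points + points_per_game * remaining
}

# Inputs taken from the table: Manchester City has 73 points from 32 matches,
# Arsenal has 74 points from 33 matches.
city <- project_final_points(73, 32)
arsenal <- project_final_points(74, 33)
print(round(c(ManchesterCity = city, Arsenal = arsenal), 1))

# The simplified form Points * 38 / MP gives the same result.
city_simplified <- 73 * 38 / 32
print(isTRUE(all.equal(city, city_simplified)))
```

Seen this way, the ranking by FinalPoints is the same as a ranking by points per game, which is why a team with a game in hand can lead the projection while trailing in the current table.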
# Plotting the results
ggplot(league_stats, aes(x = reorder(Team, -FinalPoints), y = FinalPoints, fill = Team)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Predicted Final Premier League Points", x = "Team", y = "Final Points") +
theme_minimal() +
theme(legend.position = "none")
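The bar chart visualizes the predicted final points for each Premier League team. Because reorder(Team, -FinalPoints) is combined with coord_flip(), the team with the highest projected total appears at the bottom, so the bars read in ascending order from top to bottom.
Observations from the chart: Manchester City (86.7), Arsenal (85.2), and Liverpool (85.2) are separated by about 1.5 points at the top; Aston Villa (73.8) and Tottenham Hotspur (71.2) follow at a distance; Luton Town (27.9), Burnley (25.7), and Sheffield United (18.4) occupy the bottom three places.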
Predicted Premier League 2023-24 Winner: Manchester City
In the context of the project, several ethical considerations are imperative to ensure responsible data usage, privacy, and fairness:
Data Privacy and Anonymization: While sports data is often public, it is crucial to handle individual player data with care, ensuring it remains anonymized in any dissemination. Detailed data about individual performance should not inadvertently reveal sensitive personal information that players might wish to keep private.
Fairness and Bias: Predictive models must be scrutinized for biases that may affect fairness. For instance, a model might undervalue a team due to historical data that does not reflect current capabilities or changes in team dynamics. Ensuring that the model is updated and reflects the current context is key to maintaining fairness.
Implications for Stakeholders: Analytical findings can significantly influence decisions related to player contracts, team management, and strategy. There is an ethical responsibility to present data and conclusions transparently to avoid misinterpretation that could harm the livelihoods of players or the reputation of teams.
Responsible Data Usage: All data should be used responsibly. Analysts should not manipulate or cherry-pick data to create narratives that could be misleading. Maintaining the integrity of the data and the methods used for analysis is paramount.
Public Perception and Communication: How findings are communicated to fans and the public can shape perceptions of teams and players. Care should be taken to avoid unduly negative or positive portrayals based on predictive models, which are inherently probabilistic and not certainties.
Consent and Rights: When using data that is not public, it’s important to ensure that the appropriate consents have been obtained. Players and teams have rights to their performance data, and their wishes and legal agreements regarding data usage must be respected.
Mitigating Negative Impact: Before implementing analytics-driven decisions, it’s crucial to consider and mitigate potential negative impacts on the well-being of players and teams, ensuring that the human element of sports is not overshadowed by numbers and statistics.
Throughout the project, it’s essential to adhere to ethical standards and best practices, reflecting on the broader implications of the work and striving for an approach that respects all stakeholders involved in the realm of sports analytics.