Join me as I explore the match data from the 2021/2022 English Premier League season and highlight interesting findings from the underlying data. This dataset is made available by Evan Gower and can be accessed here.
It contains all 380 Premier League games, and the match data available for each game is listed below:
Date
Home Team
Away Team
Full-time Home and Away Goals
Halftime Home and Away Goals
Full-time and Halftime Results
Referee
Home and Away Shots (On and Off Target)
Home and Away Fouls, Corners and Bookings (Yellow and Red Cards)
library(dplyr)
library(tidyr)
library(forcats)
library(readr)
library(ggplot2)
library(janitor)
library(lubridate)
prem_matches <- read.csv("soccer21-22.csv")
tibble(prem_matches)
## # A tibble: 380 x 22
## Date HomeT~1 AwayT~2 FTHG FTAG FTR HTHG HTAG HTR Referee HS AS
## <chr> <chr> <chr> <int> <int> <chr> <int> <int> <chr> <chr> <int> <int>
## 1 13/0~ Brentf~ Arsenal 2 0 H 1 0 H M Oliv~ 8 22
## 2 14/0~ Man Un~ Leeds 5 1 H 1 0 H P Tier~ 16 10
## 3 14/0~ Burnley Bright~ 1 2 A 1 0 H D Coote 14 14
## 4 14/0~ Chelsea Crysta~ 3 0 H 2 0 H J Moss 13 4
## 5 14/0~ Everton Southa~ 3 1 H 0 1 A A Madl~ 14 6
## 6 14/0~ Leices~ Wolves 1 0 H 1 0 H C Paws~ 9 17
## 7 14/0~ Watford Aston ~ 3 2 H 2 0 H M Dean 13 11
## 8 14/0~ Norwich Liverp~ 0 3 A 0 1 A A Marr~ 14 19
## 9 15/0~ Newcas~ West H~ 2 4 A 2 1 H M Atki~ 17 8
## 10 15/0~ Totten~ Man Ci~ 1 0 H 0 0 D A Tayl~ 13 18
## # ... with 370 more rows, 10 more variables: HST <int>, AST <int>, HF <int>,
## # AF <int>, HC <int>, AC <int>, HY <int>, AY <int>, HR <int>, AR <int>, and
## # abbreviated variable names 1: HomeTeam, 2: AwayTeam
## # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
colnames(prem_matches)
## [1] "Date" "HomeTeam" "AwayTeam" "FTHG" "FTAG" "FTR"
## [7] "HTHG" "HTAG" "HTR" "Referee" "HS" "AS"
## [13] "HST" "AST" "HF" "AF" "HC" "AC"
## [19] "HY" "AY" "HR" "AR"
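# Drop any duplicated rows
prem_matches <- distinct(prem_matches)
# Parse the day/month/year strings into Date values
prem_matches$Date <- dmy(prem_matches$Date)
# Convert column names to snake_case (e.g. HomeTeam becomes home_team)
prem_matches <- clean_names(prem_matches)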
prem_matches_cleaned <- prem_matches %>%
select(date, home_team, away_team, ft_home_goals = fthg, ft_away_goals = ftag,
ft_result = ftr, ht_home_goals = hthg, ht_away_goals = htag, ht_results = htr,
referee, home_shots = hs, away_shots = as, home_shots_on_target = hst,
away_shots_on_target = ast, home_fouls = hf, away_fouls = af, home_corners = hc,
away_corners = ac, home_yellows = hy, away_yellows = ay, home_reds = hr, away_reds = ar)
tibble(prem_matches_cleaned)
## # A tibble: 380 x 22
## date home_team away_~1 ft_ho~2 ft_aw~3 ft_re~4 ht_ho~5 ht_aw~6 ht_re~7
## <date> <chr> <chr> <int> <int> <chr> <int> <int> <chr>
## 1 2021-08-13 Brentford Arsenal 2 0 H 1 0 H
## 2 2021-08-14 Man United Leeds 5 1 H 1 0 H
## 3 2021-08-14 Burnley Bright~ 1 2 A 1 0 H
## 4 2021-08-14 Chelsea Crysta~ 3 0 H 2 0 H
## 5 2021-08-14 Everton Southa~ 3 1 H 0 1 A
## 6 2021-08-14 Leicester Wolves 1 0 H 1 0 H
## 7 2021-08-14 Watford Aston ~ 3 2 H 2 0 H
## 8 2021-08-14 Norwich Liverp~ 0 3 A 0 1 A
## 9 2021-08-15 Newcastle West H~ 2 4 A 2 1 H
## 10 2021-08-15 Tottenham Man Ci~ 1 0 H 0 0 D
## # ... with 370 more rows, 13 more variables: referee <chr>, home_shots <int>,
## # away_shots <int>, home_shots_on_target <int>, away_shots_on_target <int>,
## # home_fouls <int>, away_fouls <int>, home_corners <int>, away_corners <int>,
## # home_yellows <int>, away_yellows <int>, home_reds <int>, away_reds <int>,
## # and abbreviated variable names 1: away_team, 2: ft_home_goals,
## # 3: ft_away_goals, 4: ft_result, 5: ht_home_goals, 6: ht_away_goals,
## # 7: ht_results
## # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
# transmute() keeps one row per match; summarise() is meant to collapse rows
prem_matches_combined <- prem_matches %>%
transmute(date, home_team, away_team, goals_scored = fthg + ftag, result = ftr,
referee, total_shots = hs + as, shots_on_target = hst + ast,
fouls_committed = hf + af, corner_count = hc + ac, bookings = hy + ay + hr + ar,
yellow_cards = hy + ay, red_cards = hr + ar)
tibble(prem_matches_combined)
## # A tibble: 380 x 13
## date home_team away_t~1 goals~2 result referee total~3 shots~4 fouls~5
## <date> <chr> <chr> <int> <chr> <chr> <int> <int> <int>
## 1 2021-08-13 Brentford Arsenal 2 H M Oliv~ 30 7 20
## 2 2021-08-14 Man United Leeds 6 H P Tier~ 26 11 20
## 3 2021-08-14 Burnley Brighton 3 A D Coote 28 11 17
## 4 2021-08-14 Chelsea Crystal~ 3 H J Moss 17 7 26
## 5 2021-08-14 Everton Southam~ 4 H A Madl~ 20 9 28
## 6 2021-08-14 Leicester Wolves 1 H C Paws~ 26 8 16
## 7 2021-08-14 Watford Aston V~ 5 H M Dean 24 9 31
## 8 2021-08-14 Norwich Liverpo~ 3 A A Marr~ 33 11 18
## 9 2021-08-15 Newcastle West Ham 6 A M Atki~ 25 12 7
## 10 2021-08-15 Tottenham Man City 1 H A Tayl~ 31 7 19
## # ... with 370 more rows, 4 more variables: corner_count <int>, bookings <int>,
## # yellow_cards <int>, red_cards <int>, and abbreviated variable names
## # 1: away_team, 2: goals_scored, 3: total_shots, 4: shots_on_target,
## # 5: fouls_committed
## # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
The Premier League consisted of the following 20 clubs during the 2021/22 season:
distinct(prem_matches,football_clubs = home_team) %>%
arrange(football_clubs)
## football_clubs
## 1 Arsenal
## 2 Aston Villa
## 3 Brentford
## 4 Brighton
## 5 Burnley
## 6 Chelsea
## 7 Crystal Palace
## 8 Everton
## 9 Leeds
## 10 Leicester
## 11 Liverpool
## 12 Man City
## 13 Man United
## 14 Newcastle
## 15 Norwich
## 16 Southampton
## 17 Tottenham
## 18 Watford
## 19 West Ham
## 20 Wolves
prem_matches_combined %>%
summarise(total_goals = sum(goals_scored),total_shots = sum(total_shots),
total_shots_on_target = sum(shots_on_target),
conversion_rate = total_goals/total_shots*100)
## total_goals total_shots total_shots_on_target conversion_rate
## 1 1071 9722 3352 11.01625
Across the 380 fixtures played, a total of 1,071 goals were scored from 9,722 shots at goal. That is an average of 2.82 goals per game and a goal conversion rate of 11%. At the time, the 2.82 average was the joint-highest goals-per-game figure in the history of the Premier League, shared with the 2018/19 season.
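The averages quoted above can be reproduced from the season totals alone. The following is a minimal base R sketch that uses only the figures printed in the summary output; the variable names are illustrative, and the on-target conversion rate is an extra figure not discussed in the text.

```r
# Season totals copied from the summary output above
total_goals <- 1071
total_shots <- 9722
total_shots_on_target <- 3352
games_played <- 380

# Average number of goals per game
goals_per_game <- total_goals / games_played

# Goals as a percentage of all shots, and of shots on target
conversion_rate <- total_goals / total_shots * 100
on_target_conversion <- total_goals / total_shots_on_target * 100

round(c(goals_per_game = goals_per_game,
        conversion_rate = conversion_rate,
        on_target_conversion = on_target_conversion), 2)
```

These ratios are only approximate measures of finishing, since the goal totals may include own goals that never registered as a shot for the scoring side.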
ggplot(data = prem_matches_combined, aes(x = date, y = goals_scored)) +
geom_col(width = 4, fill = "#38003c") +
scale_x_date(date_labels="%b",date_breaks ="1 month") +
xlab("Month") + ylab("Goals Scored") + labs(title = "Goals by Month")
It is interesting to note that, across the 10 months of football and 38 game weeks, the most goals in a game week were recorded on the final day of the season.
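The dataset has no game-week column, so one way to check an observation like this is to total the goals by match date. The sketch below uses base R and a small made-up data frame standing in for prem_matches_combined; the dates and goal counts are hypothetical, and a real game week can span several dates.

```r
# Toy stand-in for prem_matches_combined; the values are made up for illustration
toy_matches <- data.frame(
  date = as.Date(c("2022-05-15", "2022-05-15",
                   "2022-05-22", "2022-05-22", "2022-05-22")),
  goals_scored = c(2, 1, 5, 4, 3)
)

# Total goals scored on each match date
goals_by_date <- aggregate(goals_scored ~ date, data = toy_matches, FUN = sum)

# Row for the date with the highest goal total
busiest_day <- goals_by_date[which.max(goals_by_date$goals_scored), ]
busiest_day
```

Swapping toy_matches for the real match data would give the highest-scoring date of the season, which can then be compared against the chart.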
prem_matches_combined %>%
summarise(referees_used = n_distinct(referee),
total_fouls = sum(fouls_committed),
total_bookings = sum(bookings),
total_yellow_cards = sum(yellow_cards),
total_red_cards = sum(red_cards))
## referees_used total_fouls total_bookings total_yellow_cards total_red_cards
## 1 22 7681 1334 1291 43
There were 22 different referees officiating across the Premier League season, with a whopping 7,681 fouls awarded. The data does not indicate whether this figure includes offside violations and handballs. A total of 1,334 bookings were handed out, the majority of them yellow cards (1,291 yellows against 43 reds). The dataset does not indicate whether the red cards include dismissals for a second yellow card.
ggplot(data = prem_matches_combined, aes(x = referee)) +
geom_bar(fill = "#38003c", color = "#38003c") +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Referee") + ylab("Games Officiated") +
labs(title = "Games Officiated by Referee")
Anthony Taylor, Paul Tierney and Craig Pawson officiated the most games during the 2021/22 season.
ggplot(data = prem_matches_combined, aes(x = referee, y = bookings)) +
geom_col(fill = "#38003c", color = "#38003c") +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Referee") + ylab("Cards Issued") +
labs(title = "Bookings Issued by Referees")
As expected, the referees who officiated the most matches also issued the most bookings over the season.
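Because total bookings grow with the number of games officiated, a per-game rate gives a fairer comparison between referees. The following base R sketch shows one way to compute it; the referee names and card counts are hypothetical stand-ins for the referee and bookings columns of prem_matches_combined.

```r
# Toy stand-in for prem_matches_combined; names and counts are hypothetical
toy_matches <- data.frame(
  referee = c("Ref A", "Ref A", "Ref A", "Ref B", "Ref B"),
  bookings = c(3, 4, 2, 5, 6)
)

# Games officiated and total cards shown, per referee
games <- tapply(toy_matches$bookings, toy_matches$referee, length)
cards <- tapply(toy_matches$bookings, toy_matches$referee, sum)

referee_summary <- data.frame(
  referee = names(games),
  games_officiated = as.vector(games),
  total_bookings = as.vector(cards),
  bookings_per_game = as.vector(cards / games)
)

# Sort so the referee with the highest per-game rate comes first
referee_summary[order(-referee_summary$bookings_per_game), ]
```

In this toy data, Ref A officiates more games but Ref B shows more cards per game, which is the kind of difference a chart of raw totals would hide.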
ggplot(data = prem_matches_combined, aes(x = referee, y = red_cards)) +
geom_col(fill = "red", color = "#38003c") +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Referee") + ylab("Card Count") +
labs(title = "Red Cards Issued by Referees")
Despite not issuing the most bookings overall, Jon Moss and Michael Oliver handed out the most red cards.
wins <- prem_matches_cleaned %>%
mutate(ft_result = case_when(
ft_result == "H" ~ home_team,
ft_result == "A" ~ away_team,
ft_result == "D" ~ "Draw"))
wins <- wins %>% filter(ft_result != "Draw")
ggplot(data = wins, aes(x = ft_result)) +
geom_bar(fill = "#00ff85", color = "#38003c") +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Teams") + ylab("Wins") + labs(title = "Total Games Won by each Football Team")
Manchester City and Liverpool won the most games, with both teams recording over 25 wins. The closest teams behind them were Arsenal, Chelsea and Tottenham. At the opposite end, Burnley, Leeds, Norwich, Southampton and Watford each failed to muster 10 wins throughout the season.
losses <- prem_matches_cleaned %>%
mutate(ft_result = case_when(
ft_result == "H" ~ away_team,
ft_result == "A" ~ home_team,
ft_result == "D" ~ "Draw"))
losses <- losses %>% filter(ft_result != "Draw")
ggplot(data = losses, aes(x = ft_result)) +
geom_bar(fill = "#e90052", color = "#38003c") +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Teams") + ylab("Losses") + labs(title = "Total Games Lost by each Football Team")
When you do not win games, you are likely to lose them: Norwich and Watford each recorded over 25 losses, with Everton a close third on 21 games lost. Manchester City and Liverpool, by contrast, each lost fewer than five games, which is very impressive.
draws <- prem_matches_cleaned %>%
select(home_team,away_team,ft_result) %>%
mutate(ft_result = case_when(
ft_result == "H" ~ home_team,
ft_result == "A" ~ away_team,
ft_result == "D" ~ "Draw"))
draws <- draws %>% filter(ft_result == "Draw")
draws <- cbind(draws[3], stack(draws[1:2]))
ggplot(data = draws, aes(x = values)) +
geom_bar(fill = "#ffffff", color = "#38003c") +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Teams") + ylab("Draws") + labs(title = "Total Games Drawn by each Football Team")
The 2021/22 Premier League also recorded 88 drawn games across the 380 fixtures, meaning 23% of the games played ended in a draw. Brighton and Crystal Palace jointly recorded the most draws with 15 each, followed by Burnley and Southampton with 14 and 13 draws respectively.
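The wins, losses and draws charts above each reshape the results separately. As an alternative, all three counts can be tabulated in one pass by viewing every match from both teams' perspectives. This is a minimal base R sketch with hypothetical teams and results, coded in the same way as the ft_result column (H, A, D); it is not the method used for the charts.

```r
# Toy fixtures; teams and results are hypothetical, coded like ft_result
toy_results <- data.frame(
  home_team = c("Team A", "Team B", "Team C", "Team A"),
  away_team = c("Team B", "Team C", "Team A", "Team C"),
  ft_result = c("H", "D", "A", "D"),
  stringsAsFactors = FALSE
)

# Lookup vectors translating a result code into each side's outcome
home_outcome <- c(H = "win", A = "loss", D = "draw")
away_outcome <- c(H = "loss", A = "win", D = "draw")

# One row per team per match, from that team's point of view
long_results <- rbind(
  data.frame(team = toy_results$home_team,
             outcome = unname(home_outcome[toy_results$ft_result])),
  data.frame(team = toy_results$away_team,
             outcome = unname(away_outcome[toy_results$ft_result]))
)

# Cross-tabulate teams against outcomes, fixing the column order
record <- table(long_results$team,
                factor(long_results$outcome, levels = c("win", "draw", "loss")))
record
```

Each drawn match counts once for both teams, so the draw column sums to twice the number of drawn games. Applied to the full season, 88 drawn fixtures would give 176 draws across the 20 clubs.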
Thank you for taking the time to go through my analysis. Any feedback or advice is welcome.