Analysis of Manchester United vs Tottenham Hotspur

August 19, 2023

Tottenham Hotspur defeated Manchester United 2-0 this past week. To understand how this happened, I’ve created a few visuals to show the bigger picture.

library(worldfootballR)
library(ggplot2)
library(ggdark)
library(viridis)
library(plotly)
library(tidyr)
library(dplyr)
library(RColorBrewer)
library(hrbrthemes)

Loading Data

The first task is to load the data from the game. The data is provided by FBref.

epl_2023_urls <- fb_match_urls(country = "ENG", gender = "M", season_end_year = 2024, tier = '1st')

match_url = "https://fbref.com/en/matches/4bb62251/Tottenham-Hotspur-Manchester-United-August-19-2023-Premier-League"

summary_match_report <- fb_advanced_match_stats(match_url = match_url,
                                        stat_type = 'summary', team_or_player = "player")

#summary_match_report <- summary_match_report[21:ncol(summary_match_report)]

passing_match_report <- fb_advanced_match_stats(match_url = match_url,
                                                stat_type = 'passing', team_or_player = "player")

#passing_match_report <- passing_match_report[21:ncol(passing_match_report)]

passing_types_match_report <- fb_advanced_match_stats(match_url = match_url,
                                                stat_type = 'passing_types', team_or_player = "player")

#passing_types_match_report <- passing_types_match_report[21:ncol(passing_types_match_report)]

defense_match_report <- fb_advanced_match_stats(match_url = match_url,
                                                      stat_type = 'defense', team_or_player = "player")

#defense_match_report <- defense_match_report[21:ncol(defense_match_report)]

possession_match_report <- fb_advanced_match_stats(match_url = match_url,
                                                stat_type = 'possession', team_or_player = "player")

#possession_match_report <- possession_match_report[21:ncol(possession_match_report)]

misc_match_report <- fb_advanced_match_stats(match_url = match_url,
                                                   stat_type = 'misc', team_or_player = "player")

#misc_match_report <- misc_match_report[21:ncol(misc_match_report)]

keeper_match_report <- fb_advanced_match_stats(match_url = match_url,
                                             stat_type = 'keeper', team_or_player = "player")

#keeper_match_report <- keeper_match_report[21:ncol(keeper_match_report)]

shooting_match_report <- fb_match_shooting(match_url = match_url, time_pause = 3)

shooting_match_report$Minute[c(16,17,36:39)] <- c(93,93,92,96,97,99)

shooting_match_report$Minute <- as.numeric(shooting_match_report$Minute)
shooting_match_report$xG <- as.numeric(shooting_match_report$xG)
shooting_match_report$PSxG[shooting_match_report$PSxG==""] <- 0
shooting_match_report$PSxG <- as.numeric(shooting_match_report$PSxG)
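
The minute fix above overwrites specific row positions, so it only works for this match. A minimal sketch of a more general approach, assuming stoppage-time minutes arrive as strings of the form "90+3" (the helper name parse_minute is hypothetical and not part of worldfootballR):

```r
# Hypothetical helper: convert minute strings such as "90+3" to numbers.
# Assumption: stoppage time is written as "<base>+<added>".
parse_minute <- function(x) {
  # Split on a literal "+" and add the pieces together
  parts <- strsplit(as.character(x), "+", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p)), numeric(1))
}

minutes <- parse_minute(c("12", "45+2", "90+3"))
print(minutes)
```

Note that under this scheme a first-half stoppage-time shot at "45+2" becomes 47 and would sit alongside second-half shots from minute 47, which may or may not be what you want on a timeline.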

Analysis

My initial impression of the game was that United created chances that they failed to capitalize on, especially in the first half.

I first investigated the shot creating actions, to show that Manchester United were just as threatening as Tottenham.

fig <- ggplot(summary_match_report, aes(x = reorder(Player,-SCA_SCA), y = SCA_SCA, fill = Team)) + geom_col(position = 'dodge') +
  scale_fill_manual(values = c('#132257','#DA291C'), name = 'Team', breaks = c('Tottenham Hotspur','Manchester United'),labels = c('Tottenham Hotspur','Manchester United'))

fig <- fig + dark_theme_classic()
# Rotate the x-axis labels by 75 degrees
fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

fig <- fig + ggtitle("Shot-Creating Action by Player") + xlab("Players") + ylab('Shot-Creating Actions')

ggplotly(fig)

Manchester United players had no trouble creating shooting chances. In fact, they had just as many shot-creating actions as Tottenham, if not more.

summary_match_report$SCA_per_Min <- summary_match_report$SCA_SCA / summary_match_report$Min

fig <- ggplot(summary_match_report, aes(x = reorder(Player,-SCA_per_Min), y = SCA_per_Min, fill = Team)) + geom_col(position = 'dodge') +
  scale_fill_manual(values = c('#132257','#DA291C'), name = 'Team', breaks = c('Tottenham Hotspur','Manchester United'),labels = c('Tottenham Hotspur','Manchester United'))


fig <- fig + dark_theme_classic()

# Rotate the x-axis labels by 75 degrees
fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

fig <- fig + ggtitle("Shot-Creating Actions per Minute per Player") + xlab("Players") + ylab('Shot-Creating Actions per Minute')

ggplotly(fig)

Per minute played, Manchester United players created shots at the same rate as Tottenham players, if not a higher one.
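
Per-minute rates are often scaled to 90 minutes so they read like a full-match figure. A small sketch with made-up numbers (the toy data frame and its column names are illustrative, not taken from the match report), which also guards against players with zero minutes:

```r
# Toy data, not taken from the match report
toy <- data.frame(
  Player = c("A", "B", "C"),
  Min    = c(90, 30, 0),
  SCA    = c(6, 3, 0)
)

# Rate per 90 minutes; players with zero minutes get NA instead of NaN
toy$SCA_per_90 <- ifelse(toy$Min > 0, toy$SCA * 90 / toy$Min, NA_real_)
print(toy)
```

Player B comes out ahead of Player A here despite fewer total actions, which is also why rates for short substitute appearances should be read with caution.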

Further evidence of Manchester United’s attacking threat can be seen in the expected goals.

fig <- ggplot(summary_match_report[which(summary_match_report$xG_Expected>0),], aes(x = reorder(Player,-xG_Expected), y = xG_Expected, fill = Team)) + geom_col(position = 'dodge') +
  scale_fill_manual(values = c('#132257','#DA291C'), name = '', breaks = c('Tottenham Hotspur','Manchester United'),labels = c('Tottenham Hotspur','Manchester United'))

fig <- fig + dark_theme_classic()

# Rotate the x-axis labels by 75 degrees
fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

fig <- fig + ggtitle("Expected Goals per Player") + xlab("Players") + ylab('xG')

fig <- fig + annotate('text', x = 12, y = 0.725, size = 2.5, label = paste('Tottenham total xG: ',summary_match_report$Home_xG[1]))
fig <- fig + annotate('text', x = 12, y = 0.795, size = 2.5, label = paste('Manchester total xG: ',summary_match_report$Away_xG[1]))

ggplotly(fig)

Next, I wanted to see where players were operating on the pitch, both offensively and defensively. I’ll start by looking at passes.

manchester_passing_data <- passing_match_report[which(passing_match_report$Team == 'Manchester United'),]

fig <- ggplot(manchester_passing_data, aes(x = reorder(Player,-Att_Total), y = Att_Total, fill = Cmp_percent_Total)) +geom_bar(position = 'stack', stat = 'identity')

fig <- fig + ylim(min = 0, max = 80)

fig <- fig + scale_fill_gradient(low = 'red', high = 'green', name = 'Pass Completion %')

fig <- fig + dark_theme_classic()

# Rotate the x-axis labels by 75 degrees
fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

fig1 <- fig + ggtitle("Manchester United Pass Completion") + xlab("Player") + ylab('Attempted Passes')


tottenham_passing_data <- passing_match_report[which(passing_match_report$Team == 'Tottenham Hotspur'),]

fig <- ggplot(tottenham_passing_data, aes(x = reorder(Player,-Att_Total), y = Att_Total, fill = Cmp_percent_Total)) +geom_bar(position = 'stack', stat = 'identity')

fig <- fig + ylim(min = 0, max = 80)

fig <- fig + scale_fill_gradient(low = 'red', high = 'green', name = 'Pass Completion %')

fig <- fig + dark_theme_classic()

# Rotate the x-axis labels by 75 degrees
fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

fig2 <- fig + ggtitle("Tottenham Hotspur Pass Completion") + xlab("Player") + ylab('Attempted Passes')

fig <- subplot(fig1, fig2, titleY = TRUE, titleX = TRUE, margin = 0.1) %>% layout(title = 'Manchester United v Tottenham Pass Completion')
fig

Tottenham won the possession battle, mostly due to Manchester United’s rushed attacking style. Bruno Fernandes and Alejandro Garnacho were the biggest culprits in terms of wastefulness.

Let’s look at every player and compare their accuracy for each type of pass: short, medium, and long.

pass_data <- data.frame(
  team = passing_match_report$Team,
  player = passing_match_report$Player,
  S = passing_match_report$Cmp_Short,
  M = passing_match_report$Cmp_Medium,
  L = passing_match_report$Cmp_Long,
  short_accuracy = passing_match_report$Cmp_percent_Short,
  medium_accuracy = passing_match_report$Cmp_percent_Medium,
  long_accuracy = passing_match_report$Cmp_percent_Long,
  total_passes = passing_match_report$Cmp_Total,
  total_accuracy = passing_match_report$Cmp_percent_Total
)

pass_data_long <- pass_data %>%
  pivot_longer(
    cols = c("S", "M", "L"),
    names_to = "pass_type",
    values_to = "pass_count"
  ) %>%
  mutate(
    accuracy = case_when(
      pass_type == "S" ~ short_accuracy,
      pass_type == "M" ~ medium_accuracy,
      pass_type == "L" ~ long_accuracy
    )
  ) %>%
  select(team, player, pass_type, pass_count, accuracy)

fig <- ggplot(data = pass_data_long, aes(x = reorder(player,-pass_count)))


fig <- fig + geom_bar(aes(y = pass_count, fill = accuracy), color = 'black', stat = 'identity') +
  scale_fill_gradient2(low = 'black' ,high = 'green', mid = 'red',  midpoint = 50)

# Calculate cumulative pass counts for annotation positions
pass_data_long <- pass_data_long %>%
  group_by(player) %>%
  mutate(cumulative_pass_count = cumsum(pass_count))

# Filter out annotations for low pass counts
pass_data_filtered <- pass_data_long %>%
  filter(pass_count >= 1)  # Adjust the threshold as needed


# Add a label for the pass type at the top of each bar segment
fig <- fig + geom_text(data = pass_data_filtered, aes(y = cumulative_pass_count, label = pass_type), color = 'white', vjust = 0, size = 3)


# fig <- fig + scale_fill_manual(data = pass_data_long$team, values = c('#132257','#DA291C'), breaks = c('Tottenham Hotspur','Manchester United'))

fig <- fig + dark_theme_classic()

# Rotate the x-axis labels by 75 degrees
fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = -0.5, hjust = 1))

fig <- fig + labs(title = "Number of Completed Passes: Short, Medium, and Long",
                  y = "Number of Passes Completed",
                  x = '',
                  fill = "Pass Accuracy") +
  theme(legend.position = "top")

ggplotly(fig)

Here, every pass is categorized as either short, medium, or long.
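
The S, M, and L labels in the chart are positioned with a per-player cumulative sum, which puts each label at the top of its segment. A small sketch with made-up counts (toy data, assuming segments are stacked in the row order shown) of how the top and the midpoint of each segment can be computed:

```r
# Toy data, not taken from the match report
toy <- data.frame(
  player     = c("A", "A", "A", "B", "B", "B"),
  pass_type  = rep(c("S", "M", "L"), times = 2),
  pass_count = c(10, 6, 2, 4, 8, 0)
)

# Top of each stacked segment: running total within each player
toy$top <- ave(toy$pass_count, toy$player, FUN = cumsum)

# Midpoint of each segment: top minus half of the segment height
toy$mid <- toy$top - toy$pass_count / 2
print(toy)
```

Using the midpoint rather than the top would center each label inside its own segment, which can be easier to read when segments are tall.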

Bruno Fernandes stands out as a player who was inaccurate no matter the type of pass. However, Bruno is a creative player who often attempts riskier passes to help his team. James Maddison is a similar player, and his accuracy is at a similar level.

Perhaps Manchester United were rewarded for their apparently risky passing. Let’s look at their expected assisted goals (xAG), passes into the final third, and progressive passes.

First, let’s look at xAG.

pass_data <- data.frame(
  team = passing_match_report$Team,
  player = passing_match_report$Player,
  xAG = passing_match_report$xAG,
  final_third_passing = passing_match_report$Final_Third,
  progressive_passes = passing_match_report$PrgP,
  progressive_passes_distance = passing_match_report$PrgDist_Total
)

fig <- ggplot(data = pass_data, aes(x= reorder(player,-xAG)))

fig <- fig + geom_bar(aes(y = xAG, fill = team), stat = 'identity') +
  scale_fill_manual(values = c('#DA291C', '#132257'))



fig <- fig + labs(title = "Manchester United vs Tottenham Expected Assisted Goals",
                  y = 'Expected Assisted Goals',
                  x = 'Player',
                  fill = "Team") +
  dark_theme_classic() +
  theme(legend.position = "top")

fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

ggplotly(fig)

Bruno’s level of risk in his passes works in favor of Manchester United. His expected assisted goals were among the highest in the match, alongside Luke Shaw’s.

Next, let’s look at the number of passes into the final third.

fig <- ggplot(data = pass_data, aes(x= reorder(player,-final_third_passing)))

fig <- fig + geom_bar(aes(y = final_third_passing, fill = team), stat = 'identity') +
  scale_fill_manual(values = c('#DA291C', '#132257'))



fig <- fig + labs(title = "Manchester United vs Tottenham Final Third Passing",
                  y = 'Number of Passes into Final Third',
                  x = 'Player',
                  fill = "Team") +
  dark_theme_classic() +
  theme(legend.position = "top")

fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

ggplotly(fig)

Manchester United players struggled to progress the ball forward.

Next, let’s look at the number of progressive passes.

fig <- ggplot(data = pass_data, aes(x= reorder(player,-progressive_passes)))

fig <- fig + geom_bar(aes(y = progressive_passes , fill = progressive_passes_distance), stat = 'identity') + 
  scale_fill_gradient(low = 'red', high = 'green')

fig <- fig + labs(title = "Manchester United vs Tottenham Progressive Passing",
                  y = 'Number of Progressive Passes',
                  x = 'Player',
                  fill = "Progressive Passing Distance") +
  dark_theme_classic() +
  theme(legend.position = "top")

fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

ggplotly(fig)

Let’s look at progressive passing distance relative to total passing distance.

pass_data <- data.frame(
  team = passing_match_report$Team,
  player = passing_match_report$Player,
  passing_distance = passing_match_report$TotDist_Total,
  xAG = passing_match_report$xAG,
  final_third_passing = passing_match_report$Final_Third,
  progressive_passes = passing_match_report$PrgP,
  progressive_passes_distance = passing_match_report$PrgDist_Total
)

fig <- ggplot(data = pass_data, aes(x= reorder(player,-passing_distance)))

fig <- fig + geom_bar(aes(y = passing_distance , fill = (progressive_passes_distance/passing_distance)), stat = 'identity') + 
  scale_fill_gradient(low = 'red', high = 'green')

fig <- fig + labs(title = "Manchester United vs Tottenham Ratio of Progressive Passing",
                  y = 'Total Passing Distance ',
                  x = 'Player',
                  fill = "Progressive Passing Distance/Total Passing Distance") +
  dark_theme_classic() +
  theme(legend.position = "top")

fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

ggplotly(fig)

Let’s now look at where both teams possessed the ball.

possession_data <- data.frame(
  team = possession_match_report$Team,
  player = possession_match_report$Player,
  touches = possession_match_report$Touches_Touches,
  Defensive_Third = possession_match_report$`Def 3rd_Touches`,
  Middle_Third = possession_match_report$`Mid 3rd_Touches`,
  Attacking_Third = possession_match_report$`Att 3rd_Touches`
)

possession_data_long <- possession_data %>%
  pivot_longer(
    cols = c("Defensive_Third", "Middle_Third", "Attacking_Third"),
    names_to = "position",
    values_to = "touch_count"
  ) %>% select(team, player, touches, position, touch_count)

possession_data_long$position <- factor(possession_data_long$position, levels = c('Attacking_Third','Middle_Third','Defensive_Third'))

fig <- ggplot(data = possession_data_long, aes(x = reorder(player, -touches)))


fig <- fig + geom_bar(aes(y = touch_count, fill = position), stat = 'identity') + 
  scale_fill_brewer(palette = 'RdYlGn', name = 'Position of Touch')

fig <- fig + labs(title = "Manchester United vs Tottenham Touches",
                  y = 'Total Number of Touches ',
                  x = 'Player') +
  dark_theme_classic() +
  theme(legend.position = "top")

fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

ggplotly(fig)

The ball was mostly controlled by defenders and midfielders, since these are the players who dominate the build-up. Progressing from back to front was very difficult for Manchester United.

Defensively, let’s see where each team initiated their tackles.

manchester_defense_data <- defense_match_report[which(defense_match_report$Team == 'Manchester United'),]

manchester_defense_data <- data.frame(
  team = manchester_defense_data$Team,
  player = manchester_defense_data$Player,
  tackles = manchester_defense_data$Tkl_Tackles,
  Defensive_Third = manchester_defense_data$`Def 3rd_Tackles`,
  Middle_Third = manchester_defense_data$`Mid 3rd_Tackles`,
  Attacking_Third = manchester_defense_data$`Att 3rd_Tackles`
)

manchester_defense_data_long <- manchester_defense_data %>%
  pivot_longer(
    cols = c("Defensive_Third", "Middle_Third", "Attacking_Third"),
    names_to = "position",
    values_to = "tackle_count"
  ) %>% select(team, player, tackles, position, tackle_count)

manchester_defense_data_long$position <- factor(manchester_defense_data_long$position, levels = c('Attacking_Third','Middle_Third','Defensive_Third'))

fig <- ggplot(manchester_defense_data_long, aes(x = reorder(player,-tackles), y = tackle_count, fill = position)) + 
  geom_bar(position = 'stack', stat = 'identity')

fig <- fig + scale_fill_brewer(palette = 'RdYlGn', name = 'Position of Tackle') +
  guides(fill = 'none')

fig <- fig + dark_theme_classic()

fig <- fig + ylim(min = 0, max = 6)

# Rotate the x-axis labels by 75 degrees
fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

fig1 <- fig + ggtitle("Manchester United Tackles") + xlab("Player") + ylab('Number of Tackles')

ggplotly(fig1)
tottenham_defense_data <- defense_match_report[which(defense_match_report$Team == 'Tottenham Hotspur'),]

tottenham_defense_data <- data.frame(
  team = tottenham_defense_data$Team,
  player = tottenham_defense_data$Player,
  tackles = tottenham_defense_data$Tkl_Tackles,
  Defensive_Third = tottenham_defense_data$`Def 3rd_Tackles`,
  Middle_Third = tottenham_defense_data$`Mid 3rd_Tackles`,
  Attacking_Third = tottenham_defense_data$`Att 3rd_Tackles`
)

tottenham_defense_data_long <- tottenham_defense_data %>%
  pivot_longer(
    cols = c("Defensive_Third", "Middle_Third", "Attacking_Third"),
    names_to = "position",
    values_to = "tackle_count"
  ) %>% select(team, player, tackles, position, tackle_count)

tottenham_defense_data_long$position <- factor(tottenham_defense_data_long$position, levels = c('Attacking_Third','Middle_Third','Defensive_Third'))

fig <- ggplot(tottenham_defense_data_long, aes(x = reorder(player,-tackles), y = tackle_count, fill = position)) + 
  geom_bar(position = 'stack', stat = 'identity')

fig <- fig + scale_fill_brewer(palette = 'RdYlGn', name = 'Position of Tackle')

fig <- fig + ylim(min = 0, max = 6)

fig <- fig + dark_theme_classic()

# Rotate the x-axis labels by 75 degrees
fig <- fig + theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust = 1))

fig2 <- fig + ggtitle("Tottenham Tackles") + xlab("Player") + ylab('Number of Tackles')

ggplotly(fig2)
fig <- subplot(fig1, fig2, titleY = TRUE, titleX = TRUE, margin = 0.1) %>% layout(title = 'Manchester United v Tottenham Tackle Position')
fig

Defensively, Manchester United did attempt to tackle higher up the pitch than Tottenham, which in theory would allow them to win the ball closer to goal and thus score more goals.
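
The comparison of tackle positions can be made numeric by turning each team's tackles per third into shares of that team's total. A minimal sketch with made-up counts (the team labels and column names are illustrative, not the FBref column names):

```r
# Toy tackle counts per player, not taken from the match report
toy <- data.frame(
  team    = c("Home", "Home", "Away", "Away"),
  def_3rd = c(3, 1, 2, 2),
  mid_3rd = c(2, 2, 1, 1),
  att_3rd = c(0, 0, 1, 1)
)

# Sum each third per team, then convert each team's row to proportions
totals <- rowsum(toy[, c("def_3rd", "mid_3rd", "att_3rd")], group = toy$team)
shares <- prop.table(as.matrix(totals), margin = 1)
print(round(shares, 2))
```

In this toy example both teams make the same number of tackles, but a quarter of the away team's tackles come in the attacking third versus none for the home team, which is the kind of difference the stacked bars are meant to show.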

Looking at the cumulative xG throughout the match, it is clear that Manchester United should have scored. In fact, chances of that quality would produce about 2 goals on average.

shooting_data <- data.frame(
  team = shooting_match_report$Squad,
  player = shooting_match_report$Player,
  minute = shooting_match_report$Minute,
  xG = shooting_match_report$xG
)

# Calculate cumulative xG for both teams
shooting_data <- shooting_data[order(shooting_data$minute), ]
shooting_data$cumulative_xG <- ave(shooting_data$xG, shooting_data$team, FUN = cumsum)

fig <- ggplot(data = shooting_data, aes(x = minute, y = cumulative_xG, color = team)) +
  geom_step(direction = 'hv') +
  scale_color_manual(values = c('Manchester Utd' = 'red', 'Tottenham' = 'blue')) + 
  dark_theme_dark() +
  ggtitle("Expected Goals Timeline: Manchester United vs Tottenham") + xlab("Minute") + ylab('Cumulative Expected Goals')


ggplotly(fig)
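
The per-team running total behind the timeline can be checked on a tiny example. The sketch below uses made-up shots (toy data, not the match data) and the same order-then-ave pattern:

```r
# Toy shots, deliberately out of chronological order
toy <- data.frame(
  team   = c("A", "B", "A", "B", "A"),
  minute = c(50, 10, 5, 70, 30),
  xG     = c(0.10, 0.30, 0.20, 0.05, 0.40)
)

# Sort by minute first so the running total follows match time
toy <- toy[order(toy$minute), ]

# Running total of xG within each team
toy$cumulative_xG <- ave(toy$xG, toy$team, FUN = cumsum)
print(toy)
```

The last value for each team equals that team's total xG (0.70 for A and 0.35 for B here), so the end of each step line can be read as the team's total for the match.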


Tottenham did a good job of possessing the ball and limiting the amount of time Manchester United had to create chances. Players like Cristian Romero, Yves Bissouma, and James Maddison were the dominant players on the ball.

Manchester United need to improve their progression up the pitch. Manchester United midfielders and attackers need to control possession more often and be more selective when they play a risky pass. Additionally, they need a number 9 who will finish chances!