Abstract

The 2018 FIFA World Cup highlighted exceptional midfield performances, with Luka Modrić earning the Golden Ball as the tournament’s most valuable player. This study conducts a statistical analysis of Modrić’s contributions in comparison to N’Golo Kanté and Kevin De Bruyne, focusing on passing efficiency, defensive actions, and goal-scoring involvement. Using open-source, location-based datasets, this research applies spatial analytics and data visualization techniques to assess key performance indicators, including pass completion rates, progressive passes, key passes, tackles, interceptions, ball recoveries, and goal-creating actions. Methodologies such as heat maps, passing networks, spatial scatter plots, and spatial autocorrelation are employed to analyze player movement, positioning, and distribution patterns. Data processing is conducted using RStudio and spatial analytics tools to provide an objective evaluation of midfield effectiveness. The results aim to demonstrate the impact of advanced spatial statistics in player assessment, offering a data-driven approach to evaluating midfield contributions beyond conventional metrics.

Introduction

The 2018 FIFA World Cup saw Croatia’s Luka Modrić awarded the Golden Ball as the tournament’s most valuable player, recognizing his overall influence on Croatia’s first-ever World Cup final appearance. While his performances were widely acknowledged as critical to the team’s success, traditional statistical metrics, such as goals and assists, have led to debates about whether Modrić was the most statistically dominant player of the tournament. For instance, France’s Antoine Griezmann registered more direct goal contributions, leading some to question Modrić’s selection.

Soccer analysis has historically relied on fundamental statistics such as possession percentage, total passes, and shots on goal to evaluate player performance. However, these metrics do not capture deeper tactical insights, particularly in midfield roles where impact extends beyond direct goal involvement. The growing accessibility of open-source, location-based datasets has enabled the application of spatial analytics in soccer analysis, providing a more detailed understanding of player influence through advanced visualization techniques.

This study employs spatial analytics and data visualization to assess Modrić’s performances in comparison to two other key midfielders from the 2018 World Cup: Belgium’s Kevin De Bruyne and France’s N’Golo Kanté. Using statistical computing tools such as RStudio, this analysis will focus on unconventional metrics derived from spatial data, including passing distribution, defensive actions, and goal-creating sequences. Methods such as passing networks, heat maps, and spatial autocorrelation will be used to highlight movement patterns and overall impact. By applying these advanced techniques, this study aims to provide an objective assessment of midfield effectiveness and evaluate Modrić’s statistical case for the Golden Ball beyond conventional goal-scoring metrics.

Data

Soccer analytics and location-based datasets are gaining interest in both the proprietary and open-source worlds. As described in the paper “A public data set of spatiotemporal match events in soccer competitions”, several large open collections of soccer-logs are available to the public. These datasets contain the spatio-temporal events that occurred during each match of prominent soccer competitions, including the World Cup (Pappalardo, Cintia, Rossi, et al.). Soccer-logs describe match events, each containing information about its type (pass, shot, foul, tackle, etc.), a time-stamp, the player(s) involved, the position on the field, and additional information (e.g., pass accuracy) (Pappalardo, Cintia, Rossi, et al.). This data can be plotted in software such as RStudio for visualization.

Open-source data can be found here: https://figshare.com/collections/Soccer_match_event_dataset/4415000/2

Original data source, including open source and proprietary data, can be found here: https://wyscout.com/

Methodology

Challenges/Limitations

The available spatial data is provided in CSV format, with each action in a soccer match recorded as one row containing fields such as a time-stamp, start x, start y, end x, end y, and the players involved. This data needed to be processed in R using tidyverse to filter by player, team, or specific match. Furthermore, the data can be processed to create specific columns, as shown here:

Figure 1 Data cleanup format necessary to plot x/y data for analytical purposes in RStudio.

Additionally, the sheer size of the dataset was an initial challenge. The original file contained hundreds of thousands of records, each describing a distinct action performed across the 64 matches played in the 2018 World Cup. Processing the data in R to create subset files containing only the actions of particular teams and players was necessary before completing the visual analysis. This allowed for easier processing and improved performance within RStudio.
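The subsetting step described above can be sketched as follows. This is a minimal illustration on a toy event log rather than the report's actual cleanup script: the column names are assumptions based on the fields listed earlier, player_id 1234 is hypothetical, and base R is used so the sketch runs without extra packages (the same filter can be written with dplyr's filter()).

```r
# Toy event log; column names are assumed from the fields described in the
# text and are not the dataset's documented schema.
events <- data.frame(
  match_id   = c(1, 1, 1, 2, 2),
  team       = c("Croatia", "Croatia", "Belgium", "Croatia", "France"),
  player_id  = c(8287, 8287, 38021, 8287, 1234),  # 1234 is hypothetical
  event_name = c("Pass", "Shot", "Pass", "Pass", "Pass"),
  start_x    = c(40, 80, 55, 35, 50),
  start_y    = c(50, 45, 30, 60, 50),
  end_x      = c(62, 100, 70, 58, 45),
  end_y      = c(42, 50, 35, 55, 48)
)

# One subset per team, so later steps load only the rows they need.
by_team <- split(events, events$team)

# Passes by a single player within the Croatia subset.
croatia <- by_team[["Croatia"]]
modric_passes <- croatia[croatia$player_id == 8287 &
                           croatia$event_name == "Pass", ]

nrow(modric_passes)
```

Each element of by_team could then be written to its own file with write.csv() and read back on demand, which keeps the working set small.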

Analysis & Methods

After the data is processed, ggplot2 and its extension packages, including ggsoccer and ggforce, can be used to map the actions onto a soccer pitch, using the following coordinate system so that the data is consistent across all matches.

The spatial distribution of coordinates that make up the soccer pitch outline and allow the x/y data to be plotted on a soccer pitch. An event’s coordinates depend on the subject: the goal the subject defends is always at x = 0% and the goal being attacked is always at x = 100%. All values are percentages expressed as (x, y). https://dataglossary.wyscout.com/pitch_coordinates/

First, pitch dimensions are defined to create a pitch with necessary visual context cues, including the penalty box, the six-yard box, and the goal.

pitch_wyscout <- list(
  length = 100,
  width = 100,
  penalty_box_length = 16,
  penalty_box_width = 62,
  six_yard_box_length = 6,
  six_yard_box_width = 26,
  penalty_spot_distance = 10,
  goal_width = 12,
  origin_x = 0,
  origin_y = 0
)

Next, a function built on the ggplot2 package draws a basic pitch in RStudio using Wyscout dimensions, matching the coordinate system of the source data. The pitch geometries, such as the outline rectangle shown below, provide a visual reference for plotting the actions that occur within a match.

create_wyscout_pitch <- function() {
  ggplot() +
    # Full pitch outline
    geom_rect(aes(xmin = pitch_wyscout$origin_x, xmax = pitch_wyscout$length,
                  ymin = pitch_wyscout$origin_y, ymax = pitch_wyscout$width), 
              color = "grey35", fill = NA) }
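Because the coordinates are percentages of the pitch rather than physical distances, one unit along x does not cover the same ground as one unit along y. The sketch below shows one way to convert to metres before measuring pass length. The 105 m by 68 m pitch size is an assumption (actual stadium dimensions are not part of the dataset), and to_metres() and pass_length_m() are hypothetical helper names.

```r
# Assumed pitch size in metres; not provided by the dataset.
pitch_length_m <- 105
pitch_width_m  <- 68

# Convert percentage coordinates (0-100) to metres.
to_metres <- function(x_pct, y_pct) {
  data.frame(
    x_m = x_pct / 100 * pitch_length_m,
    y_m = y_pct / 100 * pitch_width_m
  )
}

# Straight-line pass length in metres from percentage start/end coordinates.
pass_length_m <- function(start_x, start_y, end_x, end_y) {
  s <- to_metres(start_x, start_y)
  e <- to_metres(end_x, end_y)
  sqrt((e$x_m - s$x_m)^2 + (e$y_m - s$y_m)^2)
}

pass_length_m(50, 50, 70, 50)  # 20% of the length: 21 m
pass_length_m(50, 50, 50, 70)  # 20% of the width: 13.6 m
```

The two calls move the ball by the same number of percentage points but cover different distances, which is why thresholds expressed in raw coordinate units should be read as approximate.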

Metrics Used

General plotting will be used to highlight surface-level statistics through a spatial lens. This includes plotting actions such as passes and defensive tackles in relation to a soccer pitch. The following plots are implemented for various categories of soccer statistics:

Part 1: Initial Comparison

Radar Plot highlighting basic metrics analyzed later in the report

Part 2: Passing

Pass Network highlighting start/end locations of passes in a game

**Spatial Autocorrelation to analyze the spatial distribution of starting passing locations in a match (see detailed description below).

Part 3: Defensive Actions

Heatmap of aggregated defensive actions within a specified grid zone across the soccer pitch

Part 4: Goal Contributing Actions

Network of shots and passes that lead to shot-creating opportunities, defined as a “clear goal scoring chance” (Wyscout)

Network of shots and passes that ultimately lead to goals (a subset of previous network plot)

** Spatial autocorrelation will be performed to analyze metrics that go beyond a simple visual plot. In particular, Moran’s I statistic will be used to analyze the starting pass locations for each of the top midfielders in the 2018 World Cup.
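To make the grid aggregation and Moran’s I concrete, the following base R sketch bins locations into grid cells and computes the statistic by hand. It is an illustration under stated assumptions, not the report’s implementation: the 4 x 4 grid, the binary rook (edge-sharing) weights, and the helper names are all hypothetical choices. The spdep package loaded in this report provides moran.test(), which also returns a significance test.

```r
# Moran's I for cell values x and a binary neighbour matrix w.
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Rook neighbour matrix for an nr x nc grid. Cells are numbered column by
# column, matching as.vector() applied to a matrix.
rook_weights <- function(nr, nc) {
  idx <- expand.grid(r = seq_len(nr), c = seq_len(nc))
  d <- abs(outer(idx$r, idx$r, "-")) + abs(outer(idx$c, idx$c, "-"))
  (d == 1) * 1
}

# Count events per grid cell from percentage coordinates (0-100).
# Rows follow y bins and columns follow x bins.
grid_counts <- function(x, y, nr = 4, nc = 4) {
  gx <- cut(x, breaks = seq(0, 100, length.out = nc + 1), include.lowest = TRUE)
  gy <- cut(y, breaks = seq(0, 100, length.out = nr + 1), include.lowest = TRUE)
  matrix(table(gy, gx), nrow = nr, ncol = nc)
}

w <- rook_weights(4, 4)

# Two reference patterns: one clustered, one perfectly alternating.
clustered <- matrix(c(rep(9, 8), rep(1, 8)), nrow = 4)
checker   <- (row(clustered) + col(clustered)) %% 2

morans_i(as.vector(clustered), w)  # about 0.67: similar cells sit together
morans_i(as.vector(checker), w)    # -1: every neighbour differs

# Binning three toy pass start locations.
counts <- grid_counts(c(10, 12, 90), c(10, 15, 90))
counts
```

Values near +1 indicate that busy cells cluster together, values near -1 indicate a dispersed, alternating pattern, and values near 0 suggest no spatial structure.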

Libraries Used

library(readr)
library(dplyr)
library(gridExtra)
library(cowplot)
library(ggplot2)
library(kableExtra)
library(magick)
library(tidyverse)
library(here)
library(janitor)
library(ggradar)
library(fmsb)
library(ggforce)
library(ggsoccer)
library(scales)
library(knitr)
library(sf)
library(spdep)

Results

Part 1: Initial Comparison of Top Midfielders

This project aims to analyze three of the top midfielders at the 2018 World Cup across specific metrics highlighting their passing, defensive, and goal scoring actions.

The following three players were selected for their standout performances in leading their nations to first (France), second (Croatia), and third (Belgium) place.

Three players analyzed based on 2018 World Cup performances across 7 matches.

1.1 Metrics used for analysis

The three players were analyzed using the following six metrics, which are discussed in more detail later.

Completed Passes (Cmp)

Progressive Passes (PrgP)

Shot-Creating Actions (SCA)

Goal-Creating Actions (GCA)

Tackles (Tkl)

Interceptions (Int)

1.2 Z Scores

Z-scores are used in soccer statistics to standardize different metrics, making them comparable across players. Since stats like passes, tackles, and goal contributions exist on different scales, Z-scores show how much a player’s performance deviates from the average. This helps compare players with different roles and identify statistical outliers. Standardizing does not by itself adjust for playing time, so the underlying statistics need to be expressed per 90 minutes when minutes played differ.

A Z-score measures how far a player’s statistic is from the average, in terms of standard deviations. A Z-score of 0 means the player is exactly average, while a positive Z-score indicates above-average performance, and a negative Z-score indicates below-average performance. For example, if a player has a Z-score of +2.0 in key passes, it means he is two standard deviations above the average midfielder, making him an elite passer. Conversely, a Z-score of -1.5 in tackles would mean he performs below the average in that category.
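As a small worked example of the definition above, the sketch below standardizes a made-up set of key-pass counts. The five values and the player labels A to E are purely illustrative and do not come from the dataset.

```r
# Hypothetical key-pass counts for five midfielders (illustrative only).
key_passes <- c(A = 4, B = 6, C = 8, D = 10, E = 12)

# Z-score: distance from the mean in units of the sample standard deviation.
z <- (key_passes - mean(key_passes)) / sd(key_passes)
round(z, 2)

# Base R's scale() performs the same centring and scaling.
z_scaled <- as.vector(scale(key_passes))
```

Player C sits exactly at the mean and gets a Z-score of 0, while player E is about 1.26 standard deviations above it.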

# Load the Passing Data
file1 <- "All_MF_Pass.csv"
passdata <- read_csv(file1)

# Clean and compute Z-scores for Passing Data
passdata <- passdata %>%
  filter(!is.na(PrgP) & !is.na(Cmp_1)) %>%  # Remove NA values
  mutate(
    PrgP = as.numeric(PrgP),
    Cmp_1 = as.numeric(Cmp_1),
    PrgP_Z = (PrgP - mean(PrgP, na.rm = TRUE)) / sd(PrgP, na.rm = TRUE),
    Cmp_Z = (Cmp_1 - mean(Cmp_1, na.rm = TRUE)) / sd(Cmp_1, na.rm = TRUE)
  ) %>%
  select(Player, PrgP, PrgP_Z, Cmp_1, Cmp_Z)

# Load the Shooting Data
file2 <- "All_MF_Shoot.csv"
shootdata <- read_csv(file2)

# Clean and compute Z-scores for Shooting Data
shootdata <- shootdata %>%
  filter(!is.na(GCA) & !is.na(SCA)) %>%  # Remove NA values
  mutate(
    GCA = as.numeric(GCA),
    SCA = as.numeric(SCA),
    GCA_Z = (GCA - mean(GCA, na.rm = TRUE)) / sd(GCA, na.rm = TRUE),
    SCA_Z = (SCA - mean(SCA, na.rm = TRUE)) / sd(SCA, na.rm = TRUE)
  ) %>%
  select(Player, GCA, GCA_Z, SCA, SCA_Z)

# Load the Defending Data
file3 <- "All_MF_Def.csv"
defdata <- read_csv(file3)

# Clean and compute Z-scores for Defending Data
defdata <- defdata %>%
  filter(!is.na(Tkl_1) & !is.na(Int)) %>%  # Remove NA values
  mutate(
    Tkl_1 = as.numeric(Tkl_1),
    Int = as.numeric(Int),
    Tkl_1_Z = (Tkl_1 - mean(Tkl_1, na.rm = TRUE)) / sd(Tkl_1, na.rm = TRUE),
    Int_Z = (Int - mean(Int, na.rm = TRUE)) / sd(Int, na.rm = TRUE)
  ) %>%
  select(Player, Tkl_1, Tkl_1_Z, Int, Int_Z)

# Merge all datasets on Player
combined_data <- passdata %>%
  full_join(shootdata, by = "Player") %>%
  full_join(defdata, by = "Player")

# Filter for specific players
player_values <- c("Luka Modric", "Kevin De Bruyne", "N'Golo Kante")
final_table <- combined_data %>% filter(Player %in% player_values)

# Select only columns with "Z" in their name (Z-scores)
final_table_z <- final_table %>%
  select(Player, contains("Z"))

# Identify numeric Z-score columns (excluding Player)
z_columns <- names(final_table_z)[names(final_table_z) != "Player"]

# Create Max row (set all Z-score columns to 2)
max_row <- final_table_z[1, ]  # Copy structure
max_row[z_columns] <- 2  # Set only Z-score columns to 2
max_row$Player <- "Max"  # Set "Max" as the Player name

# Create Min row (set all Z-score columns to -2)
min_row <- final_table_z[1, ]  # Copy structure
min_row[z_columns] <- -2  # Set only Z-score columns to -2
min_row$Player <- "Min"  # Set "Min" as the Player name

# Bind the Max and Min rows to the final table
finaltable_z <- bind_rows(max_row, min_row, final_table_z)

# Print the final merged table
print(finaltable_z)
## # A tibble: 5 × 7
##   Player          PrgP_Z  Cmp_Z   GCA_Z  SCA_Z Tkl_1_Z   Int_Z
##   <chr>            <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
## 1 Max              2      2      2       2       2      2     
## 2 Min             -2     -2     -2      -2      -2     -2     
## 3 Luka Modric      1.49   0.900  0.0984  1.24   -0.275  0.119 
## 4 Kevin De Bruyne  0.964  0.422  1.45    1.33   -0.139 -0.0389
## 5 N'Golo Kante     0.103  0.498 -0.519  -0.700   0.201  1.85

1.3 Radar Plot

# Load the dataset
data <- read.csv("CombinedData_Z_final.csv", row.names = 1)

# Extract each player's row together with the Max and Min rows, which
# fmsb::radarchart() reads as the upper and lower axis limits
modric_data <- data[c("Max", "Min", "Luka Modric"), ]
kdb_data <- data[c("Max", "Min", "Kevin De Bruyne"), ]
kante_data <- data[c("Max", "Min", "N'Golo Kante"), ]

# Define colors and titles
colors <- c("#171796", "#FFCD00", "#E1000F")
titles <- c("Luka Modric", "Kevin De Bruyne", "N'Golo Kante")

# Reduce plot margins
par(mar = c(1, 1, 1, 1))
par(mfrow = c(1, 3)) # Split plotting area into 3

create_beautiful_radarchart <- function(data, color = "#00AFBB", 
                                        vlabels = colnames(data), vlcex = 0.7,
                                        caxislabels = NULL, title = NULL, ...){
  radarchart(
    data, axistype = 1,
    # Customize the polygon
    pcol = color, pfcol = scales::alpha(color, 0.5), plwd = 2, plty = 1,
    # Customize the grid
    cglcol = "grey", cglty = 1, cglwd = 0.8,
    # Customize the axis
    axislabcol = "black", 
    # Variable labels
    vlcex = vlcex, vlabels = vlabels,
    caxislabels = caxislabels, title = title, ...
  )
}

# Create radar charts
create_beautiful_radarchart(modric_data, 
                            caxislabels = c(-2, -1, 0, 1, 2), 
                            color = colors[1], 
                            title = titles[1])

create_beautiful_radarchart(kdb_data, 
                            caxislabels = c(-2, -1, 0, 1, 2), 
                            color = colors[2], 
                            title = titles[2])

create_beautiful_radarchart(kante_data, 
                            caxislabels = c(-2, -1, 0, 1, 2), 
                            color = colors[3], 
                            title = titles[3])

# Reset plot settings
par(mfrow = c(1, 1))

Function reference: https://www.datanovia.com/en/blog/beautiful-radar-chart-in-r-using-fmsb-and-ggplot-packages/

Code Description Table:

# Define table
notation_table <- data.frame(
  Statistic = c("GCA_Z", "SCA_Z", "Tkl_1_Z", "Int_Z", "Cmp_Z", "PrgP_Z"),
  Z_Score_Description = c("Goal Creating Actions", 
                  "Shot Creating Actions", 
                  "Tackles", 
                  "Interceptions", 
                  "Completed Passes", 
                  "Progressive Passes")
)

# Keep every row: notation_table has no "Player" row to drop
descriptions <- notation_table
descr <- tableGrob(print(descriptions))
##   Statistic   Z_Score_Description
## 1     GCA_Z Goal Creating Actions
## 2     SCA_Z Shot Creating Actions
## 3   Tkl_1_Z               Tackles
## 4     Int_Z         Interceptions
## 5     Cmp_Z      Completed Passes
## 6    PrgP_Z    Progressive Passes

Part 2: Passing

A great midfielder in soccer is often judged by their passing ability. The best ones make successful passes consistently, meaning they rarely lose the ball. They also make progressive passes, which move the ball forward toward the opponent’s goal, helping create attacks. Lastly, they have positional awareness, knowing when and where to pass based on the movement of teammates and opponents. A midfielder who excels in these areas keeps their team in control and creates goal-scoring opportunities.

2.1 Passes Completed Comparison

Completed passes are important because they help a team keep control of the ball and build attacks. When passes are accurate, the team can move forward, create scoring chances, and keep the game under control. Bad passes give the ball to the other team, making it harder to win.

# Read the dataset
croatia_data <- read.csv("CroatiaTeamData.csv")

# Keep Luka Modric's accurate passes (player_id 8287, Wyscout tag 1801)
croatia_data_filter <- croatia_data %>%
  filter(player_id == 8287 & event_name == "Pass" & grepl("1801", tags))

# Calculate total passes
total_passes <- nrow(croatia_data_filter)

# Get total playtime (sum of max event time in each match for player_id 8287)
total_playtime_seconds <- croatia_data %>%
  filter(player_id == 8287) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes <- total_playtime_seconds / 60

# Calculate passes per 90 minutes
passes_per_90 <- round((total_passes / total_playtime_minutes) * 90)

# Define pitch dimensions
pitch_wyscout <- list(
  length = 100,
  width = 100,
  penalty_box_length = 16,
  penalty_box_width = 62,
  six_yard_box_length = 6,
  six_yard_box_width = 26,
  penalty_spot_distance = 10,
  goal_width = 12,
  origin_x = 0,
  origin_y = 0
)

create_wyscout_pitch <- function() {
  ggplot() +
    # Full pitch outline
    geom_rect(aes(xmin = pitch_wyscout$origin_x, xmax = pitch_wyscout$length,
                  ymin = pitch_wyscout$origin_y, ymax = pitch_wyscout$width), 
              color = "grey35", fill = NA) +
    
    # Penalty box (left)
    geom_rect(aes(xmin = pitch_wyscout$origin_x, 
                  xmax = pitch_wyscout$penalty_box_length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$penalty_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$penalty_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Penalty box (right)
    geom_rect(aes(xmin = pitch_wyscout$length - pitch_wyscout$penalty_box_length,
                  xmax = pitch_wyscout$length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$penalty_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$penalty_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Six-yard box (left)
    geom_rect(aes(xmin = pitch_wyscout$origin_x, 
                  xmax = pitch_wyscout$six_yard_box_length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$six_yard_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$six_yard_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Six-yard box (right)
    geom_rect(aes(xmin = pitch_wyscout$length - pitch_wyscout$six_yard_box_length,
                  xmax = pitch_wyscout$length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$six_yard_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$six_yard_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Center circle
    geom_point(aes(x = pitch_wyscout$length / 2, y = pitch_wyscout$width / 2), 
               size = 0.5, color = "grey35") +
    geom_circle(aes(x0 = pitch_wyscout$length / 2, y0 = pitch_wyscout$width / 2, r = 9.15),
                color = "grey35") +
    
    # Goals
    geom_segment(aes(x = pitch_wyscout$origin_x, 
                     xend = pitch_wyscout$origin_x, 
                     y = (pitch_wyscout$width - pitch_wyscout$goal_width) / 2, 
                     yend = (pitch_wyscout$width + pitch_wyscout$goal_width) / 2),
                 color = "grey35", linewidth = 3) +
    geom_segment(aes(x = pitch_wyscout$length, 
                     xend = pitch_wyscout$length, 
                     y = (pitch_wyscout$width - pitch_wyscout$goal_width) / 2, 
                     yend = (pitch_wyscout$width + pitch_wyscout$goal_width) / 2),
                 color = "grey35", linewidth = 3) +
    
    # Center line
    geom_segment(aes(x = pitch_wyscout$length / 2, xend = pitch_wyscout$length / 2,
                     y = pitch_wyscout$origin_y, yend = pitch_wyscout$width),
                 color = "grey35") +
    
     # Attack direction annotation
    annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
           arrow = arrow(length = unit(0.3, "inches")), color = "black", linewidth = 1) +
    annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
    
    # Adjust theme
    theme(
      panel.grid = element_blank(),
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      axis.title = element_blank()
    ) +
    coord_fixed() +
    labs(title = "Football Pitch")
}

# Build the subtitle from the per-90 pass rate
subtitle_text <- paste("Total successful passes normalized per 90 minutes:", passes_per_90)

# Plot the passes as semi-transparent arrows in dark blue
create_wyscout_pitch() +
  geom_segment(data = croatia_data_filter, 
               aes(x = start_x, y = start_y, 
                   xend = end_x, yend = end_y), 
               lineend = "round", 
               linewidth = 0.6,  # Use linewidth for ggplot2 3.4.0+
               arrow = arrow(length = unit(0.08, "inches")),
               color = "#171796",  # Dark blue
               alpha = 0.5) +  # Adjust transparency (0 = fully transparent, 1 = fully opaque)
  labs(title = "Luka Modric Successful Passes in 2018 World Cup",
       subtitle = subtitle_text) + 
  coord_fixed(ratio = 60 / 100)

# Read the dataset
belgium_data <- read.csv("BelgiumTeamData.csv")

# Filter data to only player_id 38021 (Kevin De Bruyne)
belgium_data_filter <- belgium_data %>%
  filter(player_id == 38021 & event_name == "Pass" & grepl("1801", tags))

# Calculate total passes
total_passes <- nrow(belgium_data_filter)

# Get total playtime (sum of max event time in each match for player_id 38021)
total_playtime_seconds <- belgium_data %>%
  filter(player_id == 38021) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes <- total_playtime_seconds / 60

# Calculate passes per 90 minutes
passes_per_90 <- round((total_passes / total_playtime_minutes) * 90)

# pitch_wyscout and create_wyscout_pitch() are reused from the definitions above

# Build the subtitle from the per-90 pass rate
subtitle_text <- paste("Total successful passes normalized per 90 minutes:", passes_per_90)

# Plot the passes as semi-transparent arrows in yellow
create_wyscout_pitch() +
  geom_segment(data = belgium_data_filter, 
               aes(x = start_x, y = start_y, 
                   xend = end_x, yend = end_y), 
               lineend = "round", 
               linewidth = 0.6,  # Use linewidth for ggplot2 3.4.0+
               arrow = arrow(length = unit(0.08, "inches")),
               color = "#FFCD00",  # Yellow
               alpha = 0.5) +  # Adjust transparency (0 = fully transparent, 1 = fully opaque)
  labs(title = "Kevin De Bruyne Successful Passes in 2018 World Cup",
       subtitle = subtitle_text) + 
  coord_fixed(ratio = 60 / 100)

From these results, we can see that Modric has a higher number of successful passes per 90 minutes, with 108 compared to De Bruyne’s 95. In addition, Modric’s passing distribution covers more of the pitch, especially closer to the opponent’s goal.
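The per-90 normalization behind these figures can be wrapped in a small helper so the same arithmetic is applied to every player. This is a sketch with a hypothetical function name, per_90(), and made-up totals; it does not reproduce the figures reported above.

```r
# Events per 90 minutes, given an event count and playing time in seconds.
per_90 <- function(n_events, playtime_seconds) {
  stopifnot(playtime_seconds > 0)
  n_events / (playtime_seconds / 60) * 90
}

# Hypothetical totals: 300 passes over four full matches (4 * 90 * 60 seconds).
per_90(300, 4 * 90 * 60)  # 75
```

Dividing by minutes actually played, rather than by matches, keeps the rate comparable between players who were substituted or played extra time.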

2.2 Progressive Passing

Progressive Pass Explained

A progressive pass is a pass that moves the ball forward towards the opposition’s goal. In the example below, Luka Modric passes the ball to his teammate in the opponent’s 18-yard box.

Luka Modric progressive pass example.
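One way to express a progressive-pass rule as a reusable predicate is sketched below. It borrows the thresholds from this report’s filter (a gain of more than 28 units along x, or an end point in the central zone with x > 75 and 20 < y < 80) but, as an assumption of this sketch, it requires the gain to be forward. The function name is_progressive() is hypothetical.

```r
# TRUE where a pass gains more than min_gain units towards the opponent's
# goal, or ends in the central zone near that goal.
is_progressive <- function(start_x, end_x, end_y, min_gain = 28) {
  long_forward <- (end_x - start_x) > min_gain
  into_zone    <- end_x > 75 & end_y > 20 & end_y < 80
  long_forward | into_zone
}

# Three toy passes: a long forward ball, a backward pass, a short pass
# into the central attacking zone.
is_progressive(start_x = c(40, 60, 70),
               end_x   = c(70, 50, 80),
               end_y   = c(50, 50, 50))
```

The first and third passes qualify and the second does not, since it moves away from the opponent’s goal and ends outside the zone.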

Let’s compare Luka Modric’s progressive passes in the 2018 World Cup to Kevin De Bruyne’s.

# Read the dataset
croatia_data <- read.csv("CroatiaTeamData.csv")

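# Keep Modric's accurate passes (player_id 8287, Wyscout tag 1801) that either
# travel more than 28 units along x or end in the central attacking zone.
# Note: abs() also admits long backward passes; drop it to require forward movement.
croatia_data_filter <- croatia_data %>%
  filter(
    player_id == 8287,
    event_name == "Pass",
    grepl("1801", tags),
    abs(end_x - start_x) > 28 | (end_x > 75 & end_y > 20 & end_y < 80)
  )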

# Calculate total passes
total_passes <- nrow(croatia_data_filter)

# Get total playtime (sum of max event time in each match for player_id 8287)
total_playtime_seconds <- croatia_data %>%
  filter(player_id == 8287) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes <- total_playtime_seconds / 60

# Calculate passes per 90 minutes
passes_per_90 <- round((total_passes / total_playtime_minutes) * 90)

# Define pitch dimensions
pitch_wyscout <- list(
  length = 100,
  width = 100,
  penalty_box_length = 16,
  penalty_box_width = 62,
  six_yard_box_length = 6,
  six_yard_box_width = 26,
  penalty_spot_distance = 10,
  goal_width = 12,
  origin_x = 0,
  origin_y = 0
)

create_wyscout_pitch <- function() {
  ggplot() +
    # Full pitch outline
    geom_rect(aes(xmin = pitch_wyscout$origin_x, xmax = pitch_wyscout$length,
                  ymin = pitch_wyscout$origin_y, ymax = pitch_wyscout$width), 
              color = "grey35", fill = NA) +
    
    # Penalty box (left)
    geom_rect(aes(xmin = pitch_wyscout$origin_x, 
                  xmax = pitch_wyscout$penalty_box_length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$penalty_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$penalty_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Penalty box (right)
    geom_rect(aes(xmin = pitch_wyscout$length - pitch_wyscout$penalty_box_length,
                  xmax = pitch_wyscout$length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$penalty_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$penalty_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Six-yard box (left)
    geom_rect(aes(xmin = pitch_wyscout$origin_x, 
                  xmax = pitch_wyscout$six_yard_box_length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$six_yard_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$six_yard_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Six-yard box (right)
    geom_rect(aes(xmin = pitch_wyscout$length - pitch_wyscout$six_yard_box_length,
                  xmax = pitch_wyscout$length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$six_yard_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$six_yard_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Center circle
    geom_point(aes(x = pitch_wyscout$length / 2, y = pitch_wyscout$width / 2), 
               size = 0.5, color = "grey35") +
    geom_circle(aes(x0 = pitch_wyscout$length / 2, y0 = pitch_wyscout$width / 2, r = 9.15),
                color = "grey35") +
    
    # Goals
    geom_segment(aes(x = pitch_wyscout$origin_x, 
                     xend = pitch_wyscout$origin_x, 
                     y = (pitch_wyscout$width - pitch_wyscout$goal_width) / 2, 
                     yend = (pitch_wyscout$width + pitch_wyscout$goal_width) / 2),
                 color = "grey35", linewidth = 3) +
    geom_segment(aes(x = pitch_wyscout$length, 
                     xend = pitch_wyscout$length, 
                     y = (pitch_wyscout$width - pitch_wyscout$goal_width) / 2, 
                     yend = (pitch_wyscout$width + pitch_wyscout$goal_width) / 2),
                 color = "grey35", linewidth = 3) +
    
    # Center line
    geom_segment(aes(x = pitch_wyscout$length / 2, xend = pitch_wyscout$length / 2,
                     y = pitch_wyscout$origin_y, yend = pitch_wyscout$width),
                 color = "grey35") +
    
    # Attack direction annotation
    annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
             arrow = arrow(length = unit(0.3, "inches")), color = "black", linewidth = 1) +
    annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
    
    # Adjust theme
    theme(
      panel.grid = element_blank(),
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      axis.title = element_blank()
    ) +
    coord_fixed() +
    labs(title = "Football Pitch")
}

# Generate the subtitle with the per-90 pass rate
subtitle_text <- paste("Successful progressive passes per 90 minutes:", passes_per_90)

# Plot the passes with transparency in dark blue
create_wyscout_pitch() +
  geom_segment(data = croatia_data_filter, 
               aes(x = start_x, y = start_y, 
                   xend = end_x, yend = end_y), 
               lineend = "round", 
               linewidth = 0.6,  # Use linewidth for ggplot2 3.4.0+
               arrow = arrow(length = unit(0.08, "inches")),
               color = "#171796",  # Set color to dark blue
               alpha = 0.5) +  # Adjust transparency (0 = fully transparent, 1 = fully opaque)
  labs(title = "Luka Modric Successful Progressive Passes in 2018 World Cup",
       subtitle = subtitle_text) + 
  coord_fixed(ratio = 60 / 100)

# Read the dataset
belgium_data <- read.csv("BelgiumTeamData.csv")

# Filter to Kevin De Bruyne (player_id 38021): successful (tag 1801) forward passes
# that either gain more than 28 units in x or end in the central part of the final quarter
belgium_data_filter <- belgium_data %>%
  filter(
    player_id == 38021,
    event_name == "Pass",
    grepl("1801", tags),
    end_x > start_x,
    abs(end_x - start_x) > 28 | (end_x > 75 & end_y > 20 & end_y < 80)
  )

# Calculate total passes
total_passes <- nrow(belgium_data_filter)

# Get total playtime (sum of max event time in each match for player_id 38021)
total_playtime_seconds <- belgium_data %>%
  filter(player_id == 38021) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes <- total_playtime_seconds / 60

# Calculate passes per 90 minutes
passes_per_90 <- round((total_passes / total_playtime_minutes) * 90)

# The pitch dimensions (pitch_wyscout) and create_wyscout_pitch() defined earlier are reused here

# Generate the subtitle with the per-90 pass rate
subtitle_text <- paste("Successful progressive passes per 90 minutes:", passes_per_90)

# Plot the passes with transparency in yellow
create_wyscout_pitch() +
  geom_segment(data = belgium_data_filter, 
               aes(x = start_x, y = start_y, 
                   xend = end_x, yend = end_y), 
               lineend = "round", 
               linewidth = 0.6,  # Use linewidth for ggplot2 3.4.0+
               arrow = arrow(length = unit(0.08, "inches")),
               color = "#FFCD00",  # Set color to yellow
               alpha = 0.5) +  # Adjust transparency (0 = fully transparent, 1 = fully opaque)
  labs(title = "Kevin De Bruyne Successful Progressive Passes in 2018 World Cup",
       subtitle = subtitle_text) + 
  coord_fixed(ratio = 60 / 100)
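
The per-90 normalization above estimates playing time from the largest event_sec value in each match. In the public Wyscout event data, event times are typically recorded relative to the start of each half; if these CSV files keep that convention, the per-match maximum would cover only the longer half. The sketch below, in base R on made-up data, sums the last event time per match and period instead. The match_period column name and all toy values are assumptions for illustration, not taken from the dataset.

```r
# Toy events for one player (hypothetical values; match_period is an assumed column name)
toy_events <- data.frame(
  match_id     = c(1, 1, 1, 1, 2, 2),
  match_period = c("1H", "1H", "2H", "2H", "1H", "2H"),
  event_sec    = c(60, 2700, 30, 2820, 2760, 2880)
)

# Last recorded event per match and half, then total minutes across all halves
last_event_per_half <- aggregate(event_sec ~ match_id + match_period,
                                 data = toy_events, FUN = max)
minutes_by_half <- sum(last_event_per_half$event_sec) / 60

# For comparison: the per-match maximum only sees the longer half
last_event_per_match <- aggregate(event_sec ~ match_id, data = toy_events, FUN = max)
minutes_by_match <- sum(last_event_per_match$event_sec) / 60

# Per-90 rate for a hypothetical count of 31 qualifying passes
toy_passes <- 31
per_90_by_half  <- toy_passes / minutes_by_half * 90
per_90_by_match <- toy_passes / minutes_by_match * 90
```

With these toy numbers the two playtime estimates are 186 and 95 minutes, so the per-90 rates differ by roughly a factor of two. Whether this matters for the figures above depends on how event_sec is stored in the CSV files.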

2.3 Passing in relation to location on the pitch using spatial autocorrelation

Spatial autocorrelation describes the degree to which a variable is correlated with itself across space. It is closely connected to Tobler’s First Law of Geography, which states that “everything is related to everything else, but near things are more related than distant things” (Tobler).

Applied to soccer passing, spatial autocorrelation shows how passes cluster in particular areas of the pitch. Midfielders often act as the hub of a team’s passing network, receiving the ball from the defense and distributing it to the attack. High spatial autocorrelation means that nearby locations have similar pass counts, so passing is concentrated in particular zones, while values near zero suggest a more random, dispersed passing pattern.

Performing Moran’s I test on Modric Passing Locations

One common index used to assess spatial autocorrelation is Moran’s I (Moran). It quantifies how similar each region is to its neighbors and averages these assessments across all regions (Moraga).

More information regarding Moran’s I test can be found here: https://www.paulamoraga.com/book-spatial/spatial-autocorrelation.html#global-morans-i
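
To make the statistic concrete, the sketch below computes global Moran’s I by hand in base R for a made-up set of five locations, using the standard formula I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), where z_i is the deviation of each value from the mean and S0 is the sum of all weights. The coordinates, pass counts, 10-unit neighbor distance, and the helper name morans_i() are illustrative assumptions, not values or functions from the analysis; the row-standardized weights are intended to mirror style = "W".

```r
# Global Moran's I from a value vector x and a spatial weights matrix w
morans_i <- function(x, w) {
  n  <- length(x)
  z  <- x - mean(x)   # deviations from the mean
  s0 <- sum(w)        # sum of all weights
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Five hypothetical pass locations on a line, with pass counts that fall steadily
coords <- cbind(x = c(10, 20, 30, 40, 50), y = rep(50, 5))
counts <- c(5, 4, 3, 2, 1)

# Binary neighbors within 10 units (excluding self), then row-standardize
d <- as.matrix(dist(coords))
w <- (d > 0 & d <= 10) * 1
w <- w / rowSums(w)

i_obs <- morans_i(counts, w)          # observed Moran's I
i_exp <- -1 / (length(counts) - 1)    # expectation under no spatial autocorrelation
```

Because neighboring locations in this toy example have similar counts, the observed value (0.6) sits well above its expectation (-0.25), which is the pattern the test looks for.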

In this example, Luka Modric’s starting pass locations are tested using Moran’s I framework.

# Read the dataset
croatia_data <- read.csv("CroatiaTeamData.csv")

# Filter data to only player_id 8287 (Luka Modric)
croatia_data_filter <- croatia_data %>%
  filter(player_id == 8287 & event_name == "Pass")

# Count number of passes at each location
croatia_data_aggregated <- croatia_data_filter %>%
  group_by(start_x, start_y) %>%
  summarise(pass_count = n(), .groups = "drop")

# Convert to sf object (pitch coordinates are 0-100 units, not longitude/latitude, so no CRS is set)
croatia_sf_pass <- st_as_sf(croatia_data_aggregated, coords = c("start_x", "start_y"))

# Create neighbors: locations within 30 pitch units of each other
nb <- dnearneigh(st_coordinates(croatia_sf_pass), 0, 30)
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

# Perform Moran's I test on pass count
moran_test_count <- moran.test(croatia_data_aggregated$pass_count, lw, zero.policy = TRUE)

# Print results 
print(moran_test_count)
## 
##  Moran I test under randomisation
## 
## data:  croatia_data_aggregated$pass_count  
## weights: lw    
## 
## Moran I statistic standard deviate = 2.5981, p-value = 0.004687
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      9.465372e-03     -2.309469e-03      2.053928e-05

Interpretation of the data

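Since p < 0.05 (p ≈ 0.0047), we reject the null hypothesis that pass counts are randomly distributed across locations. The Moran’s I statistic is positive but small (about 0.009), so locations with similar pass counts tend to sit near one another, although the clustering is weak in magnitude.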
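
The standard deviate and p-value in the printed output can be reproduced from the three sample estimates. The sketch below does this in base R, assuming the normal approximation for a one-sided ("greater") test. It also recovers the number of distinct pass locations from the expectation, using the textbook relation E[I] = -1 / (n - 1); that last step is an assumption about how the expectation was formed, not something the output states directly.

```r
# Sample estimates copied from the Moran's I output for Modric
i_obs <- 9.465372e-03   # Moran I statistic
i_exp <- -2.309469e-03  # Expectation
i_var <- 2.053928e-05   # Variance

# Standard deviate: distance of the observed value from its expectation, in standard errors
z <- (i_obs - i_exp) / sqrt(i_var)

# One-sided p-value for the alternative "greater"
p <- pnorm(z, lower.tail = FALSE)

# Number of distinct locations implied by E[I] = -1 / (n - 1)
n_locations <- round(1 - 1 / i_exp)
```

Under these assumptions the values agree with the printed standard deviate (2.5981) and p-value (0.004687), and the expectation implies roughly 434 distinct pass start locations.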

Performing Moran’s I test on Kevin De Bruyne Passing Locations

Moran’s I test can be applied to other players, such as Kevin De Bruyne.

# Read the dataset
belgium_data <- read.csv("BelgiumTeamData.csv")

# Filter data to only player_id 38021 (Kevin De Bruyne)
belgium_data_filter <- belgium_data %>%
  filter(player_id == 38021 & event_name == "Pass")

# Count number of passes at each location
belgium_data_aggregated <- belgium_data_filter %>%
  group_by(start_x, start_y) %>%
  summarise(pass_count = n(), .groups = "drop")

# Convert to sf object
belgium_sf_pass <- st_as_sf(belgium_data_aggregated, coords = c("start_x", "start_y"))

# Create neighbors
nb <- dnearneigh(st_coordinates(belgium_sf_pass), 0, 30)  
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

# Perform Moran's I test on pass count
belgium_moran_test_count <- moran.test(belgium_data_aggregated$pass_count, lw, zero.policy = TRUE)

# Print results 
print(belgium_moran_test_count)
## 
##  Moran I test under randomisation
## 
## data:  belgium_data_aggregated$pass_count  
## weights: lw    
## 
## Moran I statistic standard deviate = 1.383, p-value = 0.08334
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      5.129646e-03     -3.367003e-03      3.774605e-05

Since p > 0.05 (p ≈ 0.083), we fail to reject the null hypothesis (which assumes passes are randomly distributed). There is no statistically significant evidence of spatial autocorrelation in De Bruyne’s pass locations.

Performing Moran’s I test on N’Golo Kante Passing Locations

Moran’s I test can be applied to other players, such as N’Golo Kante.

# Read the dataset
france_data <- read.csv("FranceTeamData.csv")

# Filter data to only player_id 31528 (N'Golo Kante)
france_data_filter <- france_data %>%
  filter(player_id == 31528 & event_name == "Pass")

# Count number of passes at each location
france_data_aggregated <- france_data_filter %>%
  group_by(start_x, start_y) %>%
  summarise(pass_count = n(), .groups = "drop")

# Convert to sf object
france_sf_pass <- st_as_sf(france_data_aggregated, coords = c("start_x", "start_y"))

# Create neighbors
nb <- dnearneigh(st_coordinates(france_sf_pass), 0, 30)  
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

# Perform Moran's I test on pass count
france_moran_test_count <- moran.test(france_data_aggregated$pass_count, lw, zero.policy = TRUE)
# Print results 
print(france_moran_test_count)
## 
##  Moran I test under randomisation
## 
## data:  france_data_aggregated$pass_count  
## weights: lw    
## 
## Moran I statistic standard deviate = 0.14243, p-value = 0.4434
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##     -2.112137e-03     -2.906977e-03      3.114345e-05

Comparison Summary

De Bruyne shows some clustering (Moran’s I ≈ 0.005), but it is not statistically significant (p ≈ 0.083), which is consistent with flexibility in his playmaking.

Kanté’s passes are the most dispersed (Moran’s I ≈ -0.002, p ≈ 0.44), consistent with his role as a defensive midfielder who distributes the ball from many areas rather than concentrating play in one zone.

Modrić’s passing distribution is the most structured (highest Moran’s I, p < 0.01), meaning his passes tend to originate from specific zones rather than from locations spread at random.
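
The three test results can be collected into one table for a side-by-side reading. The sketch below is a base R summary built only from the statistics printed above; the 0.05 threshold is the same one used in the interpretations, and the data frame name is arbitrary.

```r
# Moran's I statistics and p-values copied from the three test outputs above
moran_summary <- data.frame(
  player   = c("Luka Modric", "Kevin De Bruyne", "N'Golo Kante"),
  morans_i = c(9.465372e-03, 5.129646e-03, -2.112137e-03),
  p_value  = c(0.004687, 0.08334, 0.4434)
)

# Flag results that are significant at the 5% level
moran_summary$significant <- moran_summary$p_value < 0.05

# Order from strongest to weakest evidence of clustering
moran_summary <- moran_summary[order(moran_summary$p_value), ]
print(moran_summary)
```

All three statistics are below 0.01 in absolute value, so even where the test is significant the clustering is modest in size.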

Let’s look at Modric’s passing distribution via a heat map to see where his passes are concentrated.

Modric Passing Heatmap

# Define pitch dimensions
pitch_length <- 100
pitch_width <- 100
grid_size <- 10
penalty_box_length <- 16
penalty_box_width <- 62
six_yard_box_length <- 6
six_yard_box_width <- 26
goal_width <- 12

# Create grid bins
df <- croatia_data_filter %>%
  mutate(grid_x = floor(start_x / grid_size),
         grid_y = floor(start_y / grid_size))

# Aggregate counts per grid cell, keeping only cells with more than 7 passes
heatmap_data <- df %>%
  group_by(grid_x, grid_y) %>%
  summarise(event_count = n(), .groups = "drop") %>%
  filter(event_count > 7)

# Calculate the average start_x and start_y
average_start_x <- mean(croatia_data_filter$start_x, na.rm = TRUE)
average_start_y <- mean(croatia_data_filter$start_y, na.rm = TRUE)

# Create a dataframe for the average start position point
average_point <- data.frame(
  x = average_start_x,
  y = average_start_y,
  label = "Avg Start Position"
)

# Start with the heatmap in the background
heatmap_plot <- ggplot() +
  geom_tile(data = heatmap_data, aes(x = grid_x * grid_size + grid_size / 2, 
                                     y = grid_y * grid_size + grid_size / 2, 
                                     fill = event_count), color = "white") +
  scale_fill_gradient(low = "#ADD8E6", high = "#171796", labels = scales::label_number(accuracy = 1)) +
  labs(title = "Heatmap Luka Modric Starting Passes",
       subtitle = "Darker regions represent more passes performed per grid",
       fill = "Pass Count") +
  theme_classic() +
  theme(axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        axis.title = element_blank(),
        panel.grid = element_blank()) +
  
  # Full pitch outline
  geom_rect(aes(xmin = 0, xmax = pitch_length, ymin = 0, ymax = pitch_width), 
            color = "grey", fill = NA) +
  
  # Grid lines (10x10)
  geom_segment(data = data.frame(x = seq(0, pitch_length, by = grid_size)),
               aes(x = x, xend = x, y = 0, yend = pitch_width),
               color = "lightgrey", linetype = "dotted") +
  geom_segment(data = data.frame(y = seq(0, pitch_width, by = grid_size)),
               aes(x = 0, xend = pitch_length, y = y, yend = y),
               color = "lightgrey", linetype = "dotted") +
  
  # Penalty box (left side)
  geom_rect(aes(xmin = 0, xmax = penalty_box_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Penalty box (right side)
  geom_rect(aes(xmin = pitch_length - penalty_box_length, xmax = pitch_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Six-yard box (left side)
  geom_rect(aes(xmin = 0, xmax = six_yard_box_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Six-yard box (right side)
  geom_rect(aes(xmin = pitch_length - six_yard_box_length, xmax = pitch_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Goals (left side)
  geom_segment(aes(x = 0, xend = 0, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", linewidth = 3) +
  
  # Goals (right side)
  geom_segment(aes(x = pitch_length, xend = pitch_length, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", linewidth = 3) +
  
  # Center circle
  geom_circle(aes(x0 = pitch_length / 2, y0 = pitch_width / 2, r = 9.15),
              color = "grey35", inherit.aes = FALSE) +
  
  # Center line
  geom_segment(aes(x = pitch_length / 2, xend = pitch_length / 2,
                   y = 0, yend = pitch_width),
               color = "grey35") +
  
  # Attack direction annotation
  annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
           arrow = arrow(length = unit(0.3, "inches")), color = "black", linewidth = 1.2) +
  annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
  
  # Add the average start_x and start_y as a black point with a white outline
  geom_point(aes(x = average_start_x, y = average_start_y), color = "white", fill = "black", size = 7, shape = 21, stroke = 1.5) +
  
  # Add text annotation above the average point
  geom_text(data = average_point, aes(x = x, y = y + 6, label = "Average Pass Start Location"), 
            color = "black", size = 3, hjust = 0.5) +
  
  coord_fixed()

# Show the final plot
print(heatmap_plot)

Comparing the heatmap and the average pass start location with the starting lineup shows that Modric tends to pass from around his assigned position. This suggests strong positional awareness and adherence to the tactical plan throughout a match.

Luka Modric Starting Position in Lineup.
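
The heatmap assigns each pass to a grid cell with floor(start / grid_size). Pitch coordinates here run from 0 to 100, so an event recorded exactly at 100 would receive index 10 and be drawn just outside a 10 x 10 grid. A minimal sketch of a clamped variant follows, assuming coordinates can include the upper boundary; zone_index() is a hypothetical helper, not part of the original analysis.

```r
# Map a coordinate to a zero-based cell index, clamping the upper boundary
# into the last cell so that a value of 100 stays on the pitch
zone_index <- function(coord, cell_size, n_cells) {
  pmin(floor(coord / cell_size), n_cells - 1)
}

# 10 x 10 grid: a coordinate of 100 falls in the last cell (index 9), not index 10
idx_10 <- zone_index(c(0, 9.9, 10, 55, 100), cell_size = 10, n_cells = 10)

# 18-zone grid: six columns along the pitch length
idx_6 <- zone_index(c(0, 40, 100), cell_size = 100 / 6, n_cells = 6)
```

The same helper could replace the floor() calls in mutate() for both the 10 x 10 heatmap and the 18-zone defensive grid, with the cell center still computed as index * cell_size + cell_size / 2.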

Part 3: Defending

A great midfielder isn’t just good at passing; they also help their team by stopping the opponent’s attacks. Two key metrics for gauging defensive contributions are tackles and interceptions. A midfielder with high numbers in these areas can break up attacks, regain possession, and keep their team in control of the game. Players who excel at this are crucial for protecting their defense and starting counterattacks.

3.1 Tackle/Interception

Tackle/Interception Explained

Tackles and interceptions are defensive actions in which a player stops the opposition from dribbling, passing, or shooting the ball. In the example below, Luka Modric makes an interception by cutting out an opponent’s pass.

Tackle/Interception example.

Luka Modric vs N’Golo Kanté Defensive Contributions

# Load the dataset
croatiadata <- read_csv(here("CroatiaTeamData.csv"))

# Filter Luka Modric's events (Duel + Others on the ball)
modric <- croatiadata %>%
  filter(player_id == 8287 & (event_name == "Duel" | event_name == "Others on the ball"))

# Define pitch dimensions
pitch_length <- 100
pitch_width <- 100
penalty_box_length <- 16
penalty_box_width <- 62
six_yard_box_length <- 6
six_yard_box_width <- 26
goal_width <- 12

# Define 18-zone grid (3 vertical x 6 horizontal)
num_vertical <- 3
num_horizontal <- 6
grid_x_size <- pitch_length / num_horizontal
grid_y_size <- pitch_width / num_vertical

# Assign events to the 18-zone grid
df <- modric %>%
  mutate(grid_x = floor(start_x / grid_x_size),
         grid_y = floor(start_y / grid_y_size))

# Aggregate total contributions per grid cell
heatmap_data <- df %>%
  group_by(grid_x, grid_y) %>%
  summarise(event_count = n(), .groups = "drop")

# Generate the heatmap with contribution counts in each zone
modric_heatmap_plot <- ggplot() +
  geom_tile(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     fill = event_count), color = "white") +
  scale_fill_gradient(low = "#ADD8E6", high = "#171796", labels = scales::label_number(accuracy = 1)) +
  labs(title = "Luka Modric Defensive Contributions",
       subtitle = "Darker regions indicate more contributions",
       fill = "Contributions") +
  theme_classic() +
  theme(axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        axis.title = element_blank(),
        panel.grid = element_blank()) +
  
  # Full pitch outline
  geom_rect(aes(xmin = 0, xmax = pitch_length, ymin = 0, ymax = pitch_width), 
            color = "grey", fill = NA) +
  
  # Add 18 equal-area grid lines (3x6)
  geom_segment(data = data.frame(x = seq(0, pitch_length, by = grid_x_size)),
               aes(x = x, xend = x, y = 0, yend = pitch_width),
               color = "grey", linetype = "dashed") +
  geom_segment(data = data.frame(y = seq(0, pitch_width, by = grid_y_size)),
               aes(x = 0, xend = pitch_length, y = y, yend = y),
               color = "grey", linetype = "dashed") +
  
  # Add event count labels in each zone
  geom_text(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     label = event_count), size = 5, fontface = "bold") +
  
  # Add penalty boxes
  geom_rect(aes(xmin = 0, xmax = penalty_box_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - penalty_box_length, xmax = pitch_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add six-yard boxes
  geom_rect(aes(xmin = 0, xmax = six_yard_box_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - six_yard_box_length, xmax = pitch_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add goals
  geom_segment(aes(x = 0, xend = 0, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", linewidth = 3) +
  geom_segment(aes(x = pitch_length, xend = pitch_length, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", linewidth = 3) +
  
  # Add center circle and line
  geom_circle(aes(x0 = pitch_length / 2, y0 = pitch_width / 2, r = 9.15),
              color = "grey35", inherit.aes = FALSE) +
  geom_segment(aes(x = pitch_length / 2, xend = pitch_length / 2,
                   y = 0, yend = pitch_width),
               color = "grey35") +
  
  # Attack direction annotation
  annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
           arrow = arrow(length = unit(0.3, "inches")), color = "black", linewidth = 1.2) +
  annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
  
  coord_fixed()

# Load the dataset
francedata <- read_csv(here("FranceTeamData.csv"))

# N'Golo Kante's events (Duel + Others on the ball) are read from kante.csv,
# which was written once using the commented-out filter below
kante <- read_csv(here("kante.csv"))
# kante <- francedata %>%
#   filter(player_id == 31528 & (event_name == "Duel" | event_name == "Others on the ball"))
# write.csv(kante, "kante.csv")

# Define pitch dimensions
pitch_length <- 100
pitch_width <- 100
penalty_box_length <- 16
penalty_box_width <- 62
six_yard_box_length <- 6
six_yard_box_width <- 26
goal_width <- 12

# Define 18-zone grid (3 vertical x 6 horizontal)
num_vertical <- 3
num_horizontal <- 6
grid_x_size <- pitch_length / num_horizontal
grid_y_size <- pitch_width / num_vertical

# Assign events to the 18-zone grid
df <- kante %>%
  mutate(grid_x = floor(start_x / grid_x_size),
         grid_y = floor(start_y / grid_y_size))

# Aggregate total contributions per grid cell
heatmap_data <- df %>%
  group_by(grid_x, grid_y) %>%
  summarise(event_count = n(), .groups = "drop")

# Generate the heatmap with contribution counts in each zone
kante_heatmap_plot <- ggplot() +
  geom_tile(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     fill = event_count), color = "white") +
  scale_fill_gradient(low = "#FFCCCB", high = "#E1000F", labels = scales::label_number(accuracy = 1)) +
  labs(title = "N'Golo Kante Defensive Contributions",
       subtitle = "Darker regions indicate more contributions",
       fill = "Contributions") +
  theme_classic() +
  theme(axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        axis.title = element_blank(),
        panel.grid = element_blank()) +
  
  # Full pitch outline
  geom_rect(aes(xmin = 0, xmax = pitch_length, ymin = 0, ymax = pitch_width), 
            color = "grey", fill = NA) +
  
  # Add 18 equal-area grid lines (3x6)
  geom_segment(data = data.frame(x = seq(0, pitch_length, by = grid_x_size)),
               aes(x = x, xend = x, y = 0, yend = pitch_width),
               color = "grey", linetype = "dashed") +
  geom_segment(data = data.frame(y = seq(0, pitch_width, by = grid_y_size)),
               aes(x = 0, xend = pitch_length, y = y, yend = y),
               color = "grey", linetype = "dashed") +
  
  # Add event count labels in each zone
  geom_text(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     label = event_count), size = 5, fontface = "bold") +
  
  # Add penalty boxes
  geom_rect(aes(xmin = 0, xmax = penalty_box_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - penalty_box_length, xmax = pitch_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add six-yard boxes
  geom_rect(aes(xmin = 0, xmax = six_yard_box_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - six_yard_box_length, xmax = pitch_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add goals
  geom_segment(aes(x = 0, xend = 0, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", linewidth = 3) +
  geom_segment(aes(x = pitch_length, xend = pitch_length, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", linewidth = 3) +
  
  # Add center circle and line
  geom_circle(aes(x0 = pitch_length / 2, y0 = pitch_width / 2, r = 9.15),
              color = "grey35", inherit.aes = FALSE) +
  geom_segment(aes(x = pitch_length / 2, xend = pitch_length / 2,
                   y = 0, yend = pitch_width),
               color = "grey35") +
  
  # Attack direction annotation
  annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
           arrow = arrow(length = unit(0.3, "inches")), color = "black", linewidth = 1.2) +
  annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
  
  coord_fixed()

# Collect each player's plot together with its tab title
pl_list <- list(
  list(modric_heatmap_plot, "modric defensive actions"),
  list(kante_heatmap_plot, "kante defensive actions")
)

Modric vs Kante Defensive Actions Plotted

# Function to generate tab headers dynamically
catHeader <- function(text = "", level = 3) {
    cat(paste0("\n\n", paste(rep("#", level), collapse = ""), " ", text, "\n\n"))
}

# Iterate over each player's plot and create a tab
for(i in seq_along(pl_list)){
    tmp <- pl_list[[i]]  # Extract the player plot and name
    
    catHeader(tmp[[2]], 3)  # Create a tab with the player's name (### for level 3 headers)
    
    print(tmp[[1]])  # Display the corresponding heatmap
}

modric defensive actions

kante defensive actions

Part 4: Goal Creating Actions

A great midfielder not only controls the game but also helps create goals. Shot-creating actions and goal-creating actions are two key metrics for analyzing a player’s impact on a game. A midfielder with high numbers in these areas is key to breaking down defenses and making their team more dangerous in attack. These stats show how well a player contributes to scoring chances and overall team success.

Type 1: Goal (Explained)

A goal is scored when a player kicks the ball into the opponent’s goal. In the example below, Luka Modric shoots and scores against Argentina in the World Cup.

Luka Modric Goal.

Type 2: Assist (Explained)

An assist is the pass that leads directly to a goal. In the example below, Kevin De Bruyne passes the ball to Romelu Lukaku, who scores, earning De Bruyne an assist.

Kevin De Bruyne Assist.

Type 3: Secondary Assist (Explained)

A secondary assist is the pass that directly leads to the assist, which then sets up the goal.

Kevin De Bruyne Assist.
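
The three definitions above can be expressed as a simple rule over an ordered sequence of events. The sketch below, in base R, uses a made-up three-event possession; the column names (player, event_name, is_goal) and player labels are hypothetical and do not reflect the Wyscout schema. It also assumes that consecutive rows belong to the same team and the same attacking move, which real event data would need to verify.

```r
# Hypothetical possession, in chronological order
toy_possession <- data.frame(
  player     = c("Player A", "Player B", "Player C"),
  event_name = c("Pass", "Pass", "Shot"),
  is_goal    = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Row of the goal, and the rows of the two events before it
goal_row      <- which(toy_possession$is_goal)[1]
assist_row    <- goal_row - 1
secondary_row <- goal_row - 2

# TRUE when row i exists and is a pass
is_pass <- function(i) i >= 1 && toy_possession$event_name[i] == "Pass"

scorer <- toy_possession$player[goal_row]

# Assist: the pass immediately before the goal
assist <- if (is_pass(assist_row)) toy_possession$player[assist_row] else NA_character_

# Secondary assist: the pass immediately before the assist
secondary_assist <- if (is_pass(assist_row) && is_pass(secondary_row)) {
  toy_possession$player[secondary_row]
} else {
  NA_character_
}
```

In this toy sequence Player C is the scorer, Player B is credited with the assist, and Player A with the secondary assist.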

4.1 Shot Creating Actions

Text

4.2 Goal Creating Actions

Text

Final Results

References

Moraga, P. (n.d.). Spatial autocorrelation. Retrieved February 25, 2025, from https://www.paulamoraga.com/book-spatial/spatial-autocorrelation.html#global-morans-i

Moran, P. A. P. 1950. “Notes on Continuous Stochastic Phenomena.” Biometrika 37 (1/2): 17–23. https://doi.org/10.2307/2332142.

Tobler, W. R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46: 234–40. https://doi.org/10.2307/143141.

Wyscout. (n.d.). Data glossary. Retrieved March 3, 2025, from https://dataglossary.wyscout.com/