Abstract

The 2018 FIFA World Cup highlighted exceptional midfield performances, with Luka Modrić earning the Golden Ball as the tournament’s most valuable player, as voted by media representatives and journalists (Younas). With the rise of statistical analysis, new methodologies can be applied to compare soccer players objectively, ranging from surface-level statistics to in-depth spatial analysis. This study conducts a statistical analysis comparing Luka Modrić (Croatia), N’Golo Kanté (France), and Kevin De Bruyne (Belgium), key midfielders for the second-, first-, and third-place teams at the 2018 World Cup, respectively. Using open-source, location-based datasets, this research applies spatial analytics and data visualization techniques to assess key performance indicators, including passing and defensive tackling actions. Methodologies such as point density analysis, flow mapping, and spatial autocorrelation are employed. Data processing is conducted using R Studio and spatial analytics tools to provide an objective evaluation of midfield effectiveness. The results aim to demonstrate the value of advanced spatial statistics in player assessment, offering a data-driven approach to evaluating midfield contributions beyond conventional metrics.

Introduction

The 2018 FIFA World Cup saw Croatia’s Luka Modrić awarded the Golden Ball as the tournament’s most valuable player, recognizing his overall influence on Croatia’s first-ever World Cup final appearance. While his performances were widely acknowledged as critical to the team’s success, traditional statistical metrics, such as goals and assists, have led to debates about whether Modrić was the most statistically dominant player of the tournament. For instance, France’s Antoine Griezmann registered more direct goal contributions, leading some to question Modrić’s selection. Soccer analysis has historically relied on fundamental statistics such as possession percentage, total passes, and shots on goal to evaluate player performance. However, these metrics do not capture deeper tactical insights, particularly in midfield roles where impact extends beyond direct goal involvement. Although the use of statistics in sport has grown tremendously over the past two decades, the metrics commonly cited in soccer have not changed significantly and provide a narrow view of a player’s impact in a match or across a tournament. The growing accessibility of open-source, location-based datasets has enabled the application of spatial analytics in soccer analysis, providing a more detailed understanding of player influence through advanced visualization techniques. This study employs spatial analytics and data visualization to assess Modrić’s performances in comparison to two other key midfielders from the 2018 World Cup: Belgium’s Kevin De Bruyne and France’s N’Golo Kanté. Using statistical computing tools such as R Studio, this analysis focuses on unconventional metrics derived from spatial data, including passing distribution and defensive actions. Methodologies such as point density analysis, flow mapping, and spatial autocorrelation are employed to uncover trends hidden beneath the surface-level statistics that many journalists consult when voting for awards such as the Golden Ball. By applying these advanced techniques, this study aims to provide an objective assessment of midfield effectiveness and evaluate Modrić’s statistical case for the Golden Ball beyond conventional metrics.

Data

Soccer analytics and location-based datasets are gaining interest in both the proprietary and open-source worlds.

Several large open collections of soccer logs are available to the public, containing spatio-temporal events that occurred during each match of prominent soccer competitions, including the World Cup (Pappalardo, Cintia, Rossi, et al.).

These spatio-temporal events capture the major actions that occur during a soccer match. Each event includes the action type (pass, shot, foul, tackle, etc.), a time-stamp, the player(s) involved, the position on the field as x and y coordinates, and tags carrying additional information (Pappalardo, Cintia, Rossi, et al.).

The spatio-temporal dataset used in this report contained hundreds of thousands of data records, each with distinct actions performed across the 64 matches played in the 2018 World Cup.

The actions within the dataset were plotted in the open-source software R Studio, using the x/y coordinates as the geometry and the additional attribute information to support analysis and visualization.

Open-source data can be found here: https://figshare.com/collections/Soccer_match_event_dataset/4415000/2

Original data source, including open source and proprietary data, can be found here: https://wyscout.com/

Methodology

Data Pre-processing

The spatial data utilized in this report was provided in CSV format, containing fields such as a time-stamp, start x, start y, end x, end y, and the players involved. The original file contained hundreds of thousands of data records, each with distinct actions performed across the 64 matches played in the 2018 World Cup. Using the tidyverse package in R Studio, the data was filtered for specific players and teams, which made processing easier and improved performance within R Studio.

Additionally, pre-processing of the x/y coordinates was necessary as the original data was not easily readable in R Studio. Figure 1 below shows the raw dataset before processing and the processed dataset used in the analysis and results for this report.

Figure 1. Data cleanup format necessary to plot x/y data for analytic purposes in R Studio.
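As a rough illustration of this kind of cleanup, the sketch below assumes a raw `positions` column that stores the start and end coordinates as a single JSON-like string. The column name, the string layout, and the sample values are assumptions made for illustration and are not taken from the original file. Base R is used to split each string into numeric start/end columns.

```r
# Hypothetical raw records: each string holds a start and an end position.
# The layout below is an assumption made for illustration only.
raw <- data.frame(
  event_name = c("Pass", "Pass"),
  positions = c("[{'y': 49, 'x': 50}, {'y': 78, 'x': 31}]",
                "[{'y': 12, 'x': 64}, {'y': 20, 'x': 88}]"),
  stringsAsFactors = FALSE
)

# Pull every integer out of a string, in order of appearance
extract_numbers <- function(s) {
  as.numeric(regmatches(s, gregexpr("[0-9]+", s))[[1]])
}

# Each string yields four numbers in the order: start y, start x, end y, end x
coords <- t(vapply(raw$positions, extract_numbers, numeric(4), USE.NAMES = FALSE))

# Reassemble into one numeric column per coordinate
clean <- data.frame(
  event_name = raw$event_name,
  start_x = coords[, 2],
  start_y = coords[, 1],
  end_x   = coords[, 4],
  end_y   = coords[, 3]
)
print(clean)
```

Once the coordinates sit in separate numeric columns, they can be passed directly to plotting and clustering functions.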

Metrics Used

Several metrics are used to highlight surface-level statistics and in-depth spatial analysis for players in the 2018 World Cup.

This includes plotting the various actions performed within the dimensions of a soccer pitch. The following plots were created to provide an objective analysis of key actions that occur during a soccer match:

Part 1: Initial Comparison

• Radar plot highlighting surface-level statistics to gauge passing, goal-scoring, and defending metrics, using Z-scores to objectively assess all 208 midfielders who played at least one game in the 2018 World Cup.

Part 2: Passing

• K-means cluster pass network highlighting start/end locations of passes in a game to gauge differences in the passing network distribution of each of the players analyzed.

• Spatial Autocorrelation to analyze the spatial distribution of starting passing locations in a match to gauge the positioning discipline of each of the players analyzed.

Part 3: Defensive Actions

• Point density analysis heatmap of aggregated defensive tackles within a specified grid zone across the soccer pitch to gauge the defensive contributions performed by the players analyzed.

Analysis & Methods

After processing the data, ggplot2 and its extension packages, including ggsoccer and ggforce, were used to map the actions onto a soccer pitch, using the following coordinates to keep the data consistent across all matches, as shown in Figure 2.

Figure 2. The spatial distribution of coordinates that make up the soccer pitch outline and allow the x/y data to be plotted on a soccer pitch (Wyscout Pitch Coordinates).

First, pitch dimensions are defined to create a pitch with necessary visual context cues, including the penalty box, the six-yard box, and the goal.

pitch_wyscout <- list(
  length = 100,
  width = 100,
  penalty_box_length = 16,
  penalty_box_width = 62,
  six_yard_box_length = 6,
  six_yard_box_width = 26,
  penalty_spot_distance = 10,
  goal_width = 12,
  origin_x = 0,
  origin_y = 0
)

Then, a function built on the ggplot2 package draws a basic pitch in R Studio using the Wyscout dimensions on which the source data is based. The various actions in a match can then be plotted over geometries such as the pitch outline rectangle shown below, which serve as a visual reference.

create_wyscout_pitch <- function() {
  ggplot() +
    # Full pitch outline
    geom_rect(aes(xmin = pitch_wyscout$origin_x, xmax = pitch_wyscout$length,
                  ymin = pitch_wyscout$origin_y, ymax = pitch_wyscout$width), 
              color = "grey35", fill = NA)
}

This project aims to analyze three of the top midfielders at the 2018 World Cup across specific metrics highlighting their passing, defensive, and goal-scoring actions.

As shown in Figure 3 below, the three players were selected for their standout performances in leading their nations to first (France), second (Croatia), and third (Belgium) place.

Figure 3. Three players analyzed based on 2018 World Cup performances across 7 matches.

Part 1: Initial Results

Initial surface-level statistics were analyzed for the three players based on the following metrics:

• Completed Passes (Cmp) • Progressive Passes (PrgP)

• Shot-Creating Actions (SCA) • Goal-Creating Actions (GCA)

• Tackles (Tkl) • Interceptions (Int)

Z-scores were generated for each metric, normalized per 90 minutes, across every midfielder who played in the 2018 World Cup.

Z-scores are used in soccer statistics to standardize variables measured on different scales, making them comparable across all players in the tournament. The per-90-minute normalization accounts for differences in total minutes played.

Since stats like passes, tackles, and goal contributions exist on different scales, Z-scores show how much a player’s performance deviates from the average. This helps compare players with different roles, identify statistical outliers, and account for differences in playing time.

A Z-score measures how far a player’s statistic is from the average, in terms of standard deviations. A Z-score of 0 means the player is exactly average, while a positive Z-score indicates above-average performance, and a negative Z-score indicates below-average performance.

For example, if a player has a Z-score of +2.0 in key passes, it means he is two standard deviations above the average midfielder, making him an elite passer. Conversely, a Z-score of -1.5 in tackles would mean he performs below the average in that category.
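The calculation described above can be sketched in a few lines of base R. The player labels and tackle values below are hypothetical and serve only to show the arithmetic. The sample standard deviation from `sd()` is used here.

```r
# Hypothetical per-90 tackle counts for five midfielders (illustrative values)
tackles_per_90 <- c(A = 1.0, B = 2.0, C = 3.0, D = 4.0, E = 5.0)

# Z-score: distance from the group mean in units of the sample standard deviation
z_score <- function(x) (x - mean(x)) / sd(x)

tackles_z <- z_score(tackles_per_90)
print(round(tackles_z, 2))
```

Player C sits exactly at the group mean and therefore receives a Z-score of 0, while players above the mean receive positive scores and players below it receive negative scores.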

Part 2: Passing Results

For the passing results, k-means clustering and spatial autocorrelation were used.

K-means clustering is a machine learning technique used to group data points into clusters based on similarity. It works by randomly selecting cluster centers (centroids) and assigning each data point to the nearest one. The algorithm then recalculates the centroids based on the average of points in each cluster and repeats this process until the clusters stabilize (Kavlakoglu).
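A minimal sketch of this procedure with the base R `kmeans()` function is shown below. The pass locations are synthetic (two loose groups on a 0-100 pitch) and are not tournament data. The choice of two centers is only for illustration.

```r
set.seed(42)

# Synthetic pass start locations on a 0-100 pitch (illustrative, not tournament data):
# two loose groups, one in the defensive half and one in the attacking half
passes <- data.frame(
  start_x = c(rnorm(30, mean = 30, sd = 5), rnorm(30, mean = 70, sd = 5)),
  start_y = c(rnorm(30, mean = 40, sd = 5), rnorm(30, mean = 60, sd = 5))
)

# Fit k-means with two centers; nstart repeats the random initialization
# and keeps the best of the runs
fit <- kmeans(passes[, c("start_x", "start_y")], centers = 2, nstart = 10)

# Attach each pass to its cluster and inspect the centroids
passes$cluster <- fit$cluster
print(fit$centers)
print(table(passes$cluster))
```

Each row of `fit$centers` is a centroid, that is, the mean location of the passes assigned to that cluster.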

For the analysis, eight cluster groups were chosen because the elbow method indicated that increasing the number of clusters beyond eight resulted in diminishing returns. The elbow method works by plotting the sum of squared errors (SSE) against the number of clusters, creating a curve. As the number of clusters increases, the SSE decreases, but after a certain point (the “elbow”), further increases result in minimal improvement. In this case, the elbow appeared at eight clusters, indicating an optimal balance between simplicity and accuracy (R Core Team).
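The elbow method can be sketched as follows, again on synthetic points rather than the tournament data. The range of 1 to 12 clusters and the `nstart` and `iter.max` settings are assumptions chosen for illustration.

```r
set.seed(42)

# Synthetic pass locations on a 0-100 pitch (illustrative only)
pts <- data.frame(x = runif(200, 0, 100), y = runif(200, 0, 100))

# Total within-cluster sum of squares (SSE) for k = 1..12
k_values <- 1:12
sse <- sapply(k_values, function(k) {
  kmeans(pts, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

# Plot SSE against k; the "elbow" is where the curve starts to flatten
plot(k_values, sse, type = "b",
     xlab = "Number of clusters (k)", ylab = "Total within-cluster SSE")
```

Reading the elbow off the plot is a judgment call, so the chosen number of clusters is best treated as a reasonable trade-off and not a uniquely correct value.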

Spatial autocorrelation is used to describe the degree to which a variable is self-correlated across a spatial area. It helps determine whether similar values cluster together or are dispersed randomly (Moraga). It has a strong connection to Tobler’s First Law of Geography, which states that “everything is related to everything else, but near things are more related than distant things” (Tobler).

Global Moran’s I is a tool that falls under the spatial autocorrelation domain. Moran’s I test helps identify whether similar values tend to cluster or spread out randomly. A positive Moran’s I value means similar values cluster together, a negative value means similar values are spread out, and a value close to zero indicates a random pattern (Moran). To see if the result is significant, the calculated Moran’s I is compared to what would be expected by chance.

In this project, Moran’s I statistic will be used to analyze the starting pass locations for each of the top midfielders in the 2018 World Cup to gauge the positioning discipline of each of the players.
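To make the statistic concrete, the sketch below computes Global Moran’s I by hand in base R on a small 4 x 4 grid of counts. It is a transparent illustration of the formula and does not use the spdep interface. The helper names `morans_i` and `rook_weights`, the grid size, and the rook-adjacency weights are assumptions made for this example.

```r
# Global Moran's I for values x and a spatial weights matrix w
morans_i <- function(x, w) {
  n <- length(x)
  dev <- x - mean(x)
  (n / sum(w)) * sum(w * outer(dev, dev)) / sum(dev^2)
}

# Rook-adjacency weights for an nr x nc grid, cells numbered column by column
rook_weights <- function(nr, nc) {
  idx <- expand.grid(row = seq_len(nr), col = seq_len(nc))
  d <- abs(outer(idx$row, idx$row, "-")) + abs(outer(idx$col, idx$col, "-"))
  (d == 1) * 1
}

w <- rook_weights(4, 4)

# Clustered pattern: high counts on one side of the grid, low on the other
clustered <- as.vector(matrix(rep(c(10, 10, 0, 0), each = 4), nrow = 4))

# Dispersed pattern: checkerboard of high and low counts
checker <- as.vector((row(matrix(0, 4, 4)) + col(matrix(0, 4, 4))) %% 2) * 10

print(morans_i(clustered, w))
print(morans_i(checker, w))
```

The clustered grid gives a value of about 0.67 and the checkerboard gives -1, matching the interpretation above. Under spatial randomness the expected value is -1 / (n - 1), which is close to zero for large n.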

Part 3: Defensive Actions

For the defensive action results, point density analysis was used.

Point pattern analysis (PPA) is the study of the spatial arrangements of points within a given area, aiming to understand the underlying processes influencing these patterns. One fundamental aspect of PPA is the assessment of point density, which measures how points are distributed across a study region (Gimond).

Global density provides an overall measure of how points are distributed throughout the entire study area. It is calculated by dividing the total number of observed points by the area of the study region (Gimond).

In the case of defensive actions, the location of defensive tackles can be analyzed within different zones around the pitch. For this analysis, 18 equal zones were created, following a method similar to one derived from a study conducted at Liverpool John Moores University, UK (Smarter Soccer).

This segmentation of the field into 18 zones, starting from the defensive third and moving forward to the attacking third, allows players and coaches to better understand and strategize around the pitch (Smarter Soccer).
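The zone-based density calculation can be sketched in base R as follows. The tackle locations are synthetic, and the layout of 6 bands along the length by 3 lanes across the width is an assumption about how the 18 zones are arranged.

```r
set.seed(42)

# Synthetic tackle locations on a 0-100 by 0-100 pitch (illustrative only)
tackles <- data.frame(x = runif(50, 0, 100), y = runif(50, 0, 100))

# Assumed layout: 6 bands along the length (x) by 3 lanes across the width (y)
x_breaks <- seq(0, 100, length.out = 7)
y_breaks <- seq(0, 100, length.out = 4)

tackles$x_zone <- cut(tackles$x, breaks = x_breaks, include.lowest = TRUE, labels = FALSE)
tackles$y_zone <- cut(tackles$y, breaks = y_breaks, include.lowest = TRUE, labels = FALSE)

# Count tackles per zone; explicit factor levels keep empty zones in the table
zone_counts <- table(
  y_zone = factor(tackles$y_zone, levels = 1:3),
  x_zone = factor(tackles$x_zone, levels = 1:6)
)

# Local density: tackles per unit area in each zone
zone_area <- (100 / 6) * (100 / 3)
zone_density <- zone_counts / zone_area

# Global density: all tackles divided by the total pitch area
global_density <- nrow(tackles) / (100 * 100)

print(zone_counts)
print(global_density)
```

Because the zones are equal in area, the mean of the zone densities equals the global density, and zones well above that mean mark the areas where tackles concentrate.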

Libraries Used

The following libraries were used in R Studio to complete the analysis and generate the visualizations shown in the results.

library(readr)
library(dplyr)
library(gridExtra)
library(cowplot)
library(ggplot2)
library(kableExtra)
library(magick)
library(tidyverse)
library(here)
library(janitor)
library(fmsb)
library(ggforce)
library(ggsoccer)
library(scales)
library(knitr)
library(sf)
library(spdep)
library(igraph)

Results

Part 1: Initial Comparison of Top Midfielders

1.1 Z Score Table

As mentioned in the Analysis & Methods section, Z-scores were used to compare the three players against all midfielders who played in the World Cup on the following surface-level soccer statistics.

• Completed Passes (Cmp) • Progressive Passes (PrgP)

• Shot-Creating Actions (SCA) • Goal-Creating Actions (GCA)

• Tackles (Tkl) • Interceptions (Int)

# Load the Passing Data
file1 <- "All_MF_Pass.csv"
passdata <- read_csv(file1)

# Clean and compute Z-scores for Passing Data
passdata <- passdata %>%
  filter(!is.na(PrgP) & !is.na(Cmp_1)) %>%  # Remove NA values
  mutate(
    PrgP = as.numeric(PrgP),
    Cmp_1 = as.numeric(Cmp_1),
    PrgP_Z = (PrgP - mean(PrgP, na.rm = TRUE)) / sd(PrgP, na.rm = TRUE),
    Cmp_Z = (Cmp_1 - mean(Cmp_1, na.rm = TRUE)) / sd(Cmp_1, na.rm = TRUE)
  ) %>%
  select(Player, PrgP, PrgP_Z, Cmp_1, Cmp_Z)

# Load the Shooting Data
file2 <- "All_MF_Shoot.csv"
shootdata <- read_csv(file2)

# Clean and compute Z-scores for Shooting Data
shootdata <- shootdata %>%
  filter(!is.na(GCA) & !is.na(SCA)) %>%  # Remove NA values
  mutate(
    GCA = as.numeric(GCA),
    SCA = as.numeric(SCA),
    GCA_Z = (GCA - mean(GCA, na.rm = TRUE)) / sd(GCA, na.rm = TRUE),
    SCA_Z = (SCA - mean(SCA, na.rm = TRUE)) / sd(SCA, na.rm = TRUE)
  ) %>%
  select(Player, GCA, GCA_Z, SCA, SCA_Z)

# Load the Defending Data
file3 <- "All_MF_Def.csv"
defdata <- read_csv(file3)

# Clean and compute Z-scores for Defending Data
defdata <- defdata %>%
  filter(!is.na(Tkl_1) & !is.na(Int)) %>%  # Remove NA values
  mutate(
    Tkl_1 = as.numeric(Tkl_1),
    Int = as.numeric(Int),
    Tkl_1_Z = (Tkl_1 - mean(Tkl_1, na.rm = TRUE)) / sd(Tkl_1, na.rm = TRUE),
    Int_Z = (Int - mean(Int, na.rm = TRUE)) / sd(Int, na.rm = TRUE)
  ) %>%
  select(Player, Tkl_1, Tkl_1_Z, Int, Int_Z)

# Merge all datasets on Player
combined_data <- passdata %>%
  full_join(shootdata, by = "Player") %>%
  full_join(defdata, by = "Player")

# Filter for specific players and handle potential inconsistencies in Player names
player_values <- c("Luka Modric", "Kevin De Bruyne", "N'Golo Kante")
final_table <- combined_data %>%
  mutate(Player = trimws(Player)) %>%  # Remove any extra spaces
  filter(Player %in% player_values)

# Select only the Z-score columns (names ending in "_Z")
final_table_z <- final_table %>%
  select(Player, ends_with("_Z"))

# Identify numeric Z-score columns (excluding Player)
z_columns <- names(final_table_z)[names(final_table_z) != "Player"]

# Create Max row (set all Z-score columns to 2)
max_row <- final_table_z[1, ]
max_row[z_columns] <- 2  
max_row$Player <- "Max"

# Create Min row (set all Z-score columns to -2)
min_row <- final_table_z[1, ]
min_row[z_columns] <- -2
min_row$Player <- "Min"

# Combine the Max and Min rows to the final table
finaltable_z <- bind_rows(max_row, min_row, final_table_z)

# Print the final merged table
print(finaltable_z)

## # A tibble: 5 × 7
##   Player          PrgP_Z  Cmp_Z   GCA_Z  SCA_Z Tkl_1_Z   Int_Z
##   <chr>            <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
## 1 Max              2      2      2       2       2      2     
## 2 Min             -2     -2     -2      -2      -2     -2     
## 3 Luka Modric      1.49   0.900  0.0984  1.24   -0.275  0.119 
## 4 Kevin De Bruyne  0.964  0.422  1.45    1.33   -0.139 -0.0389
## 5 N'Golo Kante     0.103  0.498 -0.519  -0.700   0.201  1.85

1.2 Radar Plot

Figure 4 below highlights the radar chart to objectively compare Luka Modric, Kevin De Bruyne, and N’Golo Kanté’s statistics using the standardized Z-scores for all 208 midfielders that played in the 2018 World Cup.

Luka Modric stands out for his strong performance in Progressive Passes (PrgP_Z) with a Z-score of 1.49, indicating a high ability to move the ball forward efficiently. His Completed Passes (Cmp_Z) also rank positively at 0.900, reflecting accuracy in distribution. Modric shows notable creative involvement with a Shot Creating Actions (SCA_Z) score of 1.24, although his defensive metrics are lower, with Tackles (Tkl_1_Z) at -0.275 and Interceptions (Int_Z) at 0.119.

Kevin De Bruyne’s profile highlights his attacking prowess, with Shot Creating Actions (SCA_Z) at 1.33 and Goal Creating Actions (GCA_Z) at 1.45, emphasizing his playmaking role. His Progressive Passes (PrgP_Z) score of 0.964 and Completed Passes (Cmp_Z) of 0.422 are moderate but effective, while his defensive metrics remain minimal, with Tackles (Tkl_1_Z) at -0.139 and Interceptions (Int_Z) at -0.0389.

N’Golo Kanté’s radar chart shows a defensive focus, as evidenced by his Interceptions (Int_Z) score of 1.85, significantly higher than both Modric and De Bruyne. His tackling is also slightly above average with a Tkl_1_Z of 0.201, but his offensive contributions are below average, reflected by Shot Creating Actions (SCA_Z) at -0.700 and Goal Creating Actions (GCA_Z) at -0.519. His Progressive Passes (PrgP_Z) score of 0.103 is the lowest of the three, indicating limited forward distribution, while his Completed Passes (Cmp_Z) score of 0.498 sits slightly above De Bruyne’s 0.422 and below Modric’s 0.900.

# Load the dataset
data <- read.csv("CombinedData_Z_final.csv", row.names = 1)

# Extract player-specific data along with Max and Min for normalization
modric_data <- data[c("Max", "Min", "Luka Modric"), ]
kdb_data <- data[c("Max", "Min", "Kevin De Bruyne"), ]
Kanté_data <- data[c("Max", "Min", "N'Golo Kante"), ]

# Define colors and titles using colors from country flag
colors <- c("#171796", "#FFCD00", "#E1000F")
titles <- c("Luka Modric", "Kevin De Bruyne", "N'Golo Kanté")

# Format plot data into 3 columns
par(mar = c(1, 1, 1, 1))
par(oma = c(0, 0, 0, 0))  
par(mfrow = c(1, 3))

# Create radar chart function
create_beautiful_radarchart <- function(data, color = "#00AFBB", 
                                        vlabels = colnames(data), vlcex = 0.7,
                                        caxislabels = NULL, title = NULL, ...){
  radarchart(
    data, axistype = 1,
    # Customize the polygon
    pcol = color, pfcol = scales::alpha(color, 0.5), plwd = 2, plty = 1,
    # Customize the grid
    cglcol = "grey", cglty = 1, cglwd = 0.8,
    # Customize the axis
    axislabcol = "black", 
    # Variable labels
    vlcex = vlcex, vlabels = vlabels,
    caxislabels = caxislabels, title = title, ...
  )
}

# Create radar charts with modric, kdb, and Kanté data
create_beautiful_radarchart(modric_data, 
                            caxislabels = c(-2, -1, 0, 1, 2), 
                            color = colors[1], 
                            title = titles[1])

create_beautiful_radarchart(kdb_data, 
                            caxislabels = c(-2, -1, 0, 1, 2), 
                            color = colors[2], 
                            title = titles[2])

create_beautiful_radarchart(Kanté_data, 
                            caxislabels = c(-2, -1, 0, 1, 2), 
                            color = colors[3], 
                            title = titles[3])

# Reset plot settings
par(mfrow = c(1, 1))

Figure 4. Radar plot highlighting Z scores for Luka Modric, Kevin De Bruyne, and N’Golo Kanté across six statistical metrics (Completed Passes (Cmp), Progressive Passes (PrgP), Shot-Creating Actions (SCA), Goal-Creating Actions (GCA), Tackles (Tkl), and Interceptions (Int)).

Code Description Table:

# Define table and add descriptions for all stats
notation_table <- data.frame(
  Statistic = c("GCA_Z", "SCA_Z", "Tkl_1_Z", "Int_Z", "Cmp_Z", "PrgP_Z"),
  Z_Score_Description = c("Goal Creating Actions", 
                  "Shot Creating Actions", 
                  "Tackles", 
                  "Interceptions", 
                  "Completed Passes", 
                  "Progressive Passes")
)

# Print every statistic with its description and build a table grob from it
descriptions <- notation_table
print(descriptions)
descr <- tableGrob(descriptions)

##   Statistic   Z_Score_Description
## 1     GCA_Z Goal Creating Actions
## 2     SCA_Z Shot Creating Actions
## 3   Tkl_1_Z               Tackles
## 4     Int_Z         Interceptions
## 5     Cmp_Z      Completed Passes
## 6    PrgP_Z    Progressive Passes

Part 2: Passing

A great midfielder in soccer is often judged by their passing ability. The best ones make successful passes consistently, meaning they rarely lose the ball. They also make progressive passes, which move the ball forward toward the opponent’s goal, helping create attacks. Lastly, they have positional awareness, knowing when and where to pass based on the movement of teammates and opponents. A midfielder who excels in these areas keeps their team in control and creates goal-scoring opportunities.

2.1 Completed Passes Network

Completed passes are important because they help a team keep control of the ball and build attacks. When passes are accurate, the team can move forward, create scoring chances, and keep the game under control. Bad passes give the ball to the other team, making it harder to win.

Let’s compare the three players’ successful passes in the World Cup.

The passing networks of Luka Modric, Kevin De Bruyne, and N’Golo Kanté showcase the distinct playing styles and positional responsibilities of each player. As shown in Figure 5, Luka Modric’s passing network is characterized by a high volume of successful passes, averaging 108 passes per 90 minutes. His network is well-distributed across the pitch, with a noticeable concentration in central areas and advanced positions. The clusters of his passing patterns indicate a balanced distribution between short, controlled passes and longer diagonal balls.

As shown in Figure 6, Kevin De Bruyne’s passing network, on the other hand, features a lower average of 77 passes per 90 minutes. His passes are notably more direct and vertically oriented, often aimed at breaking defensive lines and creating goal-scoring opportunities. The clustering of his passes around the attacking third highlights his inclination to operate in advanced midfield positions, frequently threading through balls and executing line-breaking passes.

As shown in Figure 7, N’Golo Kanté averages 89 passes per 90 minutes, and his passes are predominantly short and efficient, emphasizing ball retention. The clustering of passes near his defensive and central midfield positions demonstrates his role as a defensive midfielder.
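The per-90-minute figures quoted above scale a raw count by playing time. A minimal sketch of that arithmetic is shown below. The helper name `per_90` and the input values are hypothetical.

```r
# Scale a raw count to a per-90-minute rate
per_90 <- function(count, minutes_played) {
  (count / minutes_played) * 90
}

# Hypothetical example: 300 completed passes in 450 minutes of play
print(per_90(300, 450))
```

In this example the player completes 60 passes per 90 minutes, which allows players with different amounts of playing time to be compared on the same scale.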

# Define pitch dimensions
pitch_wyscout <- list(
  length = 100,
  width = 100,
  penalty_box_length = 16,
  penalty_box_width = 62,
  six_yard_box_length = 6,
  six_yard_box_width = 26,
  penalty_spot_distance = 10,
  goal_width = 12,
  origin_x = 0,
  origin_y = 0
)

# Create pitch using function and ggplot rectangles
create_wyscout_pitch <- function() {
  ggplot() +
    # Full pitch outline
    geom_rect(aes(xmin = pitch_wyscout$origin_x, xmax = pitch_wyscout$length,
                  ymin = pitch_wyscout$origin_y, ymax = pitch_wyscout$width), 
              color = "grey35", fill = NA) +
    
    # Penalty box (left)
    geom_rect(aes(xmin = pitch_wyscout$origin_x, 
                  xmax = pitch_wyscout$penalty_box_length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$penalty_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$penalty_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Penalty box (right)
    geom_rect(aes(xmin = pitch_wyscout$length - pitch_wyscout$penalty_box_length,
                  xmax = pitch_wyscout$length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$penalty_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$penalty_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Six-yard box (left)
    geom_rect(aes(xmin = pitch_wyscout$origin_x, 
                  xmax = pitch_wyscout$six_yard_box_length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$six_yard_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$six_yard_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Six-yard box (right)
    geom_rect(aes(xmin = pitch_wyscout$length - pitch_wyscout$six_yard_box_length,
                  xmax = pitch_wyscout$length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$six_yard_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$six_yard_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Center circle
    geom_point(aes(x = pitch_wyscout$length / 2, y = pitch_wyscout$width / 2), 
               size = 0.5, color = "grey35") +
    geom_circle(aes(x0 = pitch_wyscout$length / 2, y0 = pitch_wyscout$width / 2, r = 9.15),
                color = "grey35") +
    
    # Goals
    geom_segment(aes(x = pitch_wyscout$origin_x, 
                     xend = pitch_wyscout$origin_x, 
                     y = (pitch_wyscout$width - pitch_wyscout$goal_width) / 2, 
                     yend = (pitch_wyscout$width + pitch_wyscout$goal_width) / 2),
                 color = "grey35", size = 3) +
    geom_segment(aes(x = pitch_wyscout$length, 
                     xend = pitch_wyscout$length, 
                     y = (pitch_wyscout$width - pitch_wyscout$goal_width) / 2, 
                     yend = (pitch_wyscout$width + pitch_wyscout$goal_width) / 2),
                 color = "grey35", size = 3) +
    
    # Center line
    geom_segment(aes(x = pitch_wyscout$length / 2, xend = pitch_wyscout$length / 2,
                     y = pitch_wyscout$origin_y, yend = pitch_wyscout$width),
                 color = "grey35") +
    
    # Attack direction annotation
    annotate("segment", x = 45, xend = 80, y = 110, yend = 110,
             arrow = arrow(length = unit(0.3, "inches")), color = "black", size = 1) +
    annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
    
    # Adjust theme
    theme(
      panel.grid = element_blank(),
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      axis.title = element_blank()
    ) +
    coord_fixed() +
    labs(title = "Football Pitch")
}

# Load dataset
croatia_data <- read.csv("CroatiaTeamData.csv")

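# Filter Luka Modric's successful passes and drop rows with missing coordinates
# (k-means cannot handle NA values)
modric_data <- croatia_data %>%
  filter(player_id == 8287 & event_name == "Pass" & grepl("1801", tags)) %>%
  filter(!is.na(start_x) & !is.na(start_y) & !is.na(end_x) & !is.na(end_y)) %>%
  select(start_x, start_y, end_x, end_y)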

# Calculate total passes from player
total_passes_modric <- nrow(modric_data)

# Get total playtime (sum of max event time in each match for player_id 8287).
# This assumes event_sec counts from the kickoff of the match; if it restarts
# at each period, the grouping would also need to include the period.
total_playtime_seconds_modric <- croatia_data %>%
  filter(player_id == 8287) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes_modric <- total_playtime_seconds_modric / 60

# Calculate passes per 90 minutes
passes_per_90_modric <- round((total_passes_modric / total_playtime_minutes_modric) * 90)

# Perform k-means clustering for passing zones
set.seed(42)
num_clusters <- 8  
pass_clusters_modric <- kmeans(modric_data[, c("start_x", "start_y")], centers = num_clusters)
modric_data$start_cluster <- pass_clusters_modric$cluster

end_clusters_modric <- kmeans(modric_data[, c("end_x", "end_y")], centers = num_clusters)
modric_data$end_cluster <- end_clusters_modric$cluster

# Aggregate pass flows
flow_data_modric <- modric_data %>%
  group_by(start_cluster, end_cluster) %>%
  summarise(pass_count = n(),
            start_x = mean(start_x, na.rm = TRUE),
            start_y = mean(start_y, na.rm = TRUE),
            end_x = mean(end_x, na.rm = TRUE),
            end_y = mean(end_y, na.rm = TRUE)) %>%
  ungroup()

# Create a graph object
pass_network_modric <- graph_from_data_frame(flow_data_modric %>% select(start_cluster, end_cluster, pass_count), directed = TRUE)


# Read Belgium dataset
belgium_data <- read.csv("BelgiumTeamData.csv")

# Filter Kevin De Bruyne's successful passes
kdb_data <- belgium_data %>%
  filter(player_id == 38021 & event_name == "Pass" & grepl("1801", tags)) %>%
  filter(!is.na(start_x) & !is.na(start_y) & !is.na(end_x) & !is.na(end_y)) %>%
  select(start_x, start_y, end_x, end_y)

# Calculate total passes from player
total_passes_kdb <- nrow(kdb_data)

# Get total playtime (sum of max event time in each match for player_id 38021)
total_playtime_seconds_kdb <- belgium_data %>%
  filter(player_id == 38021) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes_kdb <- total_playtime_seconds_kdb / 60

# Calculate passes per 90 minutes
passes_per_90_kdb <- round((total_passes_kdb / total_playtime_minutes_kdb) * 90)

# Perform k-means clustering for passing zones
set.seed(42)
pass_clusters_kdb <- kmeans(kdb_data[, c("start_x", "start_y")], centers = num_clusters)
kdb_data$start_cluster <- pass_clusters_kdb$cluster

end_clusters_kdb <- kmeans(kdb_data[, c("end_x", "end_y")], centers = num_clusters)
kdb_data$end_cluster <- end_clusters_kdb$cluster

# Aggregate pass flows
flow_data_kdb <- kdb_data %>%
  group_by(start_cluster, end_cluster) %>%
  summarise(pass_count = n(),
            start_x = mean(start_x, na.rm = TRUE),
            start_y = mean(start_y, na.rm = TRUE),
            end_x = mean(end_x, na.rm = TRUE),
            end_y = mean(end_y, na.rm = TRUE)) %>%
  ungroup()

# Create a graph object
pass_network_kdb <- graph_from_data_frame(flow_data_kdb %>% select(start_cluster, end_cluster, pass_count), directed = TRUE)

# Read France dataset
france_data <- read.csv("FranceTeamData.csv")

# Filter N'Golo Kanté's successful passes
Kanté_data <- france_data %>%
  filter(player_id == 31528 & event_name == "Pass" & grepl("1801", tags)) %>%
  filter(!is.na(start_x) & !is.na(start_y) & !is.na(end_x) & !is.na(end_y)) %>%
  select(start_x, start_y, end_x, end_y)

# Calculate total passes from player
total_passes_Kanté <- nrow(Kanté_data)

# Get total playtime (sum of max event time in each match for player_id 31528)
total_playtime_seconds_Kanté <- france_data %>%
  filter(player_id == 31528) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes_Kanté <- total_playtime_seconds_Kanté / 60

# Calculate passes per 90 minutes
passes_per_90_Kanté <- round((total_passes_Kanté / total_playtime_minutes_Kanté) * 90)

# Perform k-means clustering for passing zones
set.seed(42)
pass_clusters_Kanté <- kmeans(Kanté_data[, c("start_x", "start_y")], centers = num_clusters)
Kanté_data$start_cluster <- pass_clusters_Kanté$cluster

end_clusters_Kanté <- kmeans(Kanté_data[, c("end_x", "end_y")], centers = num_clusters)
Kanté_data$end_cluster <- end_clusters_Kanté$cluster

# Aggregate pass flows
flow_data_Kanté <- Kanté_data %>%
  group_by(start_cluster, end_cluster) %>%
  summarise(pass_count = n(),
            start_x = mean(start_x, na.rm = TRUE),
            start_y = mean(start_y, na.rm = TRUE),
            end_x = mean(end_x, na.rm = TRUE),
            end_y = mean(end_y, na.rm = TRUE)) %>%
  ungroup()

# Create a graph object
pass_network_Kanté <- graph_from_data_frame(flow_data_Kanté %>% select(start_cluster, end_cluster, pass_count), directed = TRUE)

# Generate each subtitle from the player's passes-per-90 figure
subtitle_text_modric <- paste("Total successful passes normalized per 90 minutes:", passes_per_90_modric)
subtitle_text_kdb <- paste("Total successful passes normalized per 90 minutes:", passes_per_90_kdb)
subtitle_text_Kanté <- paste("Total successful passes normalized per 90 minutes:", passes_per_90_Kanté)

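# Set up a 3-row, 1-column layout
# Note: par(mfrow) only affects base graphics, so each ggplot below is still drawn as its own figure
par(mfrow = c(3, 1))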

# Generate Luka Modric's passing network
plot(create_wyscout_pitch() +
  # Draw pass connections
  geom_curve(data = flow_data_modric, 
             aes(x = start_x, y = start_y, xend = end_x, yend = end_y, size = pass_count * 0.07),
             curvature = 0.2, color = "gray70", alpha = 0.6) +
  
  # Draw pass origin and destination clusters
  geom_point(data = flow_data_modric, 
             aes(x = start_x, y = start_y, size = pass_count * 0.4, color = "Pass Origin"), show.legend = TRUE) +
  geom_point(data = flow_data_modric, 
             aes(x = end_x, y = end_y, size = pass_count * 0.4, color = "Pass Destination"), show.legend = TRUE) +

  
  # Adjustments
  theme_void() +
  labs(title = "Luka Modric's Passing Network", 
       subtitle = subtitle_text_modric) +
  
  # Adjust legend for better clarity
  scale_color_manual(name = "Cluster Type", values = c("Pass Origin" = "lightblue", "Pass Destination" = "blue")) +
  scale_size_continuous(name = "Pass Frequency") +
  guides(size = guide_legend(title = "Pass Origin & Destination Clusters")) +
  
  coord_fixed(ratio = 60 / 100)
)

# Generate Kevin De Bruyne's passing network
plot(create_wyscout_pitch() +
  # Draw pass connections
  geom_curve(data = flow_data_kdb, 
             aes(x = start_x, y = start_y, xend = end_x, yend = end_y, size = pass_count * 0.07),
             curvature = 0.2, color = "gray70", alpha = 0.6) +
  
  # Draw pass origin and destination clusters
  geom_point(data = flow_data_kdb, 
             aes(x = start_x, y = start_y, size = pass_count * 0.4, color = "Pass Origin"), show.legend = TRUE) +
  geom_point(data = flow_data_kdb, 
             aes(x = end_x, y = end_y, size = pass_count * 0.4, color = "Pass Destination"), show.legend = TRUE) +

  
  # Adjustments
  theme_void() +
  labs(title = "Kevin De Bruyne's Passing Network", 
       subtitle = subtitle_text_kdb) +
  
  # Adjust legend for better clarity
  scale_color_manual(name = "Cluster Type", values = c("Pass Origin" = "lightyellow", "Pass Destination" = "#FFCD00")) +
  scale_size_continuous(name = "Pass Frequency") +
  guides(size = guide_legend(title = "Pass Origin & Destination Clusters")) +
  
  coord_fixed(ratio = 60 / 100)
)

# Generate N’Golo Kanté's passing network
plot(create_wyscout_pitch() +
  # Draw pass connections
  geom_curve(data = flow_data_Kanté, 
             aes(x = start_x, y = start_y, xend = end_x, yend = end_y, size = pass_count * 0.07),
             curvature = 0.2, color = "gray70", alpha = 0.6) +
  
  # Draw pass origin and destination clusters
  geom_point(data = flow_data_Kanté, 
             aes(x = start_x, y = start_y, size = pass_count * 0.4, color = "Pass Origin"), show.legend = TRUE) +
  geom_point(data = flow_data_Kanté, 
             aes(x = end_x, y = end_y, size = pass_count * 0.4, color = "Pass Destination"), show.legend = TRUE) +

  
  # Adjustments
  theme_void() +
  labs(title = "N'Golo Kanté's Passing Network", 
       subtitle = subtitle_text_Kanté) +
  
  # Adjust legend for better clarity
  scale_color_manual(name = "Cluster Type", values = c("Pass Origin" = "#FF7F7F", "Pass Destination" = "#E1000F")) +
  scale_size_continuous(name = "Pass Frequency") +
  guides(size = guide_legend(title = "Pass Origin & Destination Clusters")) +
  
  coord_fixed(ratio = 60 / 100)
)

Figures 5, 6, 7. Passing networks for Luka Modrić, Kevin De Bruyne, and N’Golo Kanté, using k-means clustering to delineate common start and end pass locations in relation to the general passing distribution.
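The clustering and flow aggregation behind these networks can be illustrated on synthetic data. The sketch below is only an approximation of the approach: the coordinates are randomly generated, `k` is a hypothetical stand-in for `num_clusters`, and base R's `table()` replaces the dplyr aggregation used in the report.

```r
# Synthetic passes in 0-100 pitch coordinates (illustration only, not tournament data)
set.seed(42)
n_passes <- 60
toy_passes <- data.frame(
  start_x = runif(n_passes, 20, 80),
  start_y = runif(n_passes, 10, 90),
  end_x   = runif(n_passes, 30, 95),
  end_y   = runif(n_passes, 10, 90)
)

k <- 3  # hypothetical number of zones; the report uses num_clusters

# Cluster pass origins and pass destinations separately
start_km <- kmeans(toy_passes[, c("start_x", "start_y")], centers = k)
end_km   <- kmeans(toy_passes[, c("end_x", "end_y")], centers = k)
toy_passes$start_cluster <- start_km$cluster
toy_passes$end_cluster   <- end_km$cluster

# Count passes for every (origin zone, destination zone) pair
toy_flows <- as.data.frame(
  table(start_cluster = toy_passes$start_cluster,
        end_cluster   = toy_passes$end_cluster),
  responseName = "pass_count"
)
toy_flows <- toy_flows[toy_flows$pass_count > 0, ]

print(toy_flows)
```

Because k-means starts from random centres, cluster labels and boundaries can change between runs. Fixing the seed, as the report does, keeps the figures reproducible; passing `nstart` to `kmeans()` is one way to make the solution less sensitive to the starting centres.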

2.2 Progressive Passing

Progressive Pass Explained

A progressive pass is a pass that moves the ball forward towards the opposition’s goal (Wyscout Data Glossary). In the example below, Luka Modrić passes the ball to his teammate in the opponent’s 18-yard box.

Luka Modric progressive pass example.

Let’s compare the three players’ progressive passes in the World Cup.

As we can see in the figures below, Luka Modric and Kevin De Bruyne both averaged 11 successful progressive passes per 90 minutes during the 2018 World Cup, showcasing their ability to consistently drive their teams forward with long, accurate passes.

As shown in Figure 8, Luka Modrić’s progressive passing network is widely distributed across the attacking half of the pitch. His passes cover long distances, often exceeding 30 yards, and are directed toward both flanks and the final third. The high density of passes into advanced areas highlights his ability to break the opposition’s defensive line and create attacking opportunities for his team.

As shown in Figure 9, Kevin De Bruyne’s progressive passing tendencies form a more direct and vertically oriented network. His passes are denser in the attacking third, reflecting his advanced positioning and his creative responsibility for generating goal-scoring opportunities.

In contrast, N’Golo Kanté averaged just 3 progressive passes per 90 minutes, as shown in Figure 10. This reflects his more defensive and transitional role within the French team. While Modrić and De Bruyne’s passing abilities were central to their teams’ attacking play, Kanté’s limited progressive passing highlights his emphasis on shorter passing actions focused on maintaining possession rather than direct forward progression.
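The progressive-pass rule and the per-90 normalization can be written as two small helpers. This is a minimal sketch rather than the report's own code: `is_progressive()` and `per_90()` are hypothetical names, the thresholds mirror the De Bruyne and Kanté filters in this section, and the four toy passes are invented for illustration.

```r
# Flag a pass as progressive: it must move forward and either cover more than
# min_dx units along x or end inside a central final-third zone.
# Coordinates are assumed to be on a 0-100 scale along each axis.
is_progressive <- function(start_x, end_x, end_y,
                           min_dx = 28, zone_x = 75, zone_y = c(20, 80)) {
  forward <- end_x > start_x
  long    <- (end_x - start_x) > min_dx
  in_zone <- end_x > zone_x & end_y > zone_y[1] & end_y < zone_y[2]
  forward & (long | in_zone)
}

# Scale a raw count to a per-90-minute rate, rounded to a whole number
per_90 <- function(count, minutes) {
  round(count / minutes * 90)
}

# Four invented passes: long forward, backward, short into the zone, short and wide
toy <- data.frame(
  start_x = c(40, 60, 70, 80),
  end_x   = c(70, 50, 78, 85),
  end_y   = c(50, 50, 50, 95)
)
toy$progressive <- is_progressive(toy$start_x, toy$end_x, toy$end_y)
print(toy)

# Two progressive passes in 45 minutes correspond to 4 per 90
print(per_90(sum(toy$progressive), minutes = 45))
```

Keeping the rule in one function makes it easier to apply identical thresholds to every player, which matters when the resulting rates are compared side by side.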

# Read the dataset
croatia_data <- read.csv("CroatiaTeamData.csv")

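# Filter to player_id 8287 (Luka Modric): successful passes (tag 1801) that move forward and either
# cover more than 28 units along x or end in the central final-third zone.
# Coordinates run 0-100 on each axis, so distances are percentages of pitch length rather than yards.
# end_x > start_x keeps forward passes only, matching the De Bruyne and Kanté filters.
croatia_data_filter <- croatia_data %>%
  filter(player_id == 8287 & event_name == "Pass" & grepl("1801", tags) & end_x > start_x & (abs(end_x - start_x) > 28 | (end_x > 75 & end_y > 20 & end_y < 80)))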

# Calculate total passes for player
total_passes <- nrow(croatia_data_filter)

# Get total playtime (sum of max event time in each match for player_id 8287)
total_playtime_seconds <- croatia_data %>%
  filter(player_id == 8287) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes <- total_playtime_seconds / 60

# Calculate passes per 90 minutes
passes_per_90 <- round((total_passes / total_playtime_minutes) * 90)

# Define pitch dimensions
pitch_wyscout <- list(
  length = 100,
  width = 100,
  penalty_box_length = 16,
  penalty_box_width = 62,
  six_yard_box_length = 6,
  six_yard_box_width = 26,
  penalty_spot_distance = 10,
  goal_width = 12,
  origin_x = 0,
  origin_y = 0
)

create_wyscout_pitch <- function() {
  ggplot() +
    # Full pitch outline
    geom_rect(aes(xmin = pitch_wyscout$origin_x, xmax = pitch_wyscout$length,
                  ymin = pitch_wyscout$origin_y, ymax = pitch_wyscout$width), 
              color = "grey35", fill = NA) +
    
    # Penalty box (left)
    geom_rect(aes(xmin = pitch_wyscout$origin_x, 
                  xmax = pitch_wyscout$penalty_box_length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$penalty_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$penalty_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Penalty box (right)
    geom_rect(aes(xmin = pitch_wyscout$length - pitch_wyscout$penalty_box_length,
                  xmax = pitch_wyscout$length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$penalty_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$penalty_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Six-yard box (left)
    geom_rect(aes(xmin = pitch_wyscout$origin_x, 
                  xmax = pitch_wyscout$six_yard_box_length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$six_yard_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$six_yard_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Six-yard box (right)
    geom_rect(aes(xmin = pitch_wyscout$length - pitch_wyscout$six_yard_box_length,
                  xmax = pitch_wyscout$length,
                  ymin = (pitch_wyscout$width - pitch_wyscout$six_yard_box_width) / 2,
                  ymax = (pitch_wyscout$width + pitch_wyscout$six_yard_box_width) / 2),
              color = "grey35", fill = NA) +
    
    # Center circle
    geom_point(aes(x = pitch_wyscout$length / 2, y = pitch_wyscout$width / 2), 
               size = 0.5, color = "grey35") +
    geom_circle(aes(x0 = pitch_wyscout$length / 2, y0 = pitch_wyscout$width / 2, r = 9.15),
                color = "grey35") +
    
    # Goals
    geom_segment(aes(x = pitch_wyscout$origin_x, 
                     xend = pitch_wyscout$origin_x, 
                     y = (pitch_wyscout$width - pitch_wyscout$goal_width) / 2, 
                     yend = (pitch_wyscout$width + pitch_wyscout$goal_width) / 2),
                 color = "grey35", linewidth = 3) +
    geom_segment(aes(x = pitch_wyscout$length, 
                     xend = pitch_wyscout$length, 
                     y = (pitch_wyscout$width - pitch_wyscout$goal_width) / 2, 
                     yend = (pitch_wyscout$width + pitch_wyscout$goal_width) / 2),
                 color = "grey35", linewidth = 3) +
    
    # Center line
    geom_segment(aes(x = pitch_wyscout$length / 2, xend = pitch_wyscout$length / 2,
                     y = pitch_wyscout$origin_y, yend = pitch_wyscout$width),
                 color = "grey35") +
    
    # Attack direction annotation (line thickness is set with linewidth, as in the pass plots)
    annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
             arrow = arrow(length = unit(0.3, "inches")), color = "black", linewidth = 1) +
    annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
    
    # Adjust theme
    theme(
      panel.grid = element_blank(),
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      axis.title = element_blank()
    ) +
    coord_fixed() +
    labs(title = "Football Pitch")
}

# Generate the subtitle with the record count
subtitle_text <- paste("Total successful progressive passes normalized per 90 minutes:", passes_per_90)

# Plot the passes on the pitch
create_wyscout_pitch() +
  geom_segment(data = croatia_data_filter, 
               aes(x = start_x, y = start_y, 
                   xend = end_x, yend = end_y), 
               lineend = "round", 
               linewidth = 0.6,
               arrow = arrow(length = unit(0.08, "inches")),
               color = "#171796",
               alpha = 0.5) +
  labs(title = "Luka Modric Successful Progressive Passes in 2018 World Cup",
       subtitle = subtitle_text) + 
  coord_fixed(ratio = 60 / 100)

Figure 8. Progressive passing actions performed by Luka Modric in 2018 World Cup.

# Read the dataset
belgium_data <- read.csv("BelgiumTeamData.csv")

# Filter to player_id 38021 (Kevin De Bruyne): successful passes (tag 1801) that move forward and either
# cover more than 28 units along x (coordinates run 0-100) or end in the central final-third zone
belgium_data_filter <- belgium_data %>%
  filter(player_id == 38021 & event_name == "Pass" & grepl("1801", tags) & end_x > start_x & (abs(end_x - start_x) > 28 | (end_x > 75 & end_y > 20 & end_y < 80)))

# Calculate total passes for the player
total_passes <- nrow(belgium_data_filter)

# Get total playtime (sum of max event time in each match for player_id 38021)
total_playtime_seconds <- belgium_data %>%
  filter(player_id == 38021) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes <- total_playtime_seconds / 60

# Calculate passes per 90 minutes
passes_per_90 <- round((total_passes / total_playtime_minutes) * 90)

# Generate the subtitle with the record count
subtitle_text <- paste("Total successful progressive passes normalized per 90 minutes:", passes_per_90)

# Plot the passes on the pitch
create_wyscout_pitch() +
  geom_segment(data = belgium_data_filter, 
               aes(x = start_x, y = start_y, 
                   xend = end_x, yend = end_y), 
               lineend = "round", 
               linewidth = 0.6,
               arrow = arrow(length = unit(0.08, "inches")),
               color = "#FFCD00",
               alpha = 0.5) +
  labs(title = "Kevin De Bruyne Successful Progressive Passes in 2018 World Cup",
       subtitle = subtitle_text) + 
  coord_fixed(ratio = 60 / 100)

Figure 9. Progressive passing actions performed by Kevin De Bruyne in 2018 World Cup.

# Read the dataset
france_data <- read.csv("FranceTeamData.csv")

# Filter to player_id 31528 (N'Golo Kanté): successful passes (tag 1801) that move forward and either
# cover more than 28 units along x (coordinates run 0-100) or end in the central final-third zone
Kanté_pass <- france_data %>%
  filter(player_id == 31528 & event_name == "Pass" & grepl("1801", tags) & end_x > start_x & (abs(end_x - start_x) > 28 | (end_x > 75 & end_y > 20 & end_y < 80)))

# Calculate total passes for the player
total_passes <- nrow(Kanté_pass)

# Get total playtime (sum of max event time in each match for player_id 31528)
total_playtime_seconds <- france_data %>%
  filter(player_id == 31528) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes <- total_playtime_seconds / 60

# Calculate passes per 90 minutes
passes_per_90 <- round((total_passes / total_playtime_minutes) * 90)

# Generate the subtitle with the record count
subtitle_text <- paste("Total successful progressive passes normalized per 90 minutes:", passes_per_90)

# Plot the passes on the pitch
create_wyscout_pitch() +
  geom_segment(data = Kanté_pass, 
               aes(x = start_x, y = start_y, 
                   xend = end_x, yend = end_y), 
               lineend = "round", 
               linewidth = 0.6,
               arrow = arrow(length = unit(0.08, "inches")),
               color = "#E1000F",
               alpha = 0.5) + 
  labs(title = "N'Golo Kanté Successful Progressive Passes in 2018 World Cup",
       subtitle = subtitle_text) + 
  coord_fixed(ratio = 60 / 100)

Figure 10. Progressive passing actions performed by N’Golo Kanté in 2018 World Cup.

2.3 Passing in relation to location on the pitch using spatial autocorrelation

As mentioned in the methodology, spatial autocorrelation can be used to analyze the clustering of spatial data. In the case of passing in a soccer match, spatial autocorrelation can be used to see if passing locations are clustered together or spread evenly across a pitch.

Midfielders often act as the epicenter of a team’s passing network, as they receive and distribute passes from the defense to the offense. A high Moran’s I value would confirm the notion that they are the central hubs of passing networks, receiving and distributing balls within a tightly knit spatial cluster. Conversely, a lower or negative Moran’s I value could indicate a more fluid and unpredictable passing pattern, potentially reflecting a more dynamic style of play or an emphasis on rapid transitions. By calculating Moran’s I for each player and comparing the results, we can gain insights into their passing tendencies and their roles within the team’s tactical setup.
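To make the statistic concrete, Moran’s I can be computed by hand on a tiny example. The sketch below uses base R only and is not the `spdep` implementation used later in this section: `morans_i()` is a hypothetical helper, and the six-cell strip and its values are invented to contrast a clustered pattern with an alternating one.

```r
# Moran's I for values x and an n x n spatial weights matrix w (zero diagonal):
# I = (n / sum(w)) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), with z = x - mean(x)
morans_i <- function(x, w) {
  n <- length(x)
  z <- x - mean(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Six cells in a row; each cell neighbours the cell on either side
n <- 6
adj <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}

# Row-standardise so each row sums to 1 (the same idea as style = "W" in spdep)
w <- adj / rowSums(adj)

clustered   <- c(1, 1, 1, 5, 5, 5)  # similar values sit next to each other
alternating <- c(1, 5, 1, 5, 1, 5)  # every neighbour differs

i_clustered   <- morans_i(clustered, w)
i_alternating <- morans_i(alternating, w)
expected_i    <- -1 / (n - 1)       # expectation under spatial randomness

print(c(clustered = i_clustered, alternating = i_alternating, expected = expected_i))
```

The clustered strip gives a value well above the expectation and the alternating strip gives a strongly negative one. The tests in this section ask the same question of pass counts: is the observed Moran’s I far enough above its expectation to rule out chance?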

Performing Moran’s I test on Modric Passing Locations

As shown in Table 1, the Moran’s I test on Luka Modrić’s starting pass locations returns a p-value of 0.0047, which is below 0.05.

Since p < 0.05, we reject the null hypothesis (which assumes pass counts are randomly distributed across locations). Nearby locations tend to have similar pass counts, which means that certain areas of the field see more passing activity.

# Read the dataset
croatia_data <- read.csv("CroatiaTeamData.csv")

# Filter data to only player_id 8287 (Luka Modric)
croatia_data_filter <- croatia_data %>%
  filter(player_id == 8287 & event_name == "Pass")

# Count number of passes at each location
croatia_data_aggregated <- croatia_data_filter %>%
  group_by(start_x, start_y) %>%
  summarise(pass_count = n())

# Convert to sf object (pitch coordinates are planar, so no geographic CRS is assigned)
croatia_sf_pass <- st_as_sf(croatia_data_aggregated, coords = c("start_x", "start_y"))

# Create neighbors
nb <- dnearneigh(st_coordinates(croatia_sf_pass), 0, 30)  
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

# Perform Moran's I test on pass count
moran_test_count <- moran.test(croatia_data_aggregated$pass_count, lw, zero.policy = TRUE)

# Print results 
print(moran_test_count)

## 
##  Moran I test under randomisation
## 
## data:  croatia_data_aggregated$pass_count  
## weights: lw    
## 
## Moran I statistic standard deviate = 2.5981, p-value = 0.004687
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      9.465372e-03     -2.309469e-03      2.053928e-05

Table 1. Moran’s I test results on Luka Modric’s starting pass locations in the 2018 World Cup.

Performing Moran’s I test on Kevin De Bruyne Passing Locations

As shown in Table 2, the Moran’s I test on Kevin De Bruyne’s starting pass locations returns a p-value of 0.083, which is greater than 0.05.

Since p > 0.05, we fail to reject the null hypothesis (which assumes passes are randomly distributed). This means that there is no significant spatial autocorrelation.

# Read the dataset
belgium_data <- read.csv("BelgiumTeamData.csv")

# Filter data to only player_id 38021 (Kevin De Bruyne)
belgium_data_filter <- belgium_data %>%
  filter(player_id == 38021 & event_name == "Pass")

# Count number of passes at each location
belgium_data_aggregated <- belgium_data_filter %>%
  group_by(start_x, start_y) %>%
  summarise(pass_count = n())

# Convert to sf object
belgium_sf_pass <- st_as_sf(belgium_data_aggregated, coords = c("start_x", "start_y"))

# Create neighbors
nb <- dnearneigh(st_coordinates(belgium_sf_pass), 0, 30)  
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

# Perform Moran's I test on pass count
belgium_moran_test_count <- moran.test(belgium_data_aggregated$pass_count, lw, zero.policy = TRUE)

# Print results 
print(belgium_moran_test_count)

## 
##  Moran I test under randomisation
## 
## data:  belgium_data_aggregated$pass_count  
## weights: lw    
## 
## Moran I statistic standard deviate = 1.383, p-value = 0.08334
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      5.129646e-03     -3.367003e-03      3.774605e-05

Table 2. Moran’s I test results on Kevin De Bruyne’s starting pass locations in the 2018 World Cup.

Performing Moran’s I test on N’Golo Kanté Passing Locations

Moran’s I test can be applied to other players, such as N’Golo Kanté.

As shown in Table 3, the Moran’s I test on N’Golo Kanté’s starting pass locations returns a p-value of 0.44, which is greater than 0.05.

Since p > 0.05, we fail to reject the null hypothesis (which assumes passes are randomly distributed). This means that there is no significant spatial autocorrelation.

# Read the dataset
france_data <- read.csv("FranceTeamData.csv")

# Filter data to only player_id 31528 (N'Golo Kanté)
france_data_filter <- france_data %>%
  filter(player_id == 31528 & event_name == "Pass")

# Count number of passes at each location
france_data_aggregated <- france_data_filter %>%
  group_by(start_x, start_y) %>%
  summarise(pass_count = n())

# Convert to sf object
france_sf_pass <- st_as_sf(france_data_aggregated, coords = c("start_x", "start_y"))

# Create neighbors
nb <- dnearneigh(st_coordinates(france_sf_pass), 0, 30)  
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

# Perform Moran's I test on pass count
france_moran_test_count <- moran.test(france_data_aggregated$pass_count, lw, zero.policy = TRUE)

# Print results 
print(france_moran_test_count)

## 
##  Moran I test under randomisation
## 
## data:  france_data_aggregated$pass_count  
## weights: lw    
## 
## Moran I statistic standard deviate = 0.14243, p-value = 0.4434
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##     -2.112137e-03     -2.906977e-03      3.114345e-05

Table 3. Moran’s I test results on N’Golo Kanté’s starting pass locations in the 2018 World Cup.
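The standard deviates and p-values in Tables 1 to 3 follow directly from the three sample estimates printed in each table. The sketch below copies those estimates by hand and recomputes the rest; the data frame and its column names are illustrative, and the implied location count assumes the usual expectation of -1 / (n - 1).

```r
# Sample estimates copied from Tables 1-3
moran_results <- data.frame(
  player      = c("Modrić", "De Bruyne", "Kanté"),
  statistic   = c(9.465372e-03, 5.129646e-03, -2.112137e-03),
  expectation = c(-2.309469e-03, -3.367003e-03, -2.906977e-03),
  variance    = c(2.053928e-05, 3.774605e-05, 3.114345e-05)
)

# Standard deviate: how many standard errors the statistic lies above its expectation
moran_results$z <- with(moran_results, (statistic - expectation) / sqrt(variance))

# One-sided p-value for the alternative hypothesis "greater"
moran_results$p_value <- pnorm(moran_results$z, lower.tail = FALSE)

# Number of pass locations implied by Expectation = -1 / (n - 1)
moran_results$n_locations <- round(1 - 1 / moran_results$expectation)

print(moran_results[, c("player", "z", "p_value", "n_locations")])
```

Seen this way, the three Moran’s I values are all close to zero in absolute terms. What separates the players is how far each value sits from its expectation relative to its variance.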

Comparison Summary

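Kevin De Bruyne shows some clustering (Moran’s I = 0.0051), but it is not statistically significant (p = 0.083), suggesting flexibility in his playmaking.

N’Golo Kanté’s Moran’s I is the lowest of the three and sits almost exactly at its expected value under randomness (-0.0021 against -0.0029, p = 0.44). His passes show no detectable spatial pattern, which is consistent with his role as a defensive midfielder who distributes the ball widely rather than concentrating play in one area.

Conversely, Luka Modrić’s passing distribution has the highest Moran’s I statistic (0.0095, p = 0.005) and is the only one that is statistically significant, meaning he tends to pass within specific zones rather than spreading the ball randomly.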

Let’s look at Modric’s passing distribution via a heat map to see where his passes are concentrated.

Continued Analysis: Luka Modrić’s Passing Heatmap

For this analysis, the soccer pitch is broken into a 10x10 grid. A standard soccer field is approximately 100-120 yards long and 50-75 yards wide (FIFA). Because the event coordinates run from 0 to 100 on both axes, each grid cell covers 10% of the pitch length by 10% of its width, or roughly 10-12 yards by 5-8 yards. This scale is large enough to capture meaningful data about player positioning and passing locations while still providing sufficient granularity to analyze trends or areas of activity.

When comparing the heat map and average starting pass location in Figure 11 to the starting positional lineup in Figure 12, we can see that Modric tends to pass around the location he is assigned.

This highlights his exceptional positional awareness and ability to follow tactical plans throughout a soccer match.
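The grid assignment itself is a simple integer division of each coordinate. One detail worth handling is the pitch boundary: if a coordinate equals exactly 100, `floor()` alone would place it in an eleventh bin. The sketch below shows one way to clamp it; `to_bin()` and the toy coordinates are hypothetical and serve only as illustration.

```r
grid_size <- 10
n_bins <- 100 / grid_size

# Map a 0-100 coordinate to a bin index from 0 to n_bins - 1.
# pmin() clamps a coordinate of exactly 100 into the last valid bin.
to_bin <- function(coord, grid_size, n_bins) {
  pmin(floor(coord / grid_size), n_bins - 1)
}

# Invented pass origins, including points on the pitch boundary
toy <- data.frame(
  start_x = c(0, 9.9, 10, 55, 100),
  start_y = c(0, 50, 99, 100, 100)
)
toy$grid_x <- to_bin(toy$start_x, grid_size, n_bins)
toy$grid_y <- to_bin(toy$start_y, grid_size, n_bins)

# Pass count per occupied grid cell
cell_counts <- as.data.frame(
  table(grid_x = toy$grid_x, grid_y = toy$grid_y),
  responseName = "event_count"
)
cell_counts <- cell_counts[cell_counts$event_count > 0, ]

print(toy)
print(cell_counts)
```

Whether the clamp matters depends on whether any pass in the data starts exactly on the far touchline or goal line. For passes in the interior of the pitch it gives the same bins as `floor()` on its own.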

# Define pitch dimensions
pitch_length <- 100
pitch_width <- 100
grid_size <- 10
penalty_box_length <- 16
penalty_box_width <- 62
six_yard_box_length <- 6
six_yard_box_width <- 26
goal_width <- 12

# Create grid bins
df <- croatia_data_filter %>%
  mutate(grid_x = floor(start_x / grid_size),
         grid_y = floor(start_y / grid_size))

# Aggregate counts per grid cell
heatmap_data <- df %>%
  group_by(grid_x, grid_y) %>%
  summarise(event_count = n(), .groups = "drop") %>%
  filter(event_count>7)

# Calculate the average start_x and start_y
average_start_x <- mean(croatia_data_filter$start_x, na.rm = TRUE)
average_start_y <- mean(croatia_data_filter$start_y, na.rm = TRUE)

# Create a dataframe for the average start position point
average_point <- data.frame(
  x = average_start_x,
  y = average_start_y,
  label = "Avg Start Position"
)

# Start with the heatmap in the background
heatmap_plot <- ggplot() +
  geom_tile(data = heatmap_data, aes(x = grid_x * grid_size + grid_size / 2, 
                                     y = grid_y * grid_size + grid_size / 2, 
                                     fill = event_count), color = "white") +
  scale_fill_gradient(low = "#ADD8E6", high = "#171796", labels = scales::label_number(accuracy = 1)) +
  labs(title = "Heatmap Luka Modric Starting Passes",
       subtitle = "Darker regions represent more passes performed per grid",
       fill = "Pass Count") +
  theme_classic() +
  theme(axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        axis.title = element_blank(),
        panel.grid = element_blank()) +
  
  # Full pitch outline
  geom_rect(aes(xmin = 0, xmax = pitch_length, ymin = 0, ymax = pitch_width), 
            color = "grey", fill = NA) +
  
  # Grid lines (10x10)
  geom_segment(data = data.frame(x = seq(0, pitch_length, by = grid_size)),
               aes(x = x, xend = x, y = 0, yend = pitch_width),
               color = "lightgrey", linetype = "dotted") +
  geom_segment(data = data.frame(y = seq(0, pitch_width, by = grid_size)),
               aes(x = 0, xend = pitch_length, y = y, yend = y),
               color = "lightgrey", linetype = "dotted") +
  
  # Penalty box (left side)
  geom_rect(aes(xmin = 0, xmax = penalty_box_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Penalty box (right side)
  geom_rect(aes(xmin = pitch_length - penalty_box_length, xmax = pitch_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Six-yard box (left side)
  geom_rect(aes(xmin = 0, xmax = six_yard_box_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Six-yard box (right side)
  geom_rect(aes(xmin = pitch_length - six_yard_box_length, xmax = pitch_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Goals (left side)
  geom_segment(aes(x = 0, xend = 0, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", size = 3) +
  
  # Goals (right side)
  geom_segment(aes(x = pitch_length, xend = pitch_length, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", size = 3) +
  
  # Center circle
  geom_circle(aes(x0 = pitch_length / 2, y0 = pitch_width / 2, r = 9.15),
              color = "grey35", inherit.aes = FALSE) +
  
  # Center line
  geom_segment(aes(x = pitch_length / 2, xend = pitch_length / 2,
                   y = 0, yend = pitch_width),
               color = "grey35") +
  
  # Attack direction annotation
  annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
           arrow = arrow(length = unit(0.3, "inches")), color = "black", size = 1.2) +
  annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
  
  # Add the average start_x and start_y as a black point
  geom_point(aes(x = average_start_x, y = average_start_y), color = "white", fill = "black", size = 7, shape = 21, stroke = 1.5) +
  
  # Add text annotation above the average point
  geom_text(data = average_point, aes(x = x, y = y + 6, label = "Average Pass Start Location"), 
            color = "black", size = 3, hjust = 0.5) +
  
  coord_fixed()

# Show the final plot
print(heatmap_plot)

Figure 11. Heatmap of Luka Modric’s starting pass locations in relation to the average start pass location across the seven games in the 2018 World Cup.

Figure 12. Luka Modric Starting Position in lineup vs Iceland in the 2018 World Cup group stage.

Part 3: Defending

A great midfielder isn’t just good at passing; they also help their team by stopping the opponent’s attacks. Two key metrics for gauging defensive contributions are tackles and interceptions.

A midfielder with high numbers in these areas can break up attacks, regain possession, and keep their team in control of the game.

Players who excel at this are crucial for protecting their defense and starting counterattacks.

Tackle Explained

A tackle is a defensive action in soccer where the player stops the opposition from dribbling, passing, or shooting the ball (Wyscout Data Glossary).

In the example below, Luka Modric tackles the opposition as he stops the pass of his opponent.

Tackle/Interception example.

Let’s compare the three players’ defensive tackling contributions in the World Cup.

For this analysis, 18 equal zones were created, following a method similar to one derived from a study conducted at John Moores University in Liverpool, UK (Smarter Soccer). This commonly used zoning method allows for recognizable and repeatable comparison with similar analytical figures.
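The 18-zone assignment can be sketched as a small helper that turns a 0-100 coordinate pair into a single zone number. The numbering scheme below (1 to 18, counted across the width first and then along the length) and the function name `zone_id` are assumptions for illustration. The cited zoning method may label its zones differently.

```r
# Assign a zone number (1-18) on a grid of 6 columns along the pitch length
# and 3 rows across its width, using 0-100 coordinates.
zone_id <- function(x, y, n_cols = 6, n_rows = 3) {
  # Zero-based column and row; clamped so coordinate 100 stays on the grid
  col <- pmin(floor(x * n_cols / 100), n_cols - 1)
  row <- pmin(floor(y * n_rows / 100), n_rows - 1)
  col * n_rows + row + 1
}

zones <- zone_id(x = c(10, 50, 99, 100), y = c(10, 50, 99, 100))
print(zones)  # 1 11 18 18
```

Clamping matters here because an event recorded exactly on the far goal line or touchline would otherwise fall into a nonexistent seventh column or fourth row.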

The analysis of defensive tackling contributions in the World Cup for N’Golo Kanté, Kevin De Bruyne, and Luka Modric highlights clear differences in their defensive output and positioning. N’Golo Kanté recorded the highest number of tackles per 90 minutes, averaging 47. In contrast, Luka Modric completed 37 tackles per 90 minutes, while Kevin De Bruyne recorded the lowest figure among the three, averaging 32 tackles per 90 minutes. These counts are drawn from Wyscout “Duel” events (plus “Others on the ball” events in Kanté’s case), so they are best read as a proxy for tackling activity rather than a strict tackle count, and the comparison with Kanté is not strictly like-for-like. On this measure, Kanté had a considerably greater defensive output than both Modric and De Bruyne.
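The per-90 normalization behind these figures can be sketched on toy data as follows. The sketch assumes that event timestamps restart at each half, as they do in the public Wyscout event data, and that a column identifying the half is available. The column name `match_period`, the toy data frame, and both function names are hypothetical. Base R is used so the example runs without extra packages.

```r
# Toy events for one player in one match (timestamps in seconds within each half)
toy <- data.frame(
  match_id     = c(1, 1, 1, 1),
  match_period = c("1H", "1H", "2H", "2H"),   # hypothetical column name
  event_sec    = c(600, 2700, 300, 2700),
  event_name   = c("Duel", "Pass", "Duel", "Pass")
)

# Per-90 rate with playtime approximated by the latest event in each half
per_90_by_half <- function(events, action = "Duel") {
  half_max <- aggregate(event_sec ~ match_id + match_period, data = events, FUN = max)
  minutes_played <- sum(half_max$event_sec) / 60
  sum(events$event_name == action) * 90 / minutes_played
}

# Same calculation when only the match is used for grouping
per_90_by_match <- function(events, action = "Duel") {
  match_max <- aggregate(event_sec ~ match_id, data = events, FUN = max)
  minutes_played <- sum(match_max$event_sec) / 60
  sum(events$event_name == action) * 90 / minutes_played
}

print(per_90_by_half(toy))   # 2: two duels in 90 minutes
print(per_90_by_match(toy))  # 4: playtime is counted as 45 minutes
```

If timestamps do restart at each half, grouping by match alone roughly halves the estimated playtime and doubles the per-90 rate, so it is worth confirming how the timestamp column is defined before comparing these figures with other sources.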

The spatial distribution of tackles further emphasizes the differences in their roles and tactical positioning. As shown in Figure 14, Kanté’s heatmap shows a high density of tackles spread across the central midfield area, particularly near the defensive third. This pattern aligns with his role as a defensive midfielder, where he is responsible for breaking up opposition plays and protecting the backline. His high frequency of tackles in transition zones highlights his effectiveness in regaining possession when the team is out of shape.

As shown in Figure 13, Luka Modric’s tackling distribution is more spread out, with notable activity in central and right midfield zones. The concentration of tackles around the center circle indicates his involvement in disrupting opposition build-up play, while his tackling in deeper positions reflects his role as a deep-lying playmaker who also supports defensive coverage. Despite his primary playmaking responsibilities, Modric’s contribution of 37 tackles per 90 minutes indicates a balanced approach between defensive and offensive duties.

In contrast, Kevin De Bruyne’s heatmap (Figure 15) shows a clear concentration of defensive actions in the advanced central and right midfield areas. This distribution is consistent with his role as an attacking midfielder who applies pressure higher up the pitch rather than engaging deeply in defensive duties. The lower frequency of tackles compared to Kanté and Modric highlights De Bruyne’s offensive focus, with his defensive efforts primarily aimed at initiating the press rather than covering defensive gaps.

# Load the dataset
croatiadata <- read_csv(here("CroatiaTeamData.csv"))

# Filter Luka Modric's events (Duel events only)
modric <- croatiadata %>%
  filter(player_id == 8287 & (event_name == "Duel"))

# Calculate total defensive contributions from player
total_contributions_modric <- nrow(modric)

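# Approximate total playtime: sum over matches of the latest event time for player_id 8287.
# Caveat: if event_sec restarts at each half (as in the public Wyscout event data),
# this captures only one half per match and the per-90 rate would be overstated.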
total_playtime_seconds_modric <- croatiadata %>%
  filter(player_id == 8287) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes_modric <- total_playtime_seconds_modric / 60

# Calculate defensive contributions per 90 minutes
contributions_per_90_modric <- round((total_contributions_modric / total_playtime_minutes_modric) * 90)

# Define pitch dimensions
pitch_length <- 100
pitch_width <- 100
penalty_box_length <- 16
penalty_box_width <- 62
six_yard_box_length <- 6
six_yard_box_width <- 26
goal_width <- 12

# Define 18-zone grid (3 vertical x 6 horizontal)
num_vertical <- 3
num_horizontal <- 6
grid_x_size <- pitch_length / num_horizontal
grid_y_size <- pitch_width / num_vertical

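# Assign events to the 18-zone grid; pmin() keeps events at coordinate 100 in the last zone
df <- modric %>%
  mutate(grid_x = pmin(floor(start_x / grid_x_size), num_horizontal - 1),
         grid_y = pmin(floor(start_y / grid_y_size), num_vertical - 1))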

# Aggregate total contributions per grid cell
heatmap_data <- df %>%
  group_by(grid_x, grid_y) %>%
  summarise(event_count = n(), .groups = "drop")

# Generate the subtitle with the record count
subtitle_text_modric <- paste("Total successful defensive actions normalized per 90 minutes:", contributions_per_90_modric)

# Generate the heatmap with contribution counts in each zone
modric_heatmap_plot <- ggplot() +
  geom_tile(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     fill = event_count), color = "white") +
  scale_fill_gradient(low = "#ADD8E6", high = "#171796", limits = c(0, 26), labels = scales::label_number(accuracy = 1)) +
  labs(title = "Luka Modric Tackling Contributions",
       subtitle = subtitle_text_modric,
       fill = "Tackles Completed") +
  theme_classic() +
  theme(axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        axis.title = element_blank(),
        panel.grid = element_blank()) +
  
  # Full pitch outline
  geom_rect(aes(xmin = 0, xmax = pitch_length, ymin = 0, ymax = pitch_width), 
            color = "grey", fill = NA) +
  
  # Add 18 equal-area grid lines (3x6)
  geom_segment(data = data.frame(x = seq(0, pitch_length, by = grid_x_size)),
               aes(x = x, xend = x, y = 0, yend = pitch_width),
               color = "grey", linetype = "dashed") +
  geom_segment(data = data.frame(y = seq(0, pitch_width, by = grid_y_size)),
               aes(x = 0, xend = pitch_length, y = y, yend = y),
               color = "grey", linetype = "dashed") +
  
  # Add event count labels in each zone
  geom_text(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     label = event_count), size = 5, fontface = "bold") +
  
  # Add penalty boxes
  geom_rect(aes(xmin = 0, xmax = penalty_box_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - penalty_box_length, xmax = pitch_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add six-yard boxes
  geom_rect(aes(xmin = 0, xmax = six_yard_box_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - six_yard_box_length, xmax = pitch_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add goals
  geom_segment(aes(x = 0, xend = 0, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", size = 3) +
  geom_segment(aes(x = pitch_length, xend = pitch_length, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", size = 3) +
  
  # Add center circle and line
  geom_circle(aes(x0 = pitch_length / 2, y0 = pitch_width / 2, r = 9.15),
              color = "grey35", inherit.aes = FALSE) +
  geom_segment(aes(x = pitch_length / 2, xend = pitch_length / 2,
                   y = 0, yend = pitch_width),
               color = "grey35") +
  
  # Attack direction annotation
  annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
           arrow = arrow(length = unit(0.3, "inches")), color = "black", size = 1.2) +
  annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
  
  coord_fixed(ratio = 80 / 100)

print(modric_heatmap_plot)

Figure 13. Heatmap highlighting the location of Luka Modric’s defensive tackles within 18 different zones around the pitch across the seven matches played in the 2018 World Cup.

# Load the dataset
francedata <- read_csv(here("FranceTeamData.csv"))

# Filter N'Golo Kanté's events (Duel + Others on the ball)
Kanté <- francedata %>%
  filter(player_id == 31528 & (event_name == "Duel" | event_name == "Others on the ball"))

# Calculate total defensive contributions from player
total_contributions_Kanté <- nrow(Kanté)

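# Approximate total playtime: sum over matches of the latest event time for player_id 31528.
# Caveat: if event_sec restarts at each half (as in the public Wyscout event data),
# this captures only one half per match and the per-90 rate would be overstated.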
total_playtime_seconds_Kanté <- francedata %>%
  filter(player_id == 31528) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes_Kanté <- total_playtime_seconds_Kanté / 60

# Calculate defensive contributions per 90 minutes
contributions_per_90_Kanté <- round((total_contributions_Kanté / total_playtime_minutes_Kanté) * 90)

# Define pitch dimensions (same 0-100 scale as for the other two players,
# so that the 18 zones are directly comparable)
pitch_length <- 100
pitch_width <- 100
penalty_box_length <- 16
penalty_box_width <- 62
six_yard_box_length <- 6
six_yard_box_width <- 26
goal_width <- 12

# Define 18-zone grid (3 vertical x 6 horizontal)
num_vertical <- 3
num_horizontal <- 6
grid_x_size <- pitch_length / num_horizontal
grid_y_size <- pitch_width / num_vertical

# Assign events to the 18-zone grid; pmin() keeps events at coordinate 100 in the last zone
df <- Kanté %>%
  mutate(grid_x = pmin(floor(start_x / grid_x_size), num_horizontal - 1),
         grid_y = pmin(floor(start_y / grid_y_size), num_vertical - 1))

# Aggregate total contributions per grid cell
heatmap_data <- df %>%
  group_by(grid_x, grid_y) %>%
  summarise(event_count = n(), .groups = "drop")

# Generate the subtitle with the record count
subtitle_text_Kanté <- paste("Total successful defensive actions normalized per 90 minutes:", contributions_per_90_Kanté)

# Generate the heatmap with contribution counts in each zone
Kanté_heatmap_plot <- ggplot() +
  geom_tile(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     fill = event_count), color = "white") +
  scale_fill_gradient(low = "#FFCCCB", high = "#E1000F", limits = c(0, 26), labels = scales::label_number(accuracy = 1)) +
  labs(title = "N'Golo Kanté Tackling Contributions",
       subtitle = subtitle_text_Kanté,
       fill = "Tackles Completed") +
  theme_classic() +
  theme(axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        axis.title = element_blank(),
        panel.grid = element_blank()) +
  
  # Full pitch outline
  geom_rect(aes(xmin = 0, xmax = pitch_length, ymin = 0, ymax = pitch_width), 
            color = "grey", fill = NA) +
  
  # Add 18 equal-area grid lines (3x6)
  geom_segment(data = data.frame(x = seq(0, pitch_length, by = grid_x_size)),
               aes(x = x, xend = x, y = 0, yend = pitch_width),
               color = "grey", linetype = "dashed") +
  geom_segment(data = data.frame(y = seq(0, pitch_width, by = grid_y_size)),
               aes(x = 0, xend = pitch_length, y = y, yend = y),
               color = "grey", linetype = "dashed") +
  
  # Add event count labels in each zone
  geom_text(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     label = event_count), size = 5, fontface = "bold") +
  
  # Add penalty boxes
  geom_rect(aes(xmin = 0, xmax = penalty_box_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - penalty_box_length, xmax = pitch_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add six-yard boxes
  geom_rect(aes(xmin = 0, xmax = six_yard_box_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - six_yard_box_length, xmax = pitch_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add goals
  geom_segment(aes(x = 0, xend = 0, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", size = 3) +
  geom_segment(aes(x = pitch_length, xend = pitch_length, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", size = 3) +
  
  # Add center circle and line
  geom_circle(aes(x0 = pitch_length / 2, y0 = pitch_width / 2, r = 9.15),
              color = "grey35", inherit.aes = FALSE) +
  geom_segment(aes(x = pitch_length / 2, xend = pitch_length / 2,
                   y = 0, yend = pitch_width),
               color = "grey35") +
  
  # Attack direction annotation
  annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
           arrow = arrow(length = unit(0.3, "inches")), color = "black", size = 1.2) +
  annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
  
  coord_fixed(ratio = 80 / 100)

print(Kanté_heatmap_plot)

Figure 14. Heatmap highlighting the location of N’Golo Kanté’s defensive tackles within 18 different zones around the pitch across the seven matches played in the 2018 World Cup.

# Load the dataset
belgiumdata <- read_csv(here("BelgiumTeamData.csv"))

# Filter Kevin De Bruyne's events (Duel events only)
kdb_defense <- belgiumdata %>%
  filter(player_id == 38021 & (event_name == "Duel"))

# Calculate total defensive contributions from player
total_contributions_kdb_defense <- nrow(kdb_defense)

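# Approximate total playtime: sum over matches of the latest event time for player_id 38021.
# Caveat: if event_sec restarts at each half (as in the public Wyscout event data),
# this captures only one half per match and the per-90 rate would be overstated.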
total_playtime_seconds_kdb_defense <- belgiumdata %>%
  filter(player_id == 38021) %>%
  group_by(match_id) %>%
  summarise(max_event_sec = max(event_sec, na.rm = TRUE)) %>%
  summarise(total_playtime = sum(max_event_sec, na.rm = TRUE)) %>%
  pull(total_playtime)

# Convert playtime to minutes
total_playtime_minutes_kdb_defense <- total_playtime_seconds_kdb_defense / 60

# Calculate defensive contributions per 90 minutes
contributions_per_90_kdb <- round((total_contributions_kdb_defense / total_playtime_minutes_kdb_defense) * 90)

# Define pitch dimensions
pitch_length <- 100
pitch_width <- 100
penalty_box_length <- 16
penalty_box_width <- 62
six_yard_box_length <- 6
six_yard_box_width <- 26
goal_width <- 12

# Define 18-zone grid (3 vertical x 6 horizontal)
num_vertical <- 3
num_horizontal <- 6
grid_x_size <- pitch_length / num_horizontal
grid_y_size <- pitch_width / num_vertical

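# Assign events to the 18-zone grid; pmin() keeps events at coordinate 100 in the last zone
df <- kdb_defense %>%
  mutate(grid_x = pmin(floor(start_x / grid_x_size), num_horizontal - 1),
         grid_y = pmin(floor(start_y / grid_y_size), num_vertical - 1))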

# Aggregate total contributions per grid cell
heatmap_data <- df %>%
  group_by(grid_x, grid_y) %>%
  summarise(event_count = n(), .groups = "drop")

# Generate the subtitle with the record count
subtitle_text_kdb <- paste("Total successful defensive actions normalized per 90 minutes:", contributions_per_90_kdb)

# Generate the heatmap with contribution counts in each zone
kdb_heatmap_plot <- ggplot() +
  geom_tile(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     fill = event_count), color = "white") +
  scale_fill_gradient(low = "#FFFFC5", high = "#FFCD00", limits = c(0, 26), labels = scales::label_number(accuracy = 1)) +
  labs(title = "Kevin De Bruyne Tackling Contributions",
       subtitle = subtitle_text_kdb,
       fill = "Tackles Completed") +
  theme_classic() +
  theme(axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        axis.title = element_blank(),
        panel.grid = element_blank()) +
  
  # Full pitch outline
  geom_rect(aes(xmin = 0, xmax = pitch_length, ymin = 0, ymax = pitch_width), 
            color = "grey", fill = NA) +
  
  # Add 18 equal-area grid lines (3x6)
  geom_segment(data = data.frame(x = seq(0, pitch_length, by = grid_x_size)),
               aes(x = x, xend = x, y = 0, yend = pitch_width),
               color = "grey", linetype = "dashed") +
  geom_segment(data = data.frame(y = seq(0, pitch_width, by = grid_y_size)),
               aes(x = 0, xend = pitch_length, y = y, yend = y),
               color = "grey", linetype = "dashed") +
  
  # Add event count labels in each zone
  geom_text(data = heatmap_data, aes(x = grid_x * grid_x_size + grid_x_size / 2, 
                                     y = grid_y * grid_y_size + grid_y_size / 2, 
                                     label = event_count), size = 5, fontface = "bold") +
  
  # Add penalty boxes
  geom_rect(aes(xmin = 0, xmax = penalty_box_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - penalty_box_length, xmax = pitch_length,
                ymin = (pitch_width - penalty_box_width) / 2,
                ymax = (pitch_width + penalty_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add six-yard boxes
  geom_rect(aes(xmin = 0, xmax = six_yard_box_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  geom_rect(aes(xmin = pitch_length - six_yard_box_length, xmax = pitch_length,
                ymin = (pitch_width - six_yard_box_width) / 2,
                ymax = (pitch_width + six_yard_box_width) / 2),
            color = "grey35", fill = NA) +
  
  # Add goals
  geom_segment(aes(x = 0, xend = 0, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", size = 3) +
  geom_segment(aes(x = pitch_length, xend = pitch_length, 
                   y = (pitch_width - goal_width) / 2, 
                   yend = (pitch_width + goal_width) / 2),
               color = "black", size = 3) +
  
  # Add center circle and line
  geom_circle(aes(x0 = pitch_length / 2, y0 = pitch_width / 2, r = 9.15),
              color = "grey35", inherit.aes = FALSE) +
  geom_segment(aes(x = pitch_length / 2, xend = pitch_length / 2,
                   y = 0, yend = pitch_width),
               color = "grey35") +
  
  # Attack direction annotation
  annotate("segment", x = 45, xend = 80, y = 110, yend = 110, 
           arrow = arrow(length = unit(0.3, "inches")), color = "black", size = 1.2) +
  annotate("text", x = 60, y = 120, label = "Attacking Direction", size = 5, fontface = "bold") +
  
  coord_fixed(ratio = 80 / 100)

print(kdb_heatmap_plot)

Figure 15. Heatmap highlighting the location of Kevin De Bruyne’s defensive tackles within 18 different zones around the pitch across the seven matches played in the 2018 World Cup.

Key Takeaways

The analysis of Luka Modric, Kevin De Bruyne, and N’Golo Kanté’s passing networks and performance metrics highlights their distinct playing styles and roles on the pitch during the 2018 World Cup. Luka Modric stands out as a balanced playmaker, excelling in Progressive Passes (PrgP: 1.49) and Completed Passes (Cmp: 0.900). His passing network is well-distributed across the pitch with a high volume of successful passes (108 per 90 minutes), focusing on both short and long diagonal balls that break defensive lines. Modric also shows a statistically significant clustering of passes, as indicated by his Moran’s I test (p < 0.05), demonstrating a tendency to operate within specific zones.

Kevin De Bruyne, known for his attacking mindset, excels in Shot Creating Actions (SCA: 1.33) and Goal Creating Actions (GCA: 1.45), with solid progressive passing (PrgP: 0.964). His passing network is more vertically oriented, emphasizing direct plays into the attacking third with an average of 77 passes per 90 minutes. Although De Bruyne’s passing distribution shows some clustering, it lacks statistical significance (Moran’s I test: p > 0.05), reflecting his flexibility in creating opportunities from various positions.

N’Golo Kanté’s role is distinctly defensive, with a remarkable Interceptions score (Int: 1.85) and a positive Tackles score (Tkl: 0.201), highlighting his effectiveness in ball recovery. His passing network averages 89 passes per 90 minutes, characterized by shorter, conservative passes that maintain possession rather than advancing play directly. His Progressive Passes (PrgP: 0.103) and Shot Creating Actions (SCA: -0.700) are limited, indicating his priority on defensive stability. Kanté’s passing distribution is the most dispersed among the three, with no significant spatial autocorrelation (Moran’s I test: p > 0.05), reflecting his role in distributing the ball widely rather than centralizing play.

Overall, Modric demonstrates a combination of creative playmaking and defensive contributions, De Bruyne excels in offensive creation with flexible positioning, and Kanté anchors the midfield with exceptional ball-winning abilities and controlled distribution. The objective metrics and spatial analysis reinforce their unique tactical roles and the value they bring to their respective teams.

References

FIFA. “Pitch Dimensions and Surrounding Areas.” Football Stadiums Guidelines Technical Guidelines, FIFA, https://publications.fifa.com/de/football-stadiums-guidelines/technical-guideline/stadium-guidelines/pitch-dimensions-and-surrounding-areas/. Accessed 11 Mar. 2025.

Gimond, Michel. Spatial Analysis in R: Point Pattern Analysis. 2025, https://mgimond.github.io/Spatial/chp11_0.html. Accessed 20 Mar. 2025.

Kavlakoglu, Eda, and Vanna Winland. “What Is K-Means Clustering?” IBM, 26 June 2024, https://www.ibm.com/think/topics/k-means-clustering.

Moraga, Paula. Spatial Autocorrelation. n.d., https://www.paulamoraga.com/book-spatial/spatial-autocorrelation.html#global-morans-i. Accessed 25 Feb. 2025.

Moran, P. A. P. 1950. “Notes on Continuous Stochastic Phenomena.” Biometrika 37 (1/2): 17–23. https://doi.org/10.2307/2332142.

Pappalardo, L., Cintia, P., Rossi, A. et al. A public data set of spatio-temporal match events in soccer competitions. Sci Data 6, 236 (2019). https://doi.org/10.1038/s41597-019-0247-7

Pappalardo, Luca; Massucco, Emanuele (2019). Soccer match event dataset. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.4415000

R Core Team. “K-Means Clustering.” UC Business Analytics R Programming Guide, University of Cincinnati, https://uc-r.github.io/kmeans_clustering#kmeans. Accessed 10 Mar. 2025.

Smarter Soccer. The Soccer Zones. Smarter Soccer, n.d., https://smarter.soccer/soccer-game-intelligence-free-downloads/the-soccer-zones/. Accessed 20 Mar. 2025.

Tobler, W. R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46: 234–40. https://doi.org/10.2307/143141.

Wyscout Data Glossary. Wyscout, 2025, https://dataglossary.wyscout.com/. Accessed 10 Mar. 2025.

Wyscout. “Pitch Coordinates.” Wyscout Glossary, https://dataglossary.wyscout.com/pitch_coordinates/. Accessed 21 Mar. 2025.

Younas, Mobeen. “FIFA World Cup Golden Ball Award Winners List (1930-2022).” FootballCoal, 3 Oct. 2022, https://footballcoal.com/fifa-world-cup-golden-ball-award-winners-list-1930-2022/. Accessed 20 Mar. 2025.

Data-Driven Insights from the 2018 World Cup: A Comparative Analysis of Luka Modric, Kevin De Bruyne, and N’Golo Kanté

Ethan McGhee, March 2025