Introduction

Sports data analytics is a rapidly expanding field that drives innovation across the global sports industry. This growth is reflected in the increasing number of tools designed for specialised analysis. Nevertheless, established tools like the R programming language remain a popular choice thanks to their strong statistical capabilities.

In this session, we will explore the practical applications of football-specific libraries, moving from raw data processing to high-end visualisation. Specifically, we will examine:

* StatsBombR, for loading StatsBomb event data into R
* ggshakeR, for plotting pitches and common football visualisations

We will demonstrate how to leverage these tools to transform complex data into actionable insights.

StatsBombR

StatsBombR, built in 2018, allows users to stream StatsBomb event data via a paid API or access free open data hosted on their GitHub page. The library provides structured access to match events, lineups, and competition metadata, making it straightforward to load and work with detailed football event data in R.

Repository: https://github.com/statsbomb/StatsBombR

StatsBomb open data overview

StatsBomb organises its open data in a three-level hierarchy: competitions → matches → events. FreeCompetitions() returns a flat table of all available competition–season pairs. FreeMatches() accepts that table (or a filtered subset) and returns match-level metadata for the selected season. From there, individual matches can be isolated by match_id before pulling the full event stream.

ggshakeR

ggshakeR was created by Abhishek A. Mishra and is used for plotting pitches and other visualisations common in football analytics. It can consume data from Understat or StatsBomb to produce:

* heatmaps,
* pass flows,
* Voronoi diagrams,
* convex hulls,
* sonar plots,
* shot maps, and more.

It can also calculate Expected Possession Value (EPV) and Expected Threat (xT) values.

It is built on top of ggplot2, the widely-used visualisation library in the R ecosystem.

Below we will use ggshakeR to create a few visualisation examples based on the data we accessed using StatsBombR.

Repository: https://github.com/abhiamishra/ggshakeR

Installation

Both libraries can be installed from their GitHub repositories.
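A minimal sketch of the installation, assuming both packages are pulled from the GitHub repositories linked above through the remotes package (an assumption; devtools::install_github() would work the same way):

```r
# Assumption: packages are installed from GitHub via the remotes package
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Repository paths taken from the links given earlier in this document
remotes::install_github("statsbomb/StatsBombR")
remotes::install_github("abhiamishra/ggshakeR")
```

It is worth checking each repository's README for any additional dependencies before running this.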

Loading event data

With a match identified, free_allevents() downloads the full event stream: every pass, shot, dribble, and off-the-ball action recorded for that game. The raw payload keeps StatsBomb’s native structure, with locations stored as nested vectors on a 120 × 80 pitch grid, so allclean() is applied immediately after to unnest the location columns into flat x/y coordinates and to add derived columns (for example shot and goalkeeper details, elapsed time, and possession information). Coordinates stay in StatsBomb’s pitch units, which is what ggshakeR expects when data_type = "statsbomb".

library(StatsBombR)
library(ggshakeR)
library(dplyr)  # provides %>%, filter(), slice(), rename(), group_by(), summarise()

# Bundesliga 2023/2024: Bayer Leverkusen vs Eintracht Frankfurt
COMP_ID  <- 9
SEAS_ID  <- 281
MATCH_ID <- 3895180

Comp <- FreeCompetitions()
## [1] "Whilst we are keen to share data and facilitate research, we also urge you to be responsible with the data. Please credit StatsBomb as your data source when using the data and visit https://statsbomb.com/media-pack/ to obtain our logos for public use."
Matches <- FreeMatches(Comp %>% filter(competition_id == COMP_ID & season_id == SEAS_ID))
SingleMatch <- Matches %>% filter(match_id == MATCH_ID) %>% slice(1)

SingleMatch
## # A tibble: 1 × 42
##   match_id match_date kick_off     home_score away_score match_status
##      <int> <chr>      <chr>             <int>      <int> <chr>       
## 1  3895180 2023-12-17 18:30:00.000          3          0 available   
## # ℹ 36 more variables: match_status_360 <chr>, last_updated <chr>,
## #   last_updated_360 <chr>, match_week <int>, competition.competition_id <int>,
## #   competition.country_name <chr>, competition.competition_name <chr>,
## #   season.season_id <int>, season.season_name <chr>,
## #   home_team.home_team_id <int>, home_team.home_team_name <chr>,
## #   home_team.home_team_gender <chr>, home_team.home_team_group <lgl>,
## #   home_team.managers <list>, home_team.country.id <int>, …
StatsBombData <- free_allevents(MatchesDF = SingleMatch, Parallel = TRUE)
# allclean() flattens the nested location columns and adds derived fields
plotting_data <- allclean(StatsBombData)

plotting_data <- plotting_data %>%
  rename(
    x      = location.x,
    y      = location.y,
    finalX = pass.end_location.x,
    finalY = pass.end_location.y
  )
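To illustrate what unnesting a location column means, here is a small base R sketch on made-up events. The nested_locations list and the coord() helper are hypothetical and only mimic the idea; this is not StatsBombR’s actual implementation.

```r
# Hypothetical nested locations: one c(x, y) vector per event,
# NULL for events that have no location (e.g. "Half Start")
nested_locations <- list(c(60, 40), c(102.5, 33.1), NULL)

# Hypothetical helper: pick the i-th coordinate, or NA when it is missing
coord <- function(loc, i) if (length(loc) >= i) loc[[i]] else NA_real_

# Flat table with one numeric column per coordinate
events <- data.frame(
  type.name  = c("Pass", "Shot", "Half Start"),
  location.x = vapply(nested_locations, coord, numeric(1), i = 1),
  location.y = vapply(nested_locations, coord, numeric(1), i = 2)
)

events
```

Events without a location end up with NA coordinates, which is why later steps in this walkthrough use na.rm = TRUE and na.omit().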

Convex hulls

A convex hull is the smallest convex polygon that encloses a set of points, so convex hulls allow us to analyse the spatial extent of players’ actions. As shown below, Amiri’s passing activity was limited to a small area, whereas Xhaka’s convex hull demonstrates that he covers a significant portion of the pitch.

convexPlot <- plotting_data %>%
  filter(team.name == "Bayer Leverkusen") %>%
  plot_convexhull(
    title = "Convex hull of all actions by Bayer Leverkusen",
    data_type = "statsbomb"
  )

convexPlot
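To make the geometry concrete, the sketch below computes a convex hull in base R with grDevices::chull() on made-up action locations, then measures the share of the pitch it covers. The coordinates and the polygon_area() helper are illustrative assumptions, and this is not necessarily how ggshakeR builds its plot internally.

```r
# Hypothetical action locations for one player (StatsBomb 120 x 80 pitch units)
actions <- data.frame(
  x = c(40, 60, 60, 40, 50, 45, 55),
  y = c(20, 20, 50, 50, 35, 30, 40)
)

# chull() returns the row indices of the points that form the hull
hull_idx <- chull(actions$x, actions$y)
hull <- actions[hull_idx, ]

# Hypothetical helper: shoelace formula for the area of a polygon
polygon_area <- function(x, y) {
  nxt <- c(2:length(x), 1)
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

hull_area   <- polygon_area(hull$x, hull$y)  # 600 square pitch units
pitch_share <- hull_area / (120 * 80)        # 0.0625
```

Only the four corner points end up on the hull; the three interior points do not affect its shape, which is why a single stray action far from a player’s usual zone can enlarge the hull considerably.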

Voronoi diagrams

ggshakeR’s Voronoi diagrams divide the pitch into the regions closest to each player, which helps us see where space is available. In this example, we see a dense cluster of players in midfield, a high defensive line, and a lot of space for the wingers to operate in.

finalData <- plotting_data %>%
  filter(team.name == "Bayer Leverkusen") %>%
  group_by(player.name) %>%
  summarise(across(c(x, y, minute), \(col) mean(col, na.rm = TRUE))) %>%
  na.omit()

plotVoronoi <- plot_voronoi(finalData, data_type = "statsbomb", title = "Bayer Leverkusen Voronoi plots")
plotVoronoi
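The idea behind a Voronoi diagram can be sketched without any plotting library: every point on the pitch is assigned to its nearest player. The example below approximates this on a grid of 1 × 1 cells using three made-up average positions. The player labels and coordinates are hypothetical, and plot_voronoi() may compute its cells differently.

```r
# Hypothetical average positions for three players (StatsBomb 120 x 80 pitch units)
players <- data.frame(
  player.name = c("A", "B", "C"),
  x = c(30, 60, 90),
  y = c(40, 40, 40)
)

# Sample the pitch at the centre of every 1 x 1 cell
grid <- expand.grid(x = seq(0.5, 119.5, by = 1), y = seq(0.5, 79.5, by = 1))

# Squared distance from every grid cell (rows) to every player (columns)
d2 <- outer(grid$x, players$x, "-")^2 + outer(grid$y, players$y, "-")^2

# Each cell belongs to the Voronoi region of its nearest player
owner <- players$player.name[apply(d2, 1, which.min)]

# Share of the pitch falling inside each player's region
region_share <- table(owner) / nrow(grid)
region_share
```

With the players spread along the halfway axis, the two outer players each own 37.5% of the pitch and the central player 25%, which mirrors how crowded central areas produce small cells and wide areas produce large ones.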

Pass networks

Pass networks show how the team was set up on the pitch during the match: which players exchanged the most passes and where each player was positioned on average. In this example, we see that Bayer Leverkusen relies on its wing-backs, who are very active in creating threat, and that many of the team’s passes go through Xhaka and Palacios, the key players in midfield.

passnetPlot <- plotting_data %>%
  plot_passnet(team_name = "Bayer Leverkusen")

passnetPlot
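The two ingredients of a pass network, node positions and edge weights, can be sketched in base R on a handful of made-up passes. The column names passer and recipient and all values are hypothetical; this only illustrates the aggregation and is not plot_passnet()’s implementation.

```r
# Hypothetical passes: who passed to whom, and from where (120 x 80 pitch units)
passes <- data.frame(
  passer    = c("CM1", "CM1", "CM2", "CM1", "LWB", "CM2"),
  recipient = c("CM2", "LWB", "CM1", "CM2", "CM1", "LWB"),
  x = c(50, 54, 62, 52, 70, 60),
  y = c(40, 38, 44, 42, 10, 46)
)

# Nodes: average location from which each player made their passes
nodes <- aggregate(cbind(x, y) ~ passer, data = passes, FUN = mean)

# Edges: number of passes for each passer -> recipient pair
edges <- aggregate(n ~ passer + recipient,
                   data = transform(passes, n = 1),
                   FUN = sum)

# Strongest connections first
edges <- edges[order(-edges$n), ]
```

In a plot, each row of nodes becomes a point on the pitch and each row of edges becomes a line whose width scales with n, so the CM1 to CM2 link (two passes) would be drawn thicker than the rest.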

Conclusion

StatsBombR and ggshakeR demonstrate how a well-scoped library pair can cover the full analytics pipeline from raw event ingestion to publication-ready visualisations, without leaving the R ecosystem. StatsBombR handles the plumbing: authentication, pagination, and the opinionated allclean() transformation that flattens nested fields and adds derived columns in a consistent schema. ggshakeR sits on top of that schema and trades flexibility for convenience, letting analysts produce convex hulls, Voronoi diagrams, pass networks, and other plots in a single function call rather than rebuilding pitch geometry from scratch each time.

The three visualisations produced here illustrate complementary views of the same match. The convex hulls give a coarse territorial summary of how much of the pitch each Bayer Leverkusen player covered. The Voronoi diagram refines that picture using players’ average positions, exposing spatial structure: a compact midfield block, a high defensive line, and wide corridors for the wingers. The pass network then layers in the relational dimension, showing that the team’s ball circulation ran predominantly through Xhaka and Palacios in the centre and through the wing-backs as attacking outlets.

Together, the libraries lower the barrier to entry for tactical analysis in R. The main trade-off is the tight coupling to StatsBomb’s data format — switching to a different provider requires non-trivial remapping. For teams already within the StatsBomb ecosystem, however, this integration is a significant productivity advantage.