Sports data analytics is expanding rapidly and driving innovation across the global sports industry. This growth is reflected in the increasing number of tools designed for specialized analysis. Nevertheless, “classics” like the R programming language remain a mainstay thanks to their strong statistical capabilities.
In this session, we will explore the practical applications of football-specific libraries, moving from raw data processing to high-end visualization. Specifically, we will examine:
* StatsBombR: for accessing and processing high-fidelity event data from StatsBomb.
* ggshakeR: for creating advanced visualizations such as pass maps and convex hulls.
We will demonstrate how to leverage these tools to transform complex data into actionable insights.
StatsBombR, built in 2018, allows users to stream StatsBomb event data via a paid API or access free open data hosted on their GitHub page. The library provides structured access to match events, lineups, and competition metadata, making it straightforward to load and work with detailed football event data in R.
Repository: https://github.com/statsbomb/StatsBombR
StatsBomb organises its open data in a three-level hierarchy:
competitions → matches →
events. FreeCompetitions() returns a flat
table of all available competition–season pairs.
FreeMatches() accepts that table (or a filtered subset) and
returns match-level metadata for the selected season. From there,
individual matches can be isolated by match_id before
pulling the full event stream.
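The drill-down described above can be sketched on small stand-in tables, without any network access. The rows below are toy data, not real FreeCompetitions() or FreeMatches() output: only the competition_id / season_id pair 9 / 281 and the match_id 3895180 come from the example used later in this session, and the column names competition_name and season_name are assumed to match the real table.

```r
library(dplyr)

# Toy stand-in for the competition-season table (illustrative rows only)
competitions <- tibble(
  competition_id   = c(9, 9, 99),
  season_id        = c(281, 100, 500),
  competition_name = c("1. Bundesliga", "1. Bundesliga", "Example Cup"),
  season_name      = c("2023/2024", "example season", "example season")
)

# Level 1 -> 2: keep the single competition-season pair of interest
selected <- competitions %>%
  filter(competition_id == 9, season_id == 281)

# Toy stand-in for the match-level metadata of that season
matches <- tibble(
  match_id       = c(3895180, 1000001),
  competition_id = 9,
  season_id      = 281
)

# Level 2 -> 3: isolate one match before requesting its events
single_match <- matches %>%
  semi_join(selected, by = c("competition_id", "season_id")) %>%
  filter(match_id == 3895180)

single_match
```

With the real package, the same two filters are applied to the tables returned by FreeCompetitions() and FreeMatches().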
ggshakeR was created by Abhishek A. Mishra and is used for plotting pitches and other visualisations common in football analytics. It can consume data from Understat or StatsBomb to produce:
* heatmaps,
* pass flows,
* Voronoi diagrams,
* convex hulls,
* sonar plots,
* shot maps, and more.
It can also calculate expected possession value (EPV) and expected threat (xT) values.
It is built on top of ggplot2, the widely-used visualisation library in the R ecosystem.
Below we will use ggshakeR to create a few visualisation examples based on the data we accessed using StatsBombR.
Repository: https://github.com/abhiamishra/ggshakeR
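To make the xT idea concrete: expected threat assigns a value to every pitch zone, and an action is credited with the value of the zone where it ends minus the value of the zone where it starts. The sketch below uses a made-up six-zone grid along the pitch length, on StatsBomb's 120-unit x-axis. The zone values and the helper names zone_of() and xt_added() are hypothetical; this is neither ggshakeR's implementation nor its grid.

```r
# Made-up xT values for six zones, from own goal line (zone 1) to opposition goal line (zone 6)
xt_grid <- c(0.005, 0.01, 0.02, 0.04, 0.09, 0.25)

# Map an x coordinate (0-120) to a zone index, clamped to the valid range
zone_of <- function(x, pitch_length = 120, n_zones = length(xt_grid)) {
  zone <- floor(x / (pitch_length / n_zones)) + 1
  pmin(pmax(zone, 1), n_zones)
}

# Threat added by an action = value of the end zone minus value of the start zone
xt_added <- function(x, finalX) {
  xt_grid[zone_of(finalX)] - xt_grid[zone_of(x)]
}

# A forward pass from x = 50 to x = 90, and the same pass played backwards
xt_added(x = c(50, 90), finalX = c(90, 50))
```

A forward pass into a more dangerous zone gets a positive value and the reverse pass gets the same value with a negative sign.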
To install both libraries you need to execute the code below:
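A minimal sketch of the installation, assuming both packages are installed from the GitHub repositories linked above with devtools::install_github() (using devtools rather than another installer is an assumption):

```r
# devtools provides install_github(); install it first if it is missing
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install both packages from the repositories linked above
devtools::install_github("statsbomb/StatsBombR")
devtools::install_github("abhiamishra/ggshakeR")
```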
With a match identified, free_allevents() downloads the
full event stream: every pass, shot, dribble, and off-the-ball action
recorded for that game. The raw payload uses StatsBomb’s internal
conventions (coordinates on a 120 × 80 pitch, stored as nested location
vectors), so allclean() is applied immediately afterwards to unnest the
location columns into flat x/y coordinates (location.x, location.y,
pass.end_location.x, and so on) and to add derived columns for analysis.
The coordinates stay in StatsBomb’s own units.
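The unnesting step can be illustrated on toy data. The sketch below is not allclean()'s actual implementation; it only shows the general idea of turning a list column of (x, y) vectors into two flat numeric columns. The events table and the helper coord() are hypothetical.

```r
library(dplyr)

# Toy events: location is a list column; the third event has no location
events <- tibble(
  id       = 1:3,
  location = list(c(60, 40), c(102.5, 33), NULL)
)

# Return the i-th coordinate of a location vector, or NA when it is missing
coord <- function(loc, i) {
  if (length(loc) >= i) loc[[i]] else NA_real_
}

flat <- events %>%
  mutate(
    location.x = vapply(location, coord, numeric(1), i = 1),
    location.y = vapply(location, coord, numeric(1), i = 2)
  ) %>%
  select(-location)

flat
```

Events without a location end up with NA in both columns.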
library(StatsBombR)
library(dplyr)
library(ggshakeR)

# Bundesliga 2023/2024: the match between Bayer Leverkusen and Eintracht Frankfurt
COMP_ID <- 9
SEAS_ID <- 281
MATCH_ID <- 3895180
Comp <- FreeCompetitions()
## [1] "Whilst we are keen to share data and facilitate research, we also urge you to be responsible with the data. Please credit StatsBomb as your data source when using the data and visit https://statsbomb.com/media-pack/ to obtain our logos for public use."
Matches <- FreeMatches(Comp %>% filter(competition_id == COMP_ID & season_id == SEAS_ID))
SingleMatch <- Matches %>% filter(match_id == MATCH_ID) %>% slice(1)
SingleMatch
## # A tibble: 1 × 42
## match_id match_date kick_off home_score away_score match_status
## <int> <chr> <chr> <int> <int> <chr>
## 1 3895180 2023-12-17 18:30:00.000 3 0 available
## # ℹ 36 more variables: match_status_360 <chr>, last_updated <chr>,
## # last_updated_360 <chr>, match_week <int>, competition.competition_id <int>,
## # competition.country_name <chr>, competition.competition_name <chr>,
## # season.season_id <int>, season.season_name <chr>,
## # home_team.home_team_id <int>, home_team.home_team_name <chr>,
## # home_team.home_team_gender <chr>, home_team.home_team_group <lgl>,
## # home_team.managers <list>, home_team.country.id <int>, …
StatsBombData <- free_allevents(MatchesDF = SingleMatch, Parallel = TRUE)
# allclean() unnests the location columns into flat x/y fields and adds derived columns
plotting_data <- allclean(StatsBombData)
# ggshakeR expects the coordinate columns to be named x, y, finalX and finalY
plotting_data <- plotting_data %>%
  rename(
    x = location.x,
    y = location.y,
    finalX = pass.end_location.x,
    finalY = pass.end_location.y
  )
Convex hulls allow us to analyse the spatial extent of players’ actions. As shown below, Amiri’s activity was confined to a small area, whereas Xhaka’s convex hull covers a significant portion of the pitch.
convexPlot <- plotting_data %>%
  filter(team.name == "Bayer Leverkusen") %>%
  plot_convexhull(title = "Convex hull of all actions by Bayer Leverkusen", data_type = "statsbomb")
convexPlot
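To show what a convex hull measures, the sketch below computes hull areas for two hypothetical players using base R's chull() and the shoelace formula. The action locations are toy data on a 120 × 80 pitch and hull_area() is a helper introduced here for illustration; it is not how ggshakeR draws its plot.

```r
# Toy action locations for two hypothetical players on a 120 x 80 pitch
actions <- data.frame(
  player = rep(c("Compact", "Roaming"), each = 5),
  x = c(50, 60, 60, 50, 55,   20, 100, 100, 20, 60),
  y = c(30, 30, 40, 40, 35,   10,  10,  70, 70, 40)
)

# Area of the convex hull around a set of points
hull_area <- function(x, y) {
  idx <- chull(x, y)                  # indices of the hull vertices, in order
  hx <- x[idx]
  hy <- y[idx]
  nx <- c(hx[-1], hx[1])              # next vertex, wrapping around
  ny <- c(hy[-1], hy[1])
  abs(sum(hx * ny - nx * hy)) / 2     # shoelace formula
}

areas <- sapply(split(actions, actions$player),
                function(d) hull_area(d$x, d$y))

# Share of the pitch covered by each player's hull
areas / (120 * 80)
```

Interior points do not affect the hull, so a single stray action far from the rest can enlarge it considerably. This is worth keeping in mind when reading such plots.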
ggshakeR’s Voronoi plots show how much space is available around each player. In this example we see a dense cluster of players in midfield, a high defensive line, and a lot of space for the wingers to operate in.
finalData <- plotting_data %>%
  filter(team.name == "Bayer Leverkusen") %>%
  group_by(player.name) %>%
  summarise(across(c(x, y, minute), \(col) mean(col, na.rm = TRUE))) %>%
  na.omit()
plotVoronoi <- plot_voronoi(finalData, data_type = "statsbomb", title = "Bayer Leverkusen Voronoi plots")
plotVoronoi
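The idea behind a Voronoi diagram is that every point on the pitch belongs to the nearest player. A rough way to see this without any geometry library is to assign each cell of a fine grid to its closest player and compare the resulting shares of the pitch. The two player positions below are toy values, and this grid approximation is an illustration only, not the method ggshakeR uses.

```r
# Toy average positions for two hypothetical players on a 120 x 80 pitch
players <- data.frame(player = c("A", "B"), x = c(30, 90), y = c(40, 40))

# Centres of 1 x 1 grid cells covering the pitch
grid <- expand.grid(x = seq(0.5, 119.5, by = 1),
                    y = seq(0.5, 79.5, by = 1))

# Squared distance from every grid cell (rows) to every player (columns)
d2 <- outer(grid$x, players$x, "-")^2 + outer(grid$y, players$y, "-")^2

# Each cell is owned by the nearest player
owner <- players$player[apply(d2, 1, which.min)]

# Share of the pitch owned by each player
share <- prop.table(table(owner))
share
```

With the two players mirrored around the halfway line, each owns exactly half of the grid. Moving one of them shifts the boundary accordingly.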
Pass networks show how a team was set up on the pitch during a match: each player’s average position and which pairs of players exchanged the most passes. In this example, Bayer Leverkusen rely on their wing-backs, who are very active in creating threat, and many of the team’s passes go through Xhaka and Palacios, the key players in midfield.
passnetPlot <- plotting_data %>%
  plot_passnet(team_name = "Bayer Leverkusen")
passnetPlot
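The two ingredients of a pass network, edge weights and node positions, can be sketched with plain dplyr. The passes below are toy data for three hypothetical players. The column name pass.recipient.name is assumed to match the cleaned StatsBomb events, and this is an illustration of the idea rather than what plot_passnet() does internally.

```r
library(dplyr)

# Toy completed passes between three hypothetical players
passes <- tibble(
  player.name         = c("A", "A", "B", "A", "C"),
  pass.recipient.name = c("B", "B", "A", "C", "A"),
  x = c(40, 44, 60, 42, 80),
  y = c(30, 34, 50, 32, 20)
)

# Edges: number of passes from each passer to each recipient
edges <- passes %>%
  count(player.name, pass.recipient.name, name = "passes") %>%
  arrange(desc(passes))

# Nodes: average passing location and pass volume per player
nodes <- passes %>%
  group_by(player.name) %>%
  summarise(x = mean(x), y = mean(y), n_passes = n())

edges
nodes
```

In a plotted network, the edge counts usually drive line thickness and the node table gives each player's position and marker size.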
StatsBombR and ggshakeR demonstrate how a well-scoped library pair
can cover the full analytics pipeline from raw event ingestion to
publication-ready visualisations, without leaving the R ecosystem.
StatsBombR handles the plumbing: retrieving the data from the open
repository or the paid API, and the opinionated allclean()
transformation that flattens nested columns into a consistent schema.
ggshakeR sits on top of that schema and trades flexibility for
convenience, letting analysts produce convex hulls, Voronoi diagrams,
pass networks and other plots in a single function call rather than
rebuilding pitch geometry from scratch each time.
The three visualisations produced here illustrate complementary views of the same match. The convex hull gives a coarse territorial summary — how much of the pitch Bayer Leverkusen occupied in aggregate. The Voronoi diagram refines that picture to individual players, exposing spatial structure: a compact midfield block, a high defensive line, and wide corridors for the wingers. The pass network then layers in the relational dimension, showing that the team’s ball circulation ran predominantly through Xhaka and Palacios in the centre and through the wing-backs as attacking outlets.
Together, the libraries lower the barrier to entry for tactical analysis in R. The main trade-off is the tight coupling to StatsBomb’s data format — switching to a different provider requires non-trivial remapping. For teams already within the StatsBomb ecosystem, however, this integration is a significant productivity advantage.
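As a rough illustration of what such remapping involves, the sketch below rescales coordinates from a hypothetical provider into StatsBomb's 120 × 80 space. The provider format (percentages from 0 to 100 with the origin at the bottom-left), the assumption that StatsBomb's origin is at the top-left, and the function name to_statsbomb_coords() are all assumptions made for this example. A real migration would also need to map event types and column names.

```r
# Rescale 0-100 percentage coordinates to a 120 x 80 pitch.
# flip_y = TRUE mirrors the y-axis for providers whose origin is bottom-left.
to_statsbomb_coords <- function(x_pct, y_pct, flip_y = TRUE) {
  x <- x_pct / 100 * 120
  y <- y_pct / 100 * 80
  if (flip_y) {
    y <- 80 - y
  }
  data.frame(x = x, y = y)
}

# The centre spot, and a corner of the hypothetical provider's pitch
remapped <- to_statsbomb_coords(x_pct = c(50, 100), y_pct = c(50, 0))
remapped
```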