M9 Collab Task

S. Luebbert

While researching for the M9 Collab Task, I first found the “ggfootball” library, which is built around data from the Understat website. With a single line of code, it produces xG timeline charts and xG shot maps for any match that has an Understat match id. However, the Understat website has been making scraping difficult. I had planned to use Understat to show more match information alongside the visuals from “ggfootball.” Given these difficulties, I’ve decided to also make an xG shot map using “ggsoccer” and data from StatsBombR. I want to make both shot maps in order to compare the output from each library and to show the usefulness of StatsBombR as a public source of reliable soccer data.

In this chunk, I set my working directory and the CRAN mirror.

After setting my working directory, I install the following packages: ggfootball, StatsBombR, ggsoccer, ggplot2, and dplyr.
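
The installation chunk itself is not shown in this report, so the following is only a minimal sketch of what it could look like. The mirror URL and the two helper functions (`missing_pkgs` and `install_missing`) are assumptions made for illustration, as is the idea that ggfootball is installed from CRAN. StatsBombR is installed from GitHub with remotes, which matches the build log.

```r
# Assumption: any CRAN mirror works; this URL is just one common choice.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Packages listed above that are assumed to come from CRAN.
cran_pkgs <- c("ggfootball", "ggsoccer", "ggplot2", "dplyr")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only what is missing, so that re-running the
# report does not reinstall everything.
install_missing <- function() {
  to_install <- missing_pkgs(cran_pkgs)
  if (length(to_install) > 0) install.packages(to_install)

  # StatsBombR is installed from GitHub rather than CRAN.
  if (length(missing_pkgs("StatsBombR")) > 0) {
    if (length(missing_pkgs("remotes")) > 0) install.packages("remotes")
    remotes::install_github("statsbomb/StatsBombR")
  }
  invisible(NULL)
}

# install_missing()  # run once from the console, not on every knit
```

Keeping the install step behind a function call avoids printing the long installation log every time the document is knitted.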

Next, I call the necessary libraries.

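library(ggfootball)  # xG charts and shot maps from Understat data
library(StatsBombR)  # access to StatsBomb open data
library(dplyr)       # data manipulation
library(ggsoccer)    # pitch annotations for ggplot2
library(ggplot2)     # plotting
library(purrr)       # map_dbl() for list columns
library(plotly)      # interactive plots via ggplotly()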

Below is the code for the xG timeline chart from the “ggfootball” library. The “xg_chart” function requires the match id, a color for the home team, a color for the away team, and the competition in which the match took place. The background color of the chart (bg_color) and the background color of the visual (plot_bg_color) are not required but can be used to make the plot more aesthetically pleasing. As the graph below the code chunk shows, the visual marks when each goal was scored, along with data relevant to that goal. The plot also writes its own title with the teams and final score, and it automatically includes the date of the match, the league, and the season.

# xG chart
xg_chart(match_id = 403, 
         home_team_color = "red", 
         away_team_color = "grey", 
         competition = "Premier League",
         bg_color = "#FFF1E5",
         plot_bg_color = "#FFF1E5")

The ggfootball library also includes a function that quickly creates a shot map. The map shows each shot registered in the match, along with the shooter and the xG, and it uses different shapes to show the result of each shot. Hovering over a shape reveals the relevant shot information. The function requires only the match id. The user can set the title of the graph; the default title is “xG Map.” Beyond choosing the match and changing the title, the visual cannot be customized. The function leaves little room for creativity, but it displays the relevant shot information for the given match efficiently and elegantly.

# Shot/xG map
xg_map(match_id = 403, title = "Liverpool 4 - 1 Stoke xG Map")

The ggfootball library is useful for a quick xG and shot analysis of individual matches using Understat data. Understat is a helpful, free public source of football match data; however, with the changes that have made scraping harder, it is difficult to use worldfootballR and understatr to scrape the website for match-specific and player-specific data.

In the following code chunks, I will pull StatsBomb data for the same match and use the ggsoccer and ggplot2 libraries to plot it. I will use this data to make another xG shot map so I can compare the pros and cons of the different libraries when creating visuals in R.

Since I have already installed the packages and called the libraries above, I will begin with getting the competition list and match list. I use the competition list to find the competition_id and season_id for the match that I’m looking for.

comp <- FreeCompetitions()
## [1] "Whilst we are keen to share data and facilitate research, we also urge you to be responsible with the data. Please credit StatsBomb as your data source when using the data and visit https://statsbomb.com/media-pack/ to obtain our logos for public use."
matches <- FreeMatches(comp)
## [1] "Whilst we are keen to share data and facilitate research, we also urge you to be responsible with the data. Please credit StatsBomb as your data source when using the data and visit https://statsbomb.com/media-pack/ to obtain our logos for public use."
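
The lookup of competition_id and season_id described above is not shown as code, so here is a minimal sketch of one way to do it. To keep it runnable without a network connection, it uses a small toy table (`comp_toy`, a made-up stand-in for the result of FreeCompetitions()) that contains only the columns needed for the lookup. The Premier League 2015/2016 ids match the ones used in the filter later in this report; the other rows are illustrative.

```r
library(dplyr)

# Toy stand-in for the data frame returned by FreeCompetitions().
# Only the Premier League 2015/2016 row is taken from this report;
# the remaining rows are illustrative.
comp_toy <- data.frame(
  competition_id   = c(2, 2, 11),
  season_id        = c(27, 44, 27),
  competition_name = c("Premier League", "Premier League", "La Liga"),
  season_name      = c("2015/2016", "2003/2004", "2015/2016")
)

# Look up the ids for the competition and season of interest.
pl_1516_ids <- comp_toy %>%
  filter(competition_name == "Premier League",
         season_name == "2015/2016") %>%
  select(competition_id, season_id)

pl_1516_ids
```

With the real data, the same filter would be applied to `comp` instead of `comp_toy`.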

Next, I filter for the match that I want to use for the visual. Once I find the match, I get the match_id so I can use it to get the events data from StatsBomb.

match <- matches %>%
  rename(competition_id = competition.competition_id,
         season_id = season.season_id,
         home_team_name = home_team.home_team_name,
         away_team_name = away_team.away_team_name)

pl_1516 <- match %>%
  filter(competition_id == 2,
         season_id == 27)

liv_stoke <- pl_1516 %>%
  filter(home_team_name == "Liverpool" & away_team_name == "Stoke City")

library(jsonlite)

url <- "https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/3753986.json"

events <- fromJSON(url, flatten = TRUE)
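
The URL above hard-codes the match id. As a small sketch of how the match_id found in the previous step could be used instead, the helper below builds the same URL from an id. The function name `statsbomb_events_url` is hypothetical, and the sketch assumes that `liv_stoke$match_id` equals the hard-coded value 3753986.

```r
# Hypothetical helper: build the open-data events URL for a given match id.
# as.integer() and "%d" avoid scientific notation for large ids.
statsbomb_events_url <- function(match_id) {
  sprintf(
    "https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/%d.json",
    as.integer(match_id)
  )
}

# With the real data this would be statsbomb_events_url(liv_stoke$match_id).
url_from_id <- statsbomb_events_url(3753986)
url_from_id
```

Building the URL from the id makes it easier to reuse the rest of the code for a different match.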

Next, I filter the events data for shots. In the same pipeline, I define the variables used to position each shot and to set the color, shape, and size of its point. I also want the visual to be interactive, like the ggfootball xG shot map above, so I define the text that should appear in the tooltip when I hover over each shot.

shots <- events %>%
  filter(type.name == "Shot") %>%
  mutate(
    x = map_dbl(location, 1),
    y = map_dbl(location, 2),
    xg = shot.statsbomb_xg,
    team = team.name,
    outcome = shot.outcome.name,
    stroke_size = ifelse(outcome == 'Goal', 1.5, 0.5),
    player = player.name,
    body_part = shot.body_part.name,
    tooltip = paste0(
      "<b>", player, "</b><br>",
      "xG: ", round(xg, 3), "<br>",
      "Outcome: ", outcome, "<br>",
      "Body Part: ", body_part, "<br>",
      "Minute: ", minute
    )
  )
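
The `map_dbl(location, 1)` and `map_dbl(location, 2)` calls may not be obvious at first glance. In the flattened JSON, `location` is a list column where each shot holds a numeric vector of two coordinates. The sketch below shows the extraction on a made-up list of three locations (the coordinate values are invented for illustration), together with a base R equivalent.

```r
library(purrr)

# Toy list column: each element is c(x, y) for one shot (invented values).
location <- list(c(108.1, 31.2), c(94.5, 52.0), c(114.3, 40.8))

# map_dbl(location, 1) takes the first element of each vector, 2 the second.
x <- map_dbl(location, 1)
y <- map_dbl(location, 2)

# Base R equivalent of the first extraction, without purrr.
x_base <- vapply(location, function(p) p[1], numeric(1))

data.frame(x = x, y = y)
```

This only works when every row has a location, which is why the filter for shots comes before the mutate step.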

Finally, I write the code to create the graph. Since I want the graph to be interactive, I store it as “p” so I can pass it to ggplotly later. The graph uses the variables defined above to include all of the visual details I want. In the last line of this chunk, I use ggplotly to create the interactive shot map with the tooltip defined previously.

p <- ggplot(shots) +
  annotate_pitch(dimension = pitch_statsbomb, colour = "black", fill = "white") +
  geom_point(
    aes(x = x, y = y, size = xg, color = team, shape = outcome, text = tooltip, fill = team),
    stroke = shots$stroke_size,
    alpha = 0.6
  ) +
  scale_shape_manual(values = c(
    "Goal" = 21,
    "Saved" = 22,
    "Off T" = 4,
    "Blocked" = 25,
    "Post" = 8,
    "Wayward" = 9
  )) +
  scale_size(range = c(1, 5)) +
  coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
  theme_pitch() +
  facet_wrap(~team) +
  labs(
    title = "Liverpool 4 - 1 Stoke City",
    shape = "Shot Outcome"
  ) +
  guides(
    fill = "none",
    size = "none",
    color = "none"
  )

ggplotly(p, tooltip = "text")
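
One thing to watch with `scale_shape_manual` is that an outcome without an entry in `values` gets no shape, and those points are dropped from the plot with a warning. The sketch below is one possible check to run before plotting. The helper name `unmapped_outcomes` is hypothetical, and the example outcomes are made up for illustration rather than taken from this match.

```r
# The same named shape vector that is passed to scale_shape_manual() above.
shape_values <- c(
  "Goal" = 21,
  "Saved" = 22,
  "Off T" = 4,
  "Blocked" = 25,
  "Post" = 8,
  "Wayward" = 9
)

# Hypothetical helper: outcomes in the data that have no shape assigned.
unmapped_outcomes <- function(outcomes, shapes) {
  setdiff(unique(outcomes), names(shapes))
}

# Illustrative input; with the real data this would be shots$outcome.
example_outcomes <- c("Goal", "Saved", "Saved to Post", "Goal")
unmapped_outcomes(example_outcomes, shape_values)
```

If the result is not empty, adding the missing outcome names to the shape vector keeps every shot on the map.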

In this shot map, made using ggplot2 and ggplotly with data from StatsBomb, I have much more freedom to make the graph unique than with the ggfootball graph built on Understat data. I like that I can be more creative in how I display the details of each shot and that I can include additional variables in the tooltip. I trust the StatsBomb data, and it is much easier to get detailed event data from StatsBomb than from Understat, even though the matches available through the FreeMatches function are limited. However, the ggfootball library makes the entire process simple, with just one line of code. I appreciate that the library is able to scrape the data from Understat, which from my experience so far seems like another reliable data source.