‘ggplot2’ is a popular data visualization package in the R programming language. This package provides a flexible way to create a wide variety of data visualizations, such as scatter plots, bar charts, line charts, and more. The key concept behind ggplot2 is the idea of layering. You start with a base layer representing your data and then add additional layers to customize the plot. Each layer can include elements like geometric shapes (points, lines, bars), aesthetics (colors, sizes, shapes), and statistical transformations.
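
The layering idea can be sketched with a small example. This is only an illustration, assuming ggplot2 is already installed; it uses the built-in mtcars dataset (not the AFL data used later) so it runs without any downloads, and the object name p is just a placeholder.

```r
library(ggplot2)

# Base layer: the data plus the aesthetic mappings (mtcars ships with R)
p <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# Add a geometric layer (points), then a statistical layer (a linear fit)
p <- p +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Printing the object draws the plot
p
```

Each `+` adds one more layer on top of the base, which is the same pattern every plot below follows.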

Load Packages

For our ggplot demonstrations, a few packages need to be loaded. The fitzRoy package gives access to AFL data. We will use the 2023 season, which includes all player stats collected by the AFL for every game across the season.

# If you have never used these packages before, you'll have to install them first
#install.packages("fitzRoy")
#install.packages("tidyverse")
#install.packages("ggplot2")

library(fitzRoy)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)

Load Data

The data we will use today comes from the ‘fitzRoy’ package. We will fetch only the games from the 2023 season, which should be enough for the demonstration.

afl <- fitzRoy::fetch_player_stats(season = 2023)
## ℹ Fetching match ids
## ✔ Fetching match ids ... done
## ℹ Finding player stats for 216 matches.
## ✔ Finding player stats for 216 matches. ... done

Clean Data

For easier demonstration, we will use the select function to keep only the variables that are needed, renaming the player name columns to firstname and lastname along the way. Let's have a look at what the data frame looks like now.

afl <- afl %>% 
  select(round.name, home.team.name, away.team.name, player.player.position, 
         teamId, firstname = player.player.player.givenName, 
         lastname = player.player.player.surname, 
         timeOnGroundPercentage, goals, behinds, kicks, handballs, disposals, marks, 
         disposalEfficiency, dreamTeamPoints)
head(afl)
## # A tibble: 6 × 16
##   round.name home.team.name away.team.name player.player.position teamId 
##   <chr>      <chr>          <chr>          <chr>                  <chr>  
## 1 Round 1    Richmond       Carlton        CHB                    CD_T120
## 2 Round 1    Richmond       Carlton        BPL                    CD_T120
## 3 Round 1    Richmond       Carlton        HFFR                   CD_T120
## 4 Round 1    Richmond       Carlton        HBFL                   CD_T120
## 5 Round 1    Richmond       Carlton        INT                    CD_T120
## 6 Round 1    Richmond       Carlton        BPR                    CD_T120
## # ℹ 11 more variables: firstname <chr>, lastname <chr>,
## #   timeOnGroundPercentage <dbl>, goals <dbl>, behinds <dbl>, kicks <dbl>,
## #   handballs <dbl>, disposals <dbl>, marks <dbl>, disposalEfficiency <dbl>,
## #   dreamTeamPoints <dbl>

Basic ggplots

First, let's create a very basic ggplot of disposals against time on ground percentage.

ggplot(data =  afl, aes(x = disposals, y = timeOnGroundPercentage)) +
  geom_point()

But this graph is very basic and not very appealing to the eye, so we can use colour, size, facets and titles to make it look nicer.

Adding some colour

Colour can be used to bring in a third variable and give a deeper understanding of the dataset. We can take the previous two variables, disposals and time on ground percentage, and add disposal efficiency as the third to see how effective players' disposals are at different quantities, and whether time on ground percentage plays a role too.

ggplot(data =  afl, aes(x = disposals, y = timeOnGroundPercentage, col = disposalEfficiency)) +
  geom_point()
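
When colour is mapped to a continuous variable like this, the default gradient can be swapped for another scale. The following is a minimal sketch, again using the built-in mtcars dataset as a stand-in for the AFL data so that it runs on its own; the name p_colour is only illustrative. Note that ggplot2 treats col, color and colour as the same aesthetic.

```r
library(ggplot2)

# colour = hp maps a continuous variable to point colour,
# just as disposalEfficiency is mapped above
p_colour <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point() +
  # Replace the default blue gradient with the viridis palette
  scale_colour_viridis_c()

p_colour
```

The same scale_colour_viridis_c() layer could be added to the AFL plot above, assuming the afl data frame has been loaded.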

Size

Size can also be used to add a third variable. First we will subset the data to only the Round 1 stats for Collingwood. Then, rather than colour representing disposal efficiency, we can make size represent it and use colour to distinguish the players by last name.

Collingwood <- afl %>% 
  filter(teamId == "CD_T40")
CollingwoodRound1 <- afl %>% 
  filter(teamId == "CD_T40" & round.name == "Round 1")


ggplot(data = CollingwoodRound1, aes(x = disposals, y = timeOnGroundPercentage, 
                               col = lastname, size = disposalEfficiency)) +
  geom_point()
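
Large points can overlap and hide each other, so it can help to control the range of point sizes and add some transparency. This is a hedged sketch on the built-in mtcars dataset rather than the Collingwood data; the size range and alpha value are arbitrary choices for illustration, and p_size is a placeholder name.

```r
library(ggplot2)

# size = hp maps a continuous variable to point size
p_size <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  # alpha is set outside aes(), so it applies to every point equally
  geom_point(alpha = 0.6) +
  # Limit how small and how large the points can be drawn
  scale_size(range = c(1, 6))

p_size
```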

Facets

Facets are an awesome way to split data across separate graphs while keeping the same variables and axes in each. First we will further subset the data to some of Collingwood's forward players, then plot goals against kicks for each of their games over the season, using facet_wrap so each player is displayed on their own graph. Colour is also used as a third variable to highlight the disposal efficiency for that game.

CollingwoodForwards <- Collingwood %>% 
  filter(lastname %in% c("Cox", "McStay", "Elliott", "Hill", "McCreery", "Mihocek"))

ggplot(data = CollingwoodForwards, aes(x = goals, y = kicks, col = disposalEfficiency)) +
  geom_point() +
  facet_wrap(~lastname, nrow = 2)
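
To see what facet_wrap does without downloading the AFL data, here is a small self-contained sketch on the built-in mtcars dataset. It splits the plot by number of cylinders (cyl) instead of by player; the name p_facet is only illustrative.

```r
library(ggplot2)

# One panel per value of cyl, laid out in a single row;
# every panel shares the same x and y variables
p_facet <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point() +
  facet_wrap(~cyl, nrow = 1)

p_facet
```

By default all panels share the same axis limits, which makes the groups easy to compare. Passing scales = "free" to facet_wrap lets each panel choose its own limits instead.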